DS 6030 | Spring 2026 | University of Virginia

Homework #10: Survival Analysis

Author

First Last (abc2de)

Published

Spring 2026

Overview

Setting. You are analyzing pediatric heart transplant data from a large multi-center registry. This is derived from the same cohort analyzed in Wisotzkey et al. (2023), Risk factors for 1-year allograft loss in pediatric heart transplant patients using machine learning (linked here). Reading the paper is not required, but interested students can use it for context and comparison.

Decision context. The clinical team supports two decisions:

  • Decision 1 (1 yr survival). A donor heart offer arrives for a candidate on the waitlist. Given the patient’s features, what is the probability of 1-year graft-loss-free survival if the transplant goes forward? This supports the accept/reject call on the offer.
  • Decision 2 (survival distribution). After transplant, the team wants one of two longer-range views of outcome:
    • (2a) the patient’s full survival trajectory across multiple years, or
    • (2b) the hazard shape across follow-up time. This can answer the question: when in the post-transplant course is risk highest?

You will answer Decision 1 with an IPCW-weighted classifier, and Decision 2 with either (2a) a survival forest or (2b) a discrete-time hazard model; the choice is yours.

Event definition. The event is the combined endpoint of graft loss or death. Patients with neither at last follow-up are censored. This definition follows the Wisotzkey paper and sidesteps competing risks for this assignment.

Data. Use tx_survival.csv found in Canvas/Files/data. The columns are:

  • obs_time: observed follow-up time in years (graft loss, death, or last contact)
  • event: 1 = graft loss or death, 0 = censored
  • split: “train” or “test” (use this for your single train/test split)
  • id: anonymized patient identifier
  • 14 features: etiology, single_ventricle, mcsd, ecmo, prior_surgeries, medical_hist, albumin_under_3, bun_under_15, eGFR_under_60, weight_under_75, bmi_under_18, alt_under_30, alt_over_50, txyr_under_2015
Note: Load Data

Load data here

Problem 1: Descriptive Survival

a. Overall Kaplan-Meier

Using the training data, produce a Kaplan-Meier curve of graft-loss-free survival for the full cohort.

  • Show the plot.
  • Report the estimated survival probability at 1 year and at 5 years.

R: survfit(Surv(obs_time, event) ~ 1, data = train) and tidy() from broom.

Python: sksurv.nonparametric.kaplan_meier_estimator() after converting (event, obs_time) to a structured array with Surv.from_arrays().
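Before reaching for the library calls above, it can help to see the product-limit mechanics spelled out. A minimal pure-Python sketch on toy data (not the assignment data; `kaplan_meier` is an illustrative name, not a library function):

```python
# Hand-rolled Kaplan-Meier estimator: at each distinct event time,
# multiply the running survival by (1 - d / n_at_risk).

def kaplan_meier(times, events):
    """Return (event_times, survival_probs) for right-censored data.

    times  : observed follow-up times
    events : 1 = event observed, 0 = censored
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    surv = 1.0
    out_t, out_s = [], []
    at_risk = n
    i = 0
    while i < n:
        t_cur = times[order[i]]
        d = 0          # events at this time
        removed = 0    # events + censorings leaving the risk set
        while i < n and times[order[i]] == t_cur:
            d += events[order[i]]
            removed += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / at_risk
            out_t.append(t_cur)
            out_s.append(surv)
        at_risk -= removed
    return out_t, out_s

# Example: 5 patients, censored at t=2 and t=5
t, s = kaplan_meier([1, 2, 2, 3, 5], [1, 0, 1, 1, 0])
```

Note the tie-handling convention: at a tied time, censored patients are still counted in the risk set when the hazard is computed, which matches `survfit()`'s behavior.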

Note: Solution

Add solution here

b. Stratified KM

Produce a KM curve stratified by etiology. Comment briefly on the differences across groups.

Note: Solution

Add solution here

c. Censoring

Report the overall censoring rate (% of training patients with event = 0). Then estimate and plot \(\hat G(t)\), the censoring survival function, on the training data. (Hint: \(\hat G\) is just the Kaplan-Meier estimator with the event indicator flipped, treating censoring as the "event.") You will use \(\hat G\) in Problem 2.

Note: Solution

Add solution here

Problem 2: 1-year Graft-Loss-Free Survival (IPCW Classifier)

a. Compute IPCW weights

Using the training data and \(\hat G\) from Problem 1c, compute IPCW weights for each patient at horizon \(\tau = 1\) year:

\[ \tilde{w}_i(\tau) = \begin{cases} 0 & \text{if } \tilde T_i < \tau \text{ and } \delta_i = 0 \\ \dfrac{1}{\hat{G}(\min(\tilde T_i, \tau))} & \text{otherwise} \end{cases} \]

Also create the binary label y for 1-year event status: y = 1 if the patient experienced the event within 1 year (known failure), y = 0 if observed past 1 year (known survival), and missing (e.g., NA, np.nan, pd.NA) if censored before 1 year (unknown 1-year status).

Then report, for each value of \(y\):

  • the number of patients
  • the total IPCW weight among the retained patients
  • the maximum weight

R: There is no built-in predict() method for a Kaplan-Meier fit, but you can build a step function from the censoring KM with stepfun(km_cens$time, c(1, km_cens$surv)), then evaluate it at pmin(obs_time, 1).

Python: Use numpy.searchsorted() on the sorted censoring times from the KM, or use scikit-survival’s CensoringDistributionEstimator.
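The piecewise definition above translates directly into code. A minimal sketch, assuming \(\hat G\) has already been reduced to its sorted drop times and values (the names `g_times`/`g_surv` and all numbers are illustrative, not from the assignment data):

```python
import bisect

def g_hat(t, g_times, g_surv):
    """Evaluate the censoring survival step function at time t.

    g_times : sorted times at which G-hat drops
    g_surv  : value of G-hat at and after each drop (right-continuous)
    """
    i = bisect.bisect_right(g_times, t)   # number of drops at or before t
    return 1.0 if i == 0 else g_surv[i - 1]

def ipcw_weight(obs_time, event, tau, g_times, g_surv):
    """IPCW weight at horizon tau, matching the piecewise formula."""
    if obs_time < tau and event == 0:
        return 0.0                        # censored before tau: status unknown
    return 1.0 / g_hat(min(obs_time, tau), g_times, g_surv)

# Toy censoring KM: G-hat drops to 0.9 at t=0.5 and to 0.8 at t=2
g_times, g_surv = [0.5, 2.0], [0.9, 0.8]
w_event   = ipcw_weight(0.7, 1, 1.0, g_times, g_surv)  # event before 1y
w_surv    = ipcw_weight(3.0, 0, 1.0, g_times, g_surv)  # known alive at 1y
w_dropped = ipcw_weight(0.4, 0, 1.0, g_times, g_surv)  # censored before 1y
```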

Note: Solution

Add solution here

b. Fit an IPCW-weighted classifier

Fit a binary classifier for 1-year event status using the IPCW weights. Drop patients with y missing (equivalently, their weight is 0). Any weighted binary probability model is fine (e.g., logistic regression).

Briefly describe your model choice and any key decisions (e.g., features included, how you handled categorical features).

Note: Solution

Add solution here

c. Evaluate on the test set

Compute IPCW weights for test patients using the same \(\hat G\) you estimated from the training data. Then evaluate your model on the test set:

  1. IPCW-weighted Brier score at 1 year
  2. IPCW-weighted C-index (or AUC) for 1-year failure risk
  3. A calibration plot (predicted 1-year survival vs. observed 1-year survival, weighted by IPCW)

Comment on what the metrics and the calibration plot tell you about model quality.

  • Brier score (at 1 year): the weighted mean squared error between the predicted 1-year survival probability and the observed binary outcome (\(1 - y\)). Lower is better. A null model that predicts the marginal survival rate for everyone gives a Brier score equal to \(p(1-p)\) where \(p\) is the event rate. For a 7.6% event rate, the null Brier is about 0.070. A perfect model scores 0.

  • C-index (concordance index): among all pairs of patients where one had the event within 1 year and the other did not, the fraction of pairs where the model assigned a higher predicted risk to the patient who actually had the event. Equivalently, it is the area under the ROC (AUC) curve for the 1-year binary outcome. 0.5 = random ranking, 1.0 = perfect discrimination.

  • Both metrics are computed with IPCW weights on the test set, using only patients whose 1-year status is known (event within 1 year, or alive at 1 year). This follows the same logic as training. Without the weights, the metrics would be biased, because patients censored before 1 year are excluded but not accounted for.

The test set also contains censored patients whose 1-year status is unknown. Drop them (or equivalently use weight 0) for metric computation, and use IPCW weights on the remaining test patients just like you did for training.

R: For C-index, survival::concordance(y ~ pred_risk, data = test_data, weights = ipcw).

Python: sksurv.metrics.concordance_index_ipcw() from scikit-survival.
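For intuition, both metrics reduce to weighted sums. A minimal pure-Python sketch (function names and inputs are illustrative; in practice, use the library calls above):

```python
def weighted_brier(p_surv, y, w):
    """IPCW-weighted Brier score at 1 year.

    p_surv : predicted 1-year survival probabilities
    y      : known 1-year event status (unknowns already dropped)
    w      : IPCW weights
    """
    num = sum(wi * (pi - (1 - yi)) ** 2 for pi, yi, wi in zip(p_surv, y, w))
    return num / sum(w)

def weighted_auc(p_risk, y, w):
    """Weighted probability that an event patient outranks a non-event one
    (ties in predicted risk count as half a concordant pair)."""
    num = den = 0.0
    for ri, yi, wi in zip(p_risk, y, w):
        for rj, yj, wj in zip(p_risk, y, w):
            if yi == 1 and yj == 0:
                pair_w = wi * wj
                den += pair_w
                num += pair_w * (1.0 if ri > rj else 0.5 if ri == rj else 0.0)
    return num / den
```

The quadratic pairwise loop is fine at this dataset's size; library implementations sort first to avoid it.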

Note: Solution

Add solution here

Problem 3: Full-Curve Analysis

Choose one of Problem 3a (survival forest) or Problem 3b (discrete-time hazard). You do not need to do both.

a. Option A: Survival Forest

i. Fit

Fit a survival forest on the training data. Remember that tree complexity (e.g., minimum node size) can be an important tuning parameter with survival forests.

R: ranger(Surv(obs_time, event) ~ ., data = train, num.trees = 500, min.node.size = 30, seed = 1).

Python: sksurv.ensemble.RandomSurvivalForest(n_estimators=500, min_samples_leaf=30, n_jobs=-1, random_state=1).

Note: Solution

Add solution here

ii. Patient-specific survival curves

Consider the following four patients in the test set: 4189, 4308, 4522, and 4790. Predict and plot their full survival curves on a single figure.

Note: Solution

Add solution here

iii. Evaluation

Make predictions for the full test set. Compute the Brier score and C-index/AUC from predictions at 1, 3, and 5 years.

Note: Solution

Add solution here

b. Option B: Discrete-Time Hazard

i. Reshape

Reshape the training data to person-period format with yearly intervals. Each row corresponds to one (patient, interval) pair with a binary indicator y for whether the patient had the event in that interval. Censored patients contribute rows up through their censoring interval, all with y = 0.

Report the original number of rows and the reshaped number of rows, along with the total event count (should match the training data).

R: uncount() from tidyr with weights = ceiling(obs_time), then compute j = row_number() within each patient and y = as.integer(j == ceiling(obs_time) & event == 1).

Python: For each patient, loop over intervals 1 to min(ceiling(obs_time), 10); create a row with y = 1 only in the last interval if event = 1. pandas.DataFrame.explode() can help.
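The reshape itself is a small loop. A pure-Python sketch on two toy patients (`to_person_period` is an illustrative name; the 10-interval cap from the hint above is omitted here for brevity):

```python
import math

def to_person_period(rows):
    """Expand (id, obs_time, event) tuples to person-period rows.

    Each patient gets one row per yearly interval j = 1..ceil(obs_time);
    y = 1 only in the last interval, and only if the event occurred.
    """
    long_rows = []
    for pid, obs_time, event in rows:
        n_intervals = max(1, math.ceil(obs_time))
        for j in range(1, n_intervals + 1):
            y = 1 if (j == n_intervals and event == 1) else 0
            long_rows.append({"id": pid, "j": j, "y": y})
    return long_rows

# Example: patient 1 has the event in year 2; patient 2 is censored in year 3
long_train = to_person_period([(1, 1.4, 1), (2, 2.6, 0)])
```

In practice you would carry the 14 feature columns along on each row; they are constant within a patient.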

Note: Solution

Add solution here

ii. Fit

Fit a discrete-time hazard model using j as a predictor variable. Logistic regression is the natural choice; any binary classifier that returns probabilities will work.

R: glm(y ~ factor(j) + ., data = long_train, family = binomial).

Python: sklearn.linear_model.LogisticRegression() with one-hot encoded interval indicator, or statsmodels.formula.api.logit() with a categorical term for j.

Note: Solution

Add solution here

iii. Patient-specific hazard predictions

Plot the estimated hazard \(\hat h_j\) for a sample of test patients (id = 4189, 4308, 4522, 4790) as a function of interval \(j\).

Note: Solution

Add solution here

iv. Patient-specific survival curves

For the same 4 test patients, compute \(\hat S(t)\) across intervals using \(\hat S(t) = \prod_{j=1}^{t}(1 - \hat h_j)\). Plot the curves on a single figure.
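The product formula is a one-pass cumulative product over a patient's predicted per-interval hazards. A minimal sketch with made-up hazard values:

```python
def survival_from_hazards(hazards):
    """Convert per-interval hazards h_1, h_2, ... into S(1), S(2), ...
    via S(t) = prod_{j<=t} (1 - h_j)."""
    surv, curve = 1.0, []
    for h in hazards:
        surv *= 1.0 - h
        curve.append(surv)
    return curve

curve = survival_from_hazards([0.10, 0.05, 0.05])
# curve[0] is S(1) = 0.90; curve[2] is S(3) = 0.90 * 0.95 * 0.95
```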

Note: Solution

Add solution here

v. Evaluate

Read off each example patient’s predicted 1-year survival probability. Do these agree roughly with what the Problem 2 classifier predicted for the same patients?

Note: Solution

Add solution here