= 'https://mdporter.github.io/SYS6018/data/' # data directory
data_dir library(ks) # functions for KDE
library(tidyverse) # functions for data manipulation
SYS 6018 | Spring 2024 | University of Virginia
Homework #9: Density Estimation
Required R packages and Directories
Problem 1 Geographic Profiling
Geographic profiling, a method developed in criminology, can be used to estimate the home location (roost) of animals based on a collection of sightings. The approach requires an estimate of the distribution the animal will travel from their roost to forage for food.
A sample of \(283\) distances that pipistrelle bats traveled (in meters) from their roost can be found at:
One probability model for the distance these bats will travel is: \[\begin{align*} f(x; \theta) = \frac{x}{\theta} \exp \left( - \frac{x^2}{2 \theta} \right) \end{align*}\] where the parameter \(\theta > 0\) controls how far they are willing to travel.
a. Derive a closed-form expression for the MLE for \(\theta\) (i.e., show the math).
b. Estimate \(\theta\) for the bat data using MLE?
Calculate using the solution to part a, or use computational methods.
c. Plot the estimated density
Using the MLE value of \(\theta\) from part b, calculate the estimated density at a set of evaluation points between 0 and 8 meters. Plot the estimated density.
- The x-axis should be distance and y-axis should be density (pdf).
d. Estimate the density using KDE.
Report the bandwidth you selected and produce a plot of the estimated density.
e. Which model do you prefer, the parametric or KDE?
Problem 2: Interstate Crash Density
Interstate 64 (I-64) is a major east-west road that passes just south of Charlottesville. Where and when are the most dangerous places/times to be on I-64? The crash data (link below) gives the mile marker and fractional time-of-week for crashes that occurred on I-64 between mile marker 87 and 136 in 2016. The time-of-week data takes a numeric value of <dow>.<hour/24>, where the dow starts at 0 for Sunday (6 for Sat) and the decimal gives the time of day information. Thus time=0.0417
corresponds to Sun at 1am and time=6.5
corresponds to Sat at noon.
a. Crash Data
Extract the crashes and make a scatter plot with mile marker on x-axis and time on y-axis.
b. Use KDE to estimate the mile marker density.
- Report the bandwidth.
- Plot the density estimate.
c. Use KDE to estimate the temporal time-of-week density.
- Report the bandwidth.
- Plot the density estimate.
d. Use KDE to estimate the bivariate mile-time density.
- Report the bandwidth parameters.
- Plot the bivariate density estimate.
e. Crash Hotspot
Based on the estimated density, approximate the most dangerous place and time to drive on this stretch of road. Identify the mile marker and time-of-week pair (within a few miles and hours).