Boston Crime
Analysis and mapping of crime in the Boston area
Skills
- RStudio
- ggplot2
- ggmap
- lubridate
- dplyr
Crime Statistics Description
The dataset for 2016-17 has 91561 records. I’m using a February-to-February year cycle.
Likewise, the 2017-18 dataset has 92041 records.
Packages Required
plyr, ggmap, ggplot2, lubridate; all available from CRAN.
Prerequisites
R and RStudio
Dataset Description
Last Update of Dataset 02/04/2018
Data Provided under Open Data Commons Public Domain Dedication and License (PDDL).
Simple complete-case filtering removes any records with missing values. The data is then split into two chunks: 2016-17 and 2017-18.
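The cleaning and splitting step could look roughly like the sketch below. It runs on a tiny made-up data frame, not the real dataset; the `Date` column name and the exact cut-off day are assumptions, since the original only mentions a February boundary.

```r
# Toy stand-in for the incident data (values are invented for illustration).
# Lat, Long and District match the names used elsewhere in this write-up;
# the Date column name is an assumption.
crimes <- data.frame(
  Date = as.Date(c("2016-03-01", "2016-11-15", "2017-02-20",
                   "2017-09-05", "2018-01-10")),
  District = c("A1", "B2", NA, "A1", "C6"),
  Lat = c(42.35, 42.33, 42.30, NA, 42.34),
  Long = c(-71.06, -71.08, -71.07, -71.05, -71.04),
  stringsAsFactors = FALSE
)

# Keep only the rows that have no missing value in any column.
complete <- crimes[complete.cases(crimes), ]

# Split on a February boundary (the exact day is an assumption).
cutoff <- as.Date("2017-02-01")
year_1617 <- complete[complete$Date < cutoff, ]
year_1718 <- complete[complete$Date >= cutoff, ]
```

`complete.cases()` is part of base R, so this step needs no extra packages.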
Crime Density Map
I started with the geospatial visualisation. I used the ggmap library to pull a roadmap from Google Maps, then overlaid it with several grammar-of-graphics layers for the density gradient and other styling. Here’s a code snippet.
library(ggplot2)
library(ggmap)
# bostonmap is the roadmap pulled from Google Maps; lastyear is the incident data frame with Long and Lat columns
interestedregion <- ggmap(bostonmap) +
  # Create the density polygons for crime regions. The number of bins controls the resolution
  # (after_stat() needs ggplot2 3.3.0 or later; older versions use ..level.. instead)
  stat_density2d(data = lastyear, aes(x = Long, y = Lat, alpha = after_stat(level), fill = after_stat(level)), bins = 15, geom = 'polygon') +
  # Crime density scale from dark magenta to gold to denote intensity. Use any color combination that makes sense
  scale_fill_gradient('Crime\nDensity', low = 'darkmagenta', high = 'gold') +
  # Keep the density layer semi-transparent and hide the alpha legend
  scale_alpha(range = c(.2, .3), guide = "none") +
  guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10)) +
  # Remove the axis titles for longitude and latitude
  theme(axis.title.y = element_blank(), axis.title.x = element_blank()) +
  ggtitle("Boston Crime February 2017 - February 2018")
This map shows the crime density in Boston for a period of one year from February 2017 to February 2018.
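To get a feel for what the density layer computes, here is a small sketch that runs the same kind of 2D kernel density estimate directly with `MASS::kde2d()`, which is the function the ggplot2 documentation names as the basis of `stat_density_2d()`. The points are synthetic and clustered around an arbitrary location, and the 50 x 50 grid size is just a choice for this example, not something taken from the project.

```r
library(MASS)  # a recommended package shipped with standard R installations

set.seed(42)
# Synthetic coordinates clustered around an arbitrary point (not real incidents).
lat  <- rnorm(500, mean = 42.35, sd = 0.01)
long <- rnorm(500, mean = -71.06, sd = 0.01)

# 2D kernel density estimate evaluated on a 50 x 50 grid.
dens <- kde2d(long, lat, n = 50)

# Find the grid cell with the highest estimated density.
# Rows of dens$z index dens$x (longitude), columns index dens$y (latitude).
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_long <- dens$x[peak["row"]]
peak_lat  <- dens$y[peak["col"]]
```

The peak cell lands close to the centre of the synthetic cluster, which is the same idea as the brightest region on the map marking where incidents are most concentrated.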
Crime Statistics Districtwise
Next, I computed simple statistics on the crime data and drew bar plots of the number of crimes committed in each district. A minimalistic theme is applied to keep the bar plots clean. Here’s the function I wrote to produce the bar plots.
plotter <- function(dataset, title){
  # Name of the second column, which holds the per-district counts
  count_col <- names(dataset)[2]
  # Draw the bar plot for the frequency table passed in
  plotout <- ggplot(dataset, aes(x = District, y = .data[[count_col]], fill = District)) +
    geom_bar(stat = "identity") +
    # Apply a minimalistic theme
    theme_minimal() +
    ggtitle(title)
  return(plotout)
}
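The function expects a table with a District column followed by a count column. One way to build such a table with base R is sketched below; the district codes are made-up sample values and the `Incidents` column name is an assumption, not necessarily what the project uses.

```r
# Made-up district codes standing in for the District column of the incident data.
districts <- c("A1", "B2", "A1", "C6", "A1", "B2")

# Count incidents per district and turn the result into a two-column data frame.
freq <- as.data.frame(table(District = districts), stringsAsFactors = FALSE)

# Give the count column a descriptive name (hypothetical; the default is "Freq").
names(freq)[2] <- "Incidents"
```

A table shaped like `freq` can then be passed to the plotting function together with a title.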
First, we generate a plot for the 2016-17 year.
Then we do the same for the 2017-18 year.
Now we can compare the two years and see by what percentage the number of crime incidents has changed.
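The percentage change itself is a one-line calculation. The sketch below applies it first to the two dataset sizes quoted earlier (91561 and 92041), then to per-district counts; the per-district numbers are invented purely to show the vectorised form.

```r
# Dataset sizes quoted in the description above.
total_1617 <- 91561
total_1718 <- 92041

# Percentage change relative to the earlier year.
overall_change <- (total_1718 - total_1617) / total_1617 * 100

# Invented per-district counts, only to illustrate the same formula on vectors.
counts_1617 <- c(A1 = 120, B2 = 200, C6 = 80)
counts_1718 <- c(A1 = 150, B2 = 180, C6 = 80)
district_change <- (counts_1718 - counts_1617) / counts_1617 * 100
```

With the quoted totals, `overall_change` comes out at roughly 0.52, i.e. about half a percent more records in the second year. A positive value in `district_change` means more incidents in 2017-18 than in 2016-17, a negative value means fewer.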
Future
I plan to work on a neural network to predict hotspots for the upcoming months. It will be interesting to see whether those predictions hold up in real life. Leave your comments or feel free to contact me with suggestions.
Codebase and License
Here’s the full GitHub repo for this project. This project is licensed under the MIT License; see the LICENSE.md file in my GitHub repo for more details.
Acknowledgments
- R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
- R packages used: plyr, ggmap, ggplot2, lubridate