MIDS Capstone Project Summer 2018

Crime & Hazard: A Housing Project

A website to help homebuyers and renters in San Francisco discover a property’s crime and hazard profiles.  Users are initially provided with Crime Safety and Hazard Safety percentile scores that present a quick overview of the address compared to the entire city.  Users can choose to continue to search for other properties or dive deeper into the statistics on the other sections of the website.

From the crime map page, users can explore a heat map of the area’s crime and adjust the presentation of the information by month and/or top crimes.  The crime statistics page presents a crime density comparison across hyper-local (one-thousand-foot radius), neighborhood, and citywide areas.  A logistic-regression classification model predicts whether crime is going up or down and reports a probability score.  Finally, a chart shows the crime counts for the past fifteen years, split into solved and unsolved crimes.

The hazards page is primarily focused on providing average earthquake intensities for the address with a comparison for the neighborhood and city.  Users can dive further into the earthquake models that underlie the average intensity score and discover how far they are from all the major faults and the potential damage each fault can cause.  Other hazards such as fire, flood, landslide, and sea-level rise are also provided.

 

The Home page (searching for the Painted Ladies of Alamo Square, aka Postcard Row)

 

The Crime Map page

 

The Crime Statistics page

 

The More section on the Crime Statistics page

 

The Hazards page

 

The More section on the Hazards page

 

Crime & Hazard

A Housing Quality of Life Project by Chris Beecroft, Lana Riva, Utthaman Thirunavukkarasu, Alex Yang
UC Berkeley MIDS Program, W210 Capstone Project, Professors David Steier and Stan Kelman, Summer 2018

History

This project was originally conceived as a means for homebuyers and renters to find safe housing in low-crime areas with low hazard risks. In addition, existing homeowners could discover the crime and hazard profiles of their current homes, so that they are informed of existing risks and can prepare for eventual issues. While other websites provide most of this information, they usually do not provide comparative statistics, nor information on the ‘hyper-local’ area (the area within 1,000’ of the address).

While hazard data is disclosed during escrow, having this information earlier in the search will help homebuyers find safer properties. In addition, renters are usually not privy to these disclosures, so websites such as this provide their only means of understanding the crime and hazard character of their new neighborhood.

Future plans are to develop a house price model to find areas where the home prices do not reflect the crime trends and hazard risks.

Data Sources

San Francisco Crime Data https://data.sfgov.org/Public-Safety/-Change-Notice-Police-Department-In...
 
Bay Area Earthquake, flooding, wildfire, landslides, sea level rise GIS data http://resilience.abag.ca.gov/open-data/
http://www.fire.ca.gov/fire_prevention/fire_prevention_wildland_zones_maps
 
US Earthquake Fault GIS Data https://earthquake.usgs.gov/learn/kml.php
 
California Tsunami GIS Data http://www.conservation.ca.gov/cgs/geologic_hazards/Tsunami/Inundation_Maps
 
Census Tract GIS Data https://data.sfgov.org/Geographic-Locations-and-Boundaries/Analysis-Neig...
 
SF Realtor Neighborhood GIS Data https://data.sfgov.org/Geographic-Locations-and-Boundaries/Realtor-Neigh...
 
2016 Census Estimates https://censusreporter.org

Models

Safety Scores

The distribution of the safety scores was generated by examining the crime and hazard models across the city. Since the complete list of addresses within the city was not available, we used an evenly spaced mesh network placed over the city as an approximation. Areas in census tracts with low population (Golden Gate Park, Hunter’s Point Shipyards, etc.) along with areas with low population density (e.g., portions of The Presidio, Lincoln Park, Lake Merced neighborhoods, etc.) have been removed from consideration.
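The mesh approach above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes planar coordinates in feet, and the function names `mesh_points` and `percentile_rank` as well as the grid spacing are hypothetical. The filtering of low-population areas is not shown.

```python
def mesh_points(min_x, min_y, max_x, max_y, spacing):
    """Return evenly spaced (x, y) points covering a bounding box.

    Assumption: coordinates are planar (e.g. feet), not latitude/longitude.
    """
    nx = int((max_x - min_x) // spacing) + 1
    ny = int((max_y - min_y) // spacing) + 1
    return [
        (min_x + i * spacing, min_y + j * spacing)
        for i in range(nx)
        for j in range(ny)
    ]


def percentile_rank(value, population):
    """Percentage of population values that are less than or equal to value."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for v in population if v <= value)
    return 100.0 * at_or_below / len(population)
```

A score computed at every mesh point gives the citywide distribution; the score at a searched address can then be placed within that distribution with `percentile_rank`.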

The crime score consists of the number of crimes within a 1,000’ radius of the measured point along with a penalty for unresolved assault crimes.
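As a rough sketch of the crime score described above: the size of the penalty is not stated here, so the `assault_penalty` weight below is a placeholder, and the incident fields (`x`, `y`, `category`, `resolved`) are hypothetical names chosen for illustration.

```python
import math


def crime_score(point, incidents, radius_ft=1000.0, assault_penalty=1.0):
    """Count incidents within radius_ft of point, adding a penalty
    for each unresolved assault.

    assault_penalty is an assumed placeholder weight, not the project's value.
    Coordinates are assumed to be planar, in feet.
    """
    px, py = point
    score = 0.0
    for inc in incidents:
        if math.hypot(inc["x"] - px, inc["y"] - py) <= radius_ft:
            score += 1.0
            if inc["category"] == "ASSAULT" and not inc["resolved"]:
                score += assault_penalty
    return score
```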

The hazard score consists of the average earthquake intensity along with the liquefaction measure. This score also includes a penalty for each of the other hazards listed.

Crime Map

The crime map represents crime statistics on a square one-mile area around the selected address. Since the original San Francisco crime data is at the block and intersection level, any individual spots placed on the map should not be construed as a crime occurring at a particular house or building. For purposes of creating an even crime density on the map, data points have been randomly spread around within a hundred feet of the center of the block or intersection.
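One way to spread points as described above is to draw a uniformly random offset inside a circle of radius 100 feet. The sketch below assumes planar coordinates in feet; the function name `jitter_point` is hypothetical, and the project may have used a different random scheme.

```python
import math
import random


def jitter_point(x, y, max_offset_ft=100.0, rng=random):
    """Return (x, y) moved by a random offset of less than max_offset_ft.

    Taking the square root of the uniform draw makes the points
    uniform over the disc rather than clustered near the center.
    """
    r = max_offset_ft * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

Passing a seeded `random.Random` instance as `rng` makes the displayed map reproducible between page loads.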

Crime Statistics

Crime prediction uses a logistic regression model that classifies crime as going up or down. The model uses crime data from 2003 to 2017, limited to incidents within a 1,000’ radius of the selected address.

Crime comparison density uses all crimes from January 1st, 2017 to May 11th, 2018. While the local area (1,000’ radius) covers approximately the same area as a square one-third mile on a side, all numbers have been scaled down to a square one-quarter mile on a side (1,320’ by 1,320’).
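The arithmetic behind this scaling can be checked directly. The sketch below assumes the scaling is a simple ratio of areas; the constant and function names are illustrative.

```python
import math

LOCAL_RADIUS_FT = 1000.0   # hyper-local radius
THIRD_MILE_FT = 1760.0     # 5,280 / 3
QUARTER_MILE_FT = 1320.0   # 5,280 / 4

local_area = math.pi * LOCAL_RADIUS_FT ** 2   # about 3,141,593 sq ft
third_mile_square = THIRD_MILE_FT ** 2        # 3,097,600 sq ft, close to the circle
quarter_mile_square = QUARTER_MILE_FT ** 2    # 1,742,400 sq ft

# Assumed scaling: counts are multiplied by the ratio of the two areas.
SCALE = quarter_mile_square / local_area      # about 0.555


def scale_to_quarter_mile_square(local_count):
    """Scale a count from the 1,000 ft circle to a 1,320 ft by 1,320 ft square."""
    return local_count * SCALE
```

Under this assumption, 100 crimes within the 1,000’ radius correspond to roughly 55 crimes per square quarter-mile.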

The crime counts chart uses all crimes within a 1,000’ radius of the address. Total crimes are represented by the bars and are divided into solved (green) and unsolved (blue) crimes. A trend line (orange) is also included.

Hazards

All hazard data has come from the state and has a low level of granularity. Small features such as outcroppings of bedrock in a fill area, small flood areas, and so on may not show up in the source data. Therefore, the numbers shown here should be taken as a best-guess approximation.

Earthquake and Liquefaction: The state has generated several models for each fault depending on different magnitudes and epicenters. The averages presented here use only the largest modeled event from each fault.
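The averaging rule above can be sketched as follows, assuming "largest" means the scenario with the highest magnitude for each fault. The tuple layout and the function name `average_intensity` are hypothetical.

```python
def average_intensity(scenarios):
    """Average the intensity of the largest modeled event per fault.

    scenarios: iterable of (fault_name, magnitude, intensity_at_address).
    Assumption: the "largest" event is the one with the highest magnitude.
    """
    largest = {}
    for fault, magnitude, intensity in scenarios:
        if fault not in largest or magnitude > largest[fault][0]:
            largest[fault] = (magnitude, intensity)
    if not largest:
        raise ValueError("no scenarios supplied")
    return sum(intensity for _, intensity in largest.values()) / len(largest)
```

Smaller scenarios on the same fault are ignored entirely, so the average reflects the worst modeled case for each fault rather than a typical event.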

Course

Data Science W210. Capstone, Summer 2018 (Lecture) (Section 2)

Last updated:

August 13, 2018