MIDS Capstone Project Spring 2023

Hate Crime Risk Index

Team members

According to an article published by the Wall Street Journal, FBI data showed that hate crimes in 2021 rose by 11.6% compared with earlier, incomplete figures, which had indicated a drop. From this report, we learned two things:

Hate crimes are prevalent and rising, and they are a pressing problem in our society.
From a purely data perspective, and as far as our research shows, we still lack the tools and mechanisms to monitor hate crimes and understand what causes them.

This motivated us to take up the challenge and develop a data science tool for social good: one step forward in understanding what leads to higher hate crime risk, so that one day such crimes can be mitigated effectively.

To begin addressing this problem, we combined several datasets, including Census, election, Google Trends, and FBI Hate Crime data, to train our model.
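Because every source has to line up at the county-year level, the first step is joining them on a shared key. The sketch below assumes each source has already been reduced to records keyed by (county FIPS code, year). The key choice, the `build_panel` helper, and the field names are hypothetical illustrations, not the project's actual pipeline.

```python
from typing import Dict, Optional, Set, Tuple

# (county FIPS code, year): a hypothetical key choice for this sketch.
Key = Tuple[str, int]
Records = Dict[Key, Dict[str, float]]


def build_panel(sources: Dict[str, Records]) -> Records:
    """Inner-join several county-year sources on (FIPS, year).

    Only keys present in every source are kept, and each feature name is
    prefixed with its source name so that columns cannot collide.
    """
    common: Optional[Set[Key]] = None
    for records in sources.values():
        keys = set(records)
        common = keys if common is None else common & keys
    panel: Records = {}
    for key in sorted(common or set()):
        row: Dict[str, float] = {}
        for name, records in sources.items():
            for feature, value in records[key].items():
                row[f"{name}.{feature}"] = value
        panel[key] = row
    return panel


# Toy, made-up values purely to show the shape of the result.
sources = {
    "census": {("01001", 2020): {"population": 1.0}, ("01003", 2020): {"population": 2.0}},
    "trends": {("01001", 2020): {"search_interest": 0.5}},
}
panel = build_panel(sources)
print(panel)  # {('01001', 2020): {'census.population': 1.0, 'trends.search_interest': 0.5}}
```

An inner join silently drops any county-year missing from even one source. When coverage is uneven, a left join with explicit missing-value handling may be the better choice.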

We used GPBoost, a state-of-the-art model that combines tree boosting with grouped random effects models. GPBoost does not assume conditional independence across samples, which matches the structure of our data: the same U.S. counties observed year after year. We then used Shapley Additive Explanations (SHAP) on our final model to identify the factors that most influence hate crime risk for each county in each year. Users can draw on these values to decide where to send resources and which types of resources will benefit each area. This work has been packaged into an interactive dashboard that provides an instant hate crime risk index for each county across the US.
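The "grouped random effects" part can be illustrated with a deliberately simplified model: y = F(X) + b_county + noise, where b_county is a random intercept shared by all years of the same county. The sketch below assumes that F(X) has already been fitted (for example, by boosted trees) and that the two variance components are known. Under those assumptions it computes the standard shrunken (BLUP) estimate of each county's intercept. GPBoost learns the tree ensemble and the variance parameters from the data, so this is not its algorithm. The helper name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def random_intercepts(residuals: Sequence[float], groups: Sequence[str],
                      var_group: float, var_noise: float) -> Dict[str, float]:
    """Shrunken per-group intercepts for y = F(X) + b_group + noise.

    residuals: y - F(X) from an already-fitted fixed-effects model.
    var_group, var_noise: variance components, assumed known in this sketch.
    """
    by_group: Dict[str, List[float]] = defaultdict(list)
    for r, g in zip(residuals, groups):
        by_group[g].append(r)
    intercepts = {}
    for g, rs in by_group.items():
        n = len(rs)
        # Groups with more observations are shrunk less toward zero.
        shrink = n * var_group / (n * var_group + var_noise)
        intercepts[g] = shrink * mean(rs)
    return intercepts


intercepts = random_intercepts(
    residuals=[2.0, 2.0, 2.0, 2.0, -1.0],
    groups=["county_a"] * 4 + ["county_b"],
    var_group=1.0,
    var_noise=1.0,
)
print(intercepts)  # {'county_a': 1.6, 'county_b': -0.5}
```

A county observed for four years keeps 80% of its average residual, while a county observed once keeps only half. Repeated observations of the same unit carry information that an independence assumption would ignore.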

Solving social science problems with data science is difficult. The audience is non-technical, and many different organizations target different issues and outcomes. This tool overcomes several technical challenges. A reliable public dataset at the required granularity is hard to come by, so the data must be built rather than found. Moreover, in the social science space, training a highly accurate model solves only one part of the puzzle; stakeholders care more about identifying the factors that contribute to the target outcome.
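Attributing a prediction to individual factors is exactly what Shapley values do. The sketch below computes them exactly for a tiny model by enumerating every feature coalition, with "absent" features set to a baseline value. It then ranks features by absolute contribution to get the top factors for one prediction. This is a textbook illustration only. Enumeration is exponential in the number of features, the baseline choice is an assumption, and `toy_model` plus the feature names are hypothetical. The project's own SHAP computation is not described on this page.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def top_factors(names: Sequence[str], phi: Sequence[float], k: int = 2) -> List[str]:
    """Feature names sorted by absolute contribution, largest first."""
    ranked = sorted(zip(names, phi), key=lambda t: abs(t[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def toy_model(z: List[float]) -> float:
    # Hypothetical model: one additive feature and one two-way interaction.
    return 2.0 * z[0] + z[1] * z[2]


names = ["feature_0", "feature_1", "feature_2"]
phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
print(top_factors(names, phi, k=2))  # ['feature_1', 'feature_2']
```

The contributions sum to f(x) minus f(baseline), which is 8 here. The interaction term is split evenly between the two features that form it. That additivity lets a per-county risk score be read as a sum of factor contributions.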

No single solution will solve a problem as large as hate crime. That said, our solution is one step forward in understanding the factors that lead to more hate crimes against the LGBTQ+ community, which should help mitigate them and make the world a better place to live.

Check us out at HateCrimeRiskIndex

Course

Data Science 210. Capstone, Spring 2023


More Information

Project website

Last updated: April 24, 2023