MIDS Capstone Project Fall 2017


In many places throughout the United States, voting districts at levels from municipal government up through congressional districts are drawn by the individuals and parties in power when the districts are scheduled to be redrawn. Because these decision makers have a personal stake in the outcome of the redistricting process, it is unrealistic to expect them to draw fair districts. Often, they choose districts that maximize their own chances of reelection, and such maps cannot be considered fair, equitable, or supportive of the democratic system by any rigorous measure.

The aim of RedistrictR is to generate more equitable voting districts using a transparent and rigorous methodology, with the goal of strengthening the democratic process at a time when many voters are losing confidence in our electoral system. RedistrictR takes a novel approach to redistricting by deploying an evolutionary algorithm that selects candidate district maps based on their performance on several measurable metrics of equity: vote efficiency gap, compactness, and demographic cluster proximity.
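The vote efficiency gap is commonly defined for a two-party contest as the difference between the two parties' wasted votes divided by the total votes cast, where a wasted vote is any vote for a losing candidate or any vote for the winner beyond the 50% needed to win. The following minimal sketch assumes that standard definition; the text does not say which variant RedistrictR uses, and the `DistrictResult` structure and example vote counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DistrictResult:
    """Two-party vote totals for one district (hypothetical input format)."""
    votes_a: int
    votes_b: int


def efficiency_gap(districts: List[DistrictResult]) -> float:
    """Return (wasted_A - wasted_B) / total votes across all districts.

    Positive values mean party A wasted more votes (the map disadvantages A);
    negative values mean party B wasted more.
    """
    wasted_a = 0.0
    wasted_b = 0.0
    total_votes = 0
    for d in districts:
        district_total = d.votes_a + d.votes_b
        needed_to_win = district_total / 2  # simple 50% threshold
        if d.votes_a > d.votes_b:
            # A wins: A's surplus above the threshold and all of B's votes are wasted.
            wasted_a += d.votes_a - needed_to_win
            wasted_b += d.votes_b
        else:
            # B wins (ties are credited to B in this sketch).
            wasted_a += d.votes_a
            wasted_b += d.votes_b - needed_to_win
        total_votes += district_total
    return (wasted_a - wasted_b) / total_votes


# Hypothetical five-district map with 100 votes cast per district.
example = [
    DistrictResult(70, 30),
    DistrictResult(60, 40),
    DistrictResult(40, 60),
    DistrictResult(45, 55),
    DistrictResult(52, 48),
]
print(round(efficiency_gap(example), 3))  # -0.032
```

In this example party B wastes 16 more votes than party A out of 500 cast, giving a gap of -0.032 under this sign convention. A search over maps would typically prefer gaps closer to zero.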

RedistrictR focuses on the districts of the San Diego County Board of Supervisors, which has been criticized in the past for its homogeneity and self-serving redistricting tactics. Tackling redistricting in San Diego County serves as an exercise to illustrate that our quantitative approach is feasible and can produce measurable improvements over the status quo. We use demographic data from the U.S. Census and historical voting data archived by U.C. Berkeley to divide the county into five districts, one for each of the five seats on the Board of Supervisors.

In San Diego County, 1,792 census block groups form the basis of our redistricting analysis. Dividing these building blocks into 5 districts is a problem with a vast number of potential solutions: there are nearly 3 × 10^1250 ways to divide 1,792 objects (census block groups) into 5 groups (districts). That number dwarfs the roughly 10^80 atoms estimated to make up the observable universe; its exponent alone is more than 15 times larger. Prior research on quantitative redistricting has employed powerful supercomputers with over 130 thousand cores. Despite using a cluster of at most 19 nodes, we identified numerous potential district maps that outperformed the status quo on our equity metrics.
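The figure of nearly 3 × 10^1250 corresponds to the Stirling number of the second kind S(1792, 5), which counts the ways to partition 1,792 items into 5 non-empty, unlabeled groups. This short sketch reproduces it with exact integer arithmetic using the inclusion-exclusion formula. It counts raw partitions only and ignores contiguity, population balance, and every other constraint a real district map must satisfy.

```python
from math import comb, factorial


def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: ways to split n items into k non-empty, unlabeled groups.

    Inclusion-exclusion: S(n, k) = (1 / k!) * sum_{j=0..k} (-1)^j * C(k, j) * (k - j)^n.
    """
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)  # exact: the sum is always divisible by k!


count = stirling2(1792, 5)  # 1,792 block groups into 5 districts
digits = str(count)
print(f"about {digits[0]}.{digits[1:3]} x 10^{len(digits) - 1}")
```

Running it prints a value just under 3 × 10^1250. Because Python integers have arbitrary precision, the 1,251-digit result is computed exactly rather than approximated in floating point.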

RedistrictR introduces a novel equity metric to the body of research on quantitative redistricting: it identifies similar building blocks through k-means clustering and selects for district maps that place these similar building blocks in the same districts. This innovation opens the door to significant future advances in the field, such as applying other unsupervised machine learning techniques and, more generally, introducing additional equity metrics to quantitative redistricting. Further, our promising results, achieved despite severely limited computing resources, illustrate the value we could add to redistricting research given more computing power and time to extend the progress we have already made with RedistrictR.
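The text does not give an exact formula for the cluster proximity metric, so what follows is only a minimal sketch under stated assumptions: block groups are clustered with plain k-means (Lloyd's algorithm) on numeric demographic features, and a district map is scored by the share of block groups that belong to their district's most common cluster. The feature vectors, the choice of k, and the `cluster_proximity` scoring rule are hypothetical illustrations, not RedistrictR's actual implementation.

```python
import random
from collections import Counter
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's algorithm; returns a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    # Initialize centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_proximity(cluster_labels: Sequence[int], district_of: Sequence[int]) -> float:
    """Hypothetical score: fraction of block groups that share their district's most common cluster."""
    clusters_by_district = {}
    for label, district in zip(cluster_labels, district_of):
        clusters_by_district.setdefault(district, []).append(label)
    matched = sum(Counter(members).most_common(1)[0][1] for members in clusters_by_district.values())
    return matched / len(cluster_labels)


# Toy data: six block groups described by two hypothetical demographic features.
block_groups = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(block_groups, k=2)
print(cluster_proximity(labels, [0, 0, 0, 1, 1, 1]))  # 1.0: similar block groups kept together
print(round(cluster_proximity(labels, [0, 0, 1, 1, 1, 0]), 3))  # 0.667: similar block groups split
```

A score of 1.0 means each district contains only one demographic cluster; maps that split similar block groups across districts score lower, so an evolutionary algorithm selecting on this score would favor maps that keep similar block groups together. In practice, features would usually be standardized before clustering so that no single variable dominates the distance.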


Using data science to strengthen the democratic process.

Last updated: December 19, 2017