Measurement of partisan segregation for 180 million U.S. voters using advanced geospatial data science

Partisan segregation has important political and social implications. Historically, such measurements have been limited to the county level. In this work, CGA, together with Professor Ryan Enos and graduate student Jacob Brown of the Department of Government, leveraged advances in geospatial data science to measure partisan segregation at the level of the individual, for the first time, for 180 million U.S. voters. The work involved creating the most detailed metrics to date for analyzing partisanship within small geographic units such as neighborhoods or cities. This was done by performing a K-Nearest Neighbor (KNN) calculation for each voter. CGA built a custom application on Harvard's High Performance Computing Cluster (HHPCC) to handle this big data analysis. Several optimization techniques, including geohashing and index-based search, were used to make the problem tractable and to speed up processing to 100,000 distance calculations per second. The resulting dataset of 180 billion neighbors (K=1000 for each of the 180 million voters) was modeled in the GPU-based database OmniSci to achieve fast and efficient computing performance.
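To illustrate why an index-based search makes a KNN computation of this size tractable, here is a minimal Python sketch. It is not the project's actual code: the function names `build_grid_index` and `knn`, the use of planar (already projected) coordinates, and the simple integer grid cell standing in for a geohash prefix are all assumptions made for illustration. The idea is that each point is bucketed into a cell, and a query inspects only nearby cells, widening the search ring by ring until no unvisited cell could contain a closer neighbor.

```python
import math
import random
from collections import defaultdict


def build_grid_index(points, cell_size):
    """Bucket point indices by grid cell (hypothetical stand-in for a geohash prefix)."""
    index = defaultdict(list)
    for i, (x, y) in enumerate(points):
        index[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return index


def knn(points, index, cell_size, query_idx, k):
    """Return the k nearest neighbors of points[query_idx] as sorted (distance, index) pairs."""
    k = min(k, len(points) - 1)
    if k <= 0:
        return []
    qx, qy = points[query_idx]
    cx, cy = math.floor(qx / cell_size), math.floor(qy / cell_size)
    found = []
    ring = 0
    while True:
        # Visit only the cells on the border of the current square ring.
        for gx in range(cx - ring, cx + ring + 1):
            for gy in range(cy - ring, cy + ring + 1):
                if max(abs(gx - cx), abs(gy - cy)) != ring:
                    continue
                for j in index.get((gx, gy), ()):
                    if j != query_idx:
                        px, py = points[j]
                        found.append((math.hypot(px - qx, py - qy), j))
        # Keep only the best k candidates seen so far.
        found.sort()
        del found[k:]
        # Every point in a cell not yet visited is farther than ring * cell_size,
        # so the search can stop once the k-th candidate is within that bound.
        if len(found) == k and found[-1][0] <= ring * cell_size:
            return found
        ring += 1


if __name__ == "__main__":
    random.seed(42)
    # Synthetic points in a 1000 x 1000 square (illustrative data only).
    pts = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(2000)]
    grid = build_grid_index(pts, cell_size=50.0)
    for dist, j in knn(pts, grid, 50.0, query_idx=0, k=5):
        print(f"neighbor {j} at distance {dist:.2f}")
```

With a well-chosen cell size, each query touches only a small fraction of the points instead of all of them, which is the property that matters when the search is repeated for every voter. A production version working on latitude and longitude would need a geodesic distance and a matching stopping bound in place of the planar ones assumed here.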

This work was published and featured in:
Nature
New York Times

This work is partially funded by NSF award #1841403 and OmniSci, Inc., with support from Ryan Enos and Jacob Brown of the Department of Government, Harvard University.

The scripts can be found on our GitHub here.

Questions and comments on this project can be sent to Devika Kakkar and Ben Lewis.