Measurement of partisan segregation for 180 million U.S. voters using advanced geospatial data science

Partisan segregation has important political and social implications. Historically, it could be measured only across large geographic areas, because researchers usually relied on data aggregated to those levels. In this work, CGA, together with Professor Ryan Enos and graduate student Jacob Brown of the Department of Government, leveraged advances in geospatial data science to measure partisan segregation at the level of the individual voter for 180 million U.S. voters. The result is the most detailed set of metrics for analyzing partisanship within small geographic units such as neighborhoods and cities.

To build these metrics, CGA performed a K-Nearest Neighbor (KNN) calculation for every voter, finding the K = 1,000 voters who live closest to them. CGA built a custom application on Harvard's High Performance Computing Cluster (HHPCC) to handle this big data analysis. Several optimization techniques, including geohashing and index-based search, made the problem tractable and sped processing up to 100,000 distance calculations per second. The resulting dataset of 180 billion neighbors (180 million voters × K = 1,000) was modelled in the GPU-based database OmniSci to achieve fast and efficient computing performance.
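The per-voter neighbor search can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not CGA's application. It assumes voter locations have already been projected into planar coordinates (for example, meters), and it substitutes a simple uniform grid of square cells for true geohash buckets. The class name `GridIndex`, its `cell_size` parameter, and the `knn` method are hypothetical. Each point is hashed to a cell once. A query then scans rings of cells outward from its own cell and stops as soon as the K-th closest candidate is guaranteed to be correct, so most voters are never compared against the query at all.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical spatial index: buckets planar (x, y) points into square
    cells of side `cell_size`. A simplified stand-in for geohash buckets,
    not CGA's implementation. Assumes at least one point."""

    def __init__(self, points, cell_size):
        self.points = points
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of point indices
        for i, (x, y) in enumerate(points):
            self.cells[self._key(x, y)].append(i)
        cols = [c for c, _ in self.cells]
        rows = [r for _, r in self.cells]
        self.col_lo, self.col_hi = min(cols), max(cols)
        self.row_lo, self.row_hi = min(rows), max(rows)

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    @staticmethod
    def _ring(cx, cy, r):
        """Yield every cell at Chebyshev distance exactly r from (cx, cy)."""
        if r == 0:
            yield (cx, cy)
            return
        for dx in range(-r, r + 1):
            yield (cx + dx, cy - r)
            yield (cx + dx, cy + r)
        for dy in range(-r + 1, r):
            yield (cx - r, cy + dy)
            yield (cx + r, cy + dy)

    def knn(self, qx, qy, k, exclude=None):
        """Return indices of the k points nearest to (qx, qy), closest first,
        skipping the point whose index is `exclude` (e.g. the query voter)."""
        if k <= 0:
            return []
        cx, cy = self._key(qx, qy)
        # Last ring that can still contain an occupied cell.
        max_ring = max(abs(cx - self.col_lo), abs(cx - self.col_hi),
                       abs(cy - self.row_lo), abs(cy - self.row_hi))
        found = []  # (distance, index) pairs from the rings scanned so far
        for r in range(max_ring + 1):
            for key in self._ring(cx, cy, r):
                for i in self.cells.get(key, ()):
                    if i != exclude:
                        px, py = self.points[i]
                        found.append((math.hypot(px - qx, py - qy), i))
            found.sort()
            # Every unscanned point lies more than r * cell_size away, so the
            # first k candidates are final once the k-th is within that radius.
            if len(found) >= k and found[k - 1][0] <= r * self.cell_size:
                break
        return [i for _, i in found[:k]]


voters = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (0.5, 0.5)]
index = GridIndex(voters, cell_size=1.0)
print(index.knn(0.0, 0.0, k=2, exclude=0))  # → [4, 1]
```

Because the queried voter is excluded from its own neighbor list, running `knn` once per voter with `k=1000` yields 1,000 rows per voter, which for 180 million voters gives the 180 billion neighbor records described above. The cell size is a tuning choice: cells that are too small force many nearly empty rings to be scanned in sparse rural areas, while cells that are too large put too many candidates into each dense urban cell.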

This work was published and featured in:

Nature

New York Times