The Billion Object Platform (BOP)

With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a prototype spatio-temporal visualization platform dubbed "the BOP" (Billion Object Platform). The first goal of the BOP is to provide the Dataverse platform with an API-accessible big data exploration tool that can support streaming data. The more general goal is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets by addressing a basic limitation of geospatial platforms: they struggle with interactive visualization of more than a couple of million features.

The first instance of the BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The geo-tweets are enriched with sentiment and census/admin boundary codes on their way into the system. The system is open source and is currently hosted on the Massachusetts Open Cloud (MOC), an OpenStack environment, with all components deployed in Docker containers orchestrated by Kontena. The BOP architecture is built on a stack consisting of Apache Lucene, Solr, Kafka, ZooKeeper, Swagger, scikit-learn, OpenLayers, and AngularJS.

To support the interactive visualization of billions of features, the CGA added significant new capabilities to the widely used Solr and Lucene libraries in the form of spatial and temporal faceting.
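
As a rough illustration of what spatial and temporal faceting looks like from a client's side, the sketch below builds the parameters for a single Solr request that asks for a heatmap facet over a bounding box together with a date range facet. It uses Solr's documented facet.heatmap and facet.range parameters. The field names (coord, created_at), the bounding box, and the dates are hypothetical placeholders, and this is not necessarily how the BOP's own API is called.

```python
from urllib.parse import urlencode


def build_facet_params(bbox, start, end, gap="+1DAY", grid_level=None,
                       geo_field="coord", time_field="created_at"):
    """Build Solr parameters for a combined heatmap and time-range facet.

    bbox is (min_lon, min_lat, max_lon, max_lat). The field names are
    assumptions made for this sketch, not the BOP schema.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "q": "*:*",          # match every document
        "rows": 0,           # return only facet counts, no documents
        "facet": "true",
        # Spatial facet: counts of documents per grid cell in the rectangle.
        "facet.heatmap": geo_field,
        "facet.heatmap.geom": f'["{min_lon} {min_lat}" TO "{max_lon} {max_lat}"]',
        # Temporal facet: counts of documents per time bucket.
        "facet.range": time_field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }
    if grid_level is not None:
        # Optional: request a specific grid resolution.
        params["facet.heatmap.gridLevel"] = grid_level
    return params


# Example: daily counts plus a heatmap for a box around Boston (illustrative values).
params = build_facet_params(
    (-71.2, 42.2, -70.9, 42.5),
    "2016-01-01T00:00:00Z",
    "2016-02-01T00:00:00Z",
)
query_string = urlencode(params)
```

Because rows is 0, a request like this returns only aggregated counts per grid cell and per time bucket rather than individual features, so the size of the response depends on the grid and bucket resolution and not on the number of matching documents.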

The CGA has been harvesting geo-tweets since 2012 and has developed an archive that contains many billions of tweets. More information is available on the Harvard CGA Geotweet Archive page. If you are interested in learning more about this archive, please contact us.

Investigators: Gary King, Merce Crosas