Geotweet Archive v2.0
The Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global record of tweets spanning time, geography, and language. The primary purpose of the Archive is to make a comprehensive collection of geo-located tweets available to the academic research community.
The Geotweet Archive was started by Todd Mostak and Ben Lewis in 2012. Its creation was part of the design and development of GEOPS, the first spatial GPU-powered database, which Mostak and Lewis developed between 2012 and 2013.
With the TweetMap incarnation of GEOPS inside WorldMap (which came five years before ArcGIS Online), WorldMap became the world's first big vector data mapping platform. Here is an overview of TweetMap from 2013, and below is a demonstration of instant query and display against 200 million tweets:
The current archive extends from October 2012 to July 12, 2023, when Twitter closed access to its free API. Version 2 of the Geotweet Archive resulted from a merge of the CGA archive with other archives, most notably one built by Bernd Resch and his team at the University of Salzburg, and one created by Ryan Wang, a Harvard postdoc. The data merge was performed by Devika Jain of CGA and Clemens Havas of the University of Salzburg. When GEOPS went commercial (eventually becoming HEAVY.AI), the open-source Billion Object Platform (BOP) was developed.
For more on the history of the Geotweet Archive, TweetMap, the BOP, GEOPS, HEAVY.AI, and WorldMap, please contact Ben Lewis.
The CGA Geotweet Archive now contains approximately 10 billion tweets and is stored on Harvard University’s High Performance Computing (HPC) cluster. Harvard research computing also supports many applications for working with big spatio-temporal datasets, including these tools maintained by the CGA.
For more information about the archive and how to access it, click here. Scripts for harvesting, extraction, and enrichment of geotweets can be found on our GitHub here.