The Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global record of tweets spanning time, geography, and language. The primary purpose of the Archive is to make a comprehensive collection of geo-located tweets available to the academic research community.
The Archive extends from 2010 to the present and is updated daily. The collection totals approximately 10 billion tweets and is stored on Harvard University’s High Performance Computing (HPC) cluster. The Harvard HPC supports many applications for working with big spatio-temporal datasets, including geospatial tools recently deployed by the CGA: OmniSci Immerse, GeoPandas, and PostGIS.
For more information about the Archive and how to access it, please see https://doi.org/10.7910/DVN/3NCMB6.
The scripts for harvesting, extracting, and enriching Geotweets can be found on our GitHub here.
Questions/comments on CGA's Geotweet Archive can be sent to Devika Kakkar.