#  Geotweet Archive v2.0 

 




 
  

 

The Harvard Center for Geographic Analysis (CGA) maintains the [Geotweet Archive](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3NCMB6), a global record of tweets spanning time, geography, and language. The primary purpose of the Archive is to make a comprehensive collection of geo-located tweets available to the academic research community.

The Geotweet Archive was started by [Todd Mostak and Ben Lewis in 2012](https://cmes.fas.harvard.edu/news/cmes-am-graduate-explores-power-twitter-big-data). Its creation was part of the design and development of GEOPS, the first spatial GPU-powered database, which Mostak and Lewis developed between 2012 and 2013.

With the TweetMap incarnation of GEOPS running inside [WorldMap](https://gis.harvard.edu/projects/worldmap) (which came 5 years before ArcGIS Online), WorldMap became the world's first big vector data mapping platform. [This video](https://www.youtube.com/watch?v=d3doxe7MwP4) gives an overview of TweetMap from 2013, and below is a demonstration of instant query and display against 200 million tweets:

Embed



 



The [current archive](https://doi.org/10.7910/DVN/3NCMB6) extends from October 2012 to July 12, 2023, when Twitter closed access to its free API. Version 2 of the Geotweet Archive resulted from a merge of the CGA archive with other archives, most notably one built by Bernd Resch and his team at the University of Salzburg, and one created by Ryan Wang, a Harvard postdoc. The data merge was performed by Devika Jain of CGA and Clemens Havas of the University of Salzburg. When GEOPS went commercial (eventually becoming HEAVY.AI), the open-source [Billion Object Platform](https://gis.harvard.edu/billion-object-platform-bop) (BOP) was developed.

For more on the history of the Geotweet Archive, TweetMap, [the BOP](https://gis.harvard.edu/billion-object-platform-bop), GEOPS, HEAVY.AI, and WorldMap, please contact [Ben Lewis](mailto:bglewis@gmail.com).

The CGA Geotweet Archive now holds approximately 10 billion tweets and is stored on [Harvard University’s High Performance Computing (HPC) cluster](https://www.rc.fas.harvard.edu/). Harvard research computing also supports many applications for working with big spatio-temporal datasets, including [these tools maintained by the CGA](https://gis.harvard.edu/geospatial-applications-fasrc).

For more information about the archive and how to access it, see the [dataset page](https://doi.org/10.7910/DVN/3NCMB6). Scripts for harvesting, extracting, and enriching geotweets can be found in our [GitHub repository](https://github.com/cga-harvard/Data_Science_Big_Data_Projects/tree/master/scripts/Geotweets).

 ![map_tweets_lang](/sites/g/files/omnuum9996/files/gis/files/map_tweets_language.png)

 

Geotweet Archive tweets displayed using HEAVY.AI, which originated at CGA as GEOPS.

 



 

See also: [Data](/research/data)