Geospatial data science tools and data on Harvard's High Performance Computing Cluster (HHPCC)

In collaboration with OmniSci Technologies and the NSF Spatiotemporal Innovation Center, and in close coordination with Harvard Research Computing (FASRC), the CGA has deployed OmniSci Immerse and PostGIS on Harvard's large computation cluster. (The CGA served as an incubator for OmniSci in 2012-2013.) Researchers across Harvard can now access these geospatial data tools flexibly and at low or no cost.

A presentation on this work, given at OmniSci's Virtual Summit by Ben Lewis and Devika Kakkar of the CGA and Raminder Singh of Harvard Research Computing, is available at https://www.brighttalk.com/presenting/talk/410124. Registration is required to view it.

The first dataset to be hosted by the CGA on the cluster is the Geotweet Archive. The CGA has been harvesting and archiving geo-located tweets since 2012, and this dataset was recently merged with other tweet archives through a collaboration with the University of Salzburg. The resulting multi-billion-record dataset is now within easy reach of a wide variety of data science tools.

The installation scripts can be found on our GitHub.

Below is a preliminary exploration of attitudes towards masks using data from the Geotweet Archive, to give an idea of the visualization capabilities:

 

This demo gives an idea of how a Harvard scholar can access HPC tools:

Starting up an OmniSci instance on the Harvard HPC

Questions/comments on this project can be sent to Devika Kakkar.