Data Science

Network Analysis on Geospatial Big Data in Brazil

Network analysis is a common task in GIS. Researchers are increasingly working with big geospatial datasets that contain millions of records. At this scale, traditional GIS methods of network analysis fall short, and new approaches are needed to analyze the data. In this blog post, we describe the procedure we used to calculate the shortest driving distances between 3.5 million patients and their nearest hospital in Brazil. There are several tools for calculating shortest distances; the most common among them are...
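
The core computation, finding each patient's nearest hospital along a road network, can be illustrated with a multi-source Dijkstra search. This is a minimal sketch of the general technique, not the tool or pipeline used in the project. The toy road graph, the node names, and the function name `nearest_facility_distances` are hypothetical, and patients are assumed to be already snapped to road-network nodes. The graph is treated as undirected; with one-way streets, the search should run on the reversed graph so distances measure patient-to-hospital travel.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

# Adjacency list: node -> [(neighbor, edge length in km)] (hypothetical units)
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def nearest_facility_distances(
    graph: Graph, facilities: Iterable[Hashable]
) -> Dict[Hashable, Tuple[float, Hashable]]:
    """Multi-source Dijkstra: map every reachable node to (distance, nearest facility)."""
    best: Dict[Hashable, Tuple[float, Hashable]] = {}
    # Heap entries: (distance, tie-breaker, node, source facility).
    # The counter keeps heapq from ever comparing node objects directly.
    heap: List[Tuple[float, int, Hashable, Hashable]] = []
    counter = 0
    for facility in facilities:
        heap.append((0.0, counter, facility, facility))
        counter += 1
    heapq.heapify(heap)

    while heap:
        dist, _, node, source = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a distance no larger than this one
        best[node] = (dist, source)
        for neighbor, length in graph.get(node, []):
            if neighbor not in best:
                counter += 1
                heapq.heappush(heap, (dist + length, counter, neighbor, source))
    return best


# Usage on a hypothetical toy network with two hospitals, H1 and H2.
roads: Graph = {
    "A": [("B", 2.0), ("H2", 2.5)],
    "B": [("A", 2.0), ("C", 3.0)],
    "C": [("B", 3.0), ("H1", 1.0)],
    "H1": [("C", 1.0)],
    "H2": [("A", 2.5)],
}
patients = {"p1": "A", "p2": "C"}  # hypothetical patient -> snapped node
result = nearest_facility_distances(roads, ["H1", "H2"])
for pid, node in patients.items():
    print(pid, result[node])  # p1 (2.5, 'H2') then p2 (1.0, 'H1')
```

Seeding the heap with every hospital at distance zero means a single search labels the whole network, so the cost grows with the size of the road graph rather than with the number of patients; each patient's answer is then a dictionary lookup.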

Infogroup US Historical Business Dataset Analysis

This project involved creating geospatial measures for ~2,000 public firms from the Infogroup US Historical Business Dataset. One task was to calculate the following variables at the census block group level for 23 years of data (1997–2019):

1. Businesses per office size type (Office_Size_Code)
2. Businesses per sales volume (Location_Sales_Volume_Code)
3. Businesses per employee size (Location_Employee_Size_Code)
4. Businesses per business status (Business_Status_Code)
5. Number of establishments, to be calculated from Year_Established...
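
The first four variables are counts of businesses grouped by block group, year, and code. A minimal sketch of that aggregation follows, assuming each record has already been assigned to a census block group. The column names `block_group` and `year`, the sample codes, and the function name `count_by_block_group` are hypothetical; only the four `*_Code` field names come from the list above. The establishment count from Year_Established is not sketched here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Category fields named in the project description.
CATEGORY_FIELDS = (
    "Office_Size_Code",
    "Location_Sales_Volume_Code",
    "Location_Employee_Size_Code",
    "Business_Status_Code",
)


def count_by_block_group(records: Iterable[Mapping[str, object]]) -> Dict[str, Counter]:
    """For each category field, count businesses per (block group, year, code)."""
    counts: Dict[str, Counter] = {field: Counter() for field in CATEGORY_FIELDS}
    for rec in records:
        # "block_group" and "year" are hypothetical column names.
        key_prefix = (rec["block_group"], rec["year"])
        for field in CATEGORY_FIELDS:
            code = rec.get(field)
            if code:  # skip missing or empty codes
                counts[field][key_prefix + (code,)] += 1
    return counts


# Usage with hypothetical records.
records = [
    {"block_group": "BG1", "year": 1997, "Office_Size_Code": "A",
     "Location_Sales_Volume_Code": "C", "Location_Employee_Size_Code": "B",
     "Business_Status_Code": "1"},
    {"block_group": "BG1", "year": 1997, "Office_Size_Code": "A",
     "Location_Sales_Volume_Code": "D", "Location_Employee_Size_Code": "B",
     "Business_Status_Code": "2"},
    {"block_group": "BG2", "year": 1998, "Office_Size_Code": "B",
     "Location_Sales_Volume_Code": "C", "Location_Employee_Size_Code": "",
     "Business_Status_Code": "1"},
]
counts = count_by_block_group(records)
print(counts["Office_Size_Code"][("BG1", 1997, "A")])  # 2
```

Keying on the full (block group, year, code) tuple keeps all 23 years in one pass; the counters can later be pivoted into one column per code if a wide table is needed.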

Detroit Zoning Analysis

The list of work order tickets for Detroit provided by the researcher was converted into a GIS polygon dataset containing 355,500 polygons using a Python script in ArcGIS Pro. Each polygon was built from the string of coordinates in the “Polygon” field. For the ticket polygon / parcel zoning analysis, PostGIS was used on...
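
The conversion step, turning a coordinate string into a polygon geometry, can be sketched in plain Python by emitting Well-Known Text (WKT), a format PostGIS can load with `ST_GeomFromText(wkt, 4326)`. This is not the ArcGIS Pro script used in the project: the layout of the “Polygon” field is not described here, so the `lon,lat;lon,lat;...` format and the function name `coords_to_wkt` are assumptions for illustration.

```python
def coords_to_wkt(polygon_field: str) -> str:
    """Convert a hypothetical 'lon,lat;lon,lat;...' string into a WKT POLYGON."""
    points = []
    for pair in polygon_field.strip().split(";"):
        if not pair.strip():
            continue  # tolerate a trailing separator
        lon_str, lat_str = pair.split(",")
        points.append((float(lon_str), float(lat_str)))
    if len(set(points)) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")
    if points[0] != points[-1]:
        points.append(points[0])  # WKT rings must start and end on the same point
    ring = ", ".join(f"{x} {y}" for x, y in points)
    return f"POLYGON(({ring}))"


# Usage with a hypothetical triangle near downtown Detroit.
print(coords_to_wkt("-83.05,42.33;-83.04,42.33;-83.04,42.34"))
# POLYGON((-83.05 42.33, -83.04 42.33, -83.04 42.34, -83.05 42.33))
```

Closing the ring explicitly matters because WKT polygons whose first and last points differ are rejected as invalid when loaded.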

Use of Social Media Data to Study Climate Change

Harvard CGA joined forces with MIT SUL in 2021 to use social media data to study the effects of climate change on people’s well-being. To this end, we developed the Twitter Sentiment Global Index (TSGI) dataset, an open dataset for monitoring Subjective Well-Being (SWB) globally. By applying Natural Language Processing techniques to our archive of 10 billion geotagged tweets...
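
Once each tweet carries a sentiment score, building an index comes down to aggregating scores by place and time. The sketch below shows one simple version of that step, a mean score per (region, day). It is an assumption for illustration, not the TSGI methodology: the scoring model, the region codes, the score range, and the function name `daily_region_index` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def daily_region_index(
    scored_tweets: Iterable[Tuple[str, str, float]]
) -> Dict[Tuple[str, str], float]:
    """Average per-tweet sentiment scores into one value per (region, day)."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for region, day, score in scored_tweets:
        buckets[(region, day)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


# Usage with hypothetical, already-scored tweets (score in [0, 1]).
tweets = [
    ("US-MA", "2021-01-01", 1.0),
    ("US-MA", "2021-01-01", 0.5),
    ("BR-SP", "2021-01-01", 0.25),
]
index = daily_region_index(tweets)
print(index[("US-MA", "2021-01-01")])  # 0.75
```

A plain mean treats every tweet equally; at the scale of billions of tweets, the same grouping would typically run in a distributed engine rather than in memory.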