big data

Twitter Sentiment Geographical Index (TSGI) dataset: A global high-frequency dataset for monitoring Subjective Well-Being

Introduction
Promoting well-being is one of the key targets of the United Nations Sustainable Development Goals. Many national and city governments worldwide are incorporating subjective well-being (SWB) indicators into their agendas to complement traditional objective development and economic metrics. In this study, we develop the Twitter Sentiment Geographical Index (TSGI), a proxy for SWB, by applying natural language processing techniques to a comprehensive archive of 7.4 billion geotagged tweets. In contrast to previous works focusing on SWB,...
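
As a rough illustration of how per-tweet sentiment can become a geographic, high-frequency index, the sketch below averages sentiment scores by region and day. It is not the study's method: the field names `region`, `date`, and `score`, the 0-to-1 score range, and the simple mean are all assumptions made for this example, and the sentiment scores are taken as already computed by an NLP model.

```python
from collections import defaultdict


def daily_sentiment_index(tweets):
    """Average sentiment score per (region, date).

    tweets: iterable of dicts with hypothetical keys
            'region', 'date', and 'score' (assumed to lie in [0, 1]).
    Returns a dict mapping (region, date) -> mean score.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for tweet in tweets:
        key = (tweet["region"], tweet["date"])
        totals[key][0] += tweet["score"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample records, for illustration only.
sample_tweets = [
    {"region": "BR", "date": "2020-01-01", "score": 0.25},
    {"region": "BR", "date": "2020-01-01", "score": 0.75},
    {"region": "US", "date": "2020-01-01", "score": 1.0},
]
index = daily_sentiment_index(sample_tweets)
```

A simple mean weights every tweet equally; a production index would likely need to handle heavy posters and uneven geographic coverage.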

Network Analysis on Geospatial Big Data in Brazil

Network analysis is a common task in GIS. Researchers are increasingly working with big geospatial datasets that contain millions of records. At this scale, traditional GIS methods of network analysis fall short, and new approaches are needed to analyze the data. In this blog post, we describe the procedure we used to calculate the shortest driving distances between 3.5 million patients and their nearest hospitals in Brazil. There are several tools for calculating shortest distances; the most common among them are...
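
One general way to frame the nearest-facility problem is a multi-source Dijkstra search: seed the search with every hospital at distance zero, and a single pass labels each road-network node with its closest hospital. The sketch below is a minimal illustration of that idea, not the procedure used in the project. The adjacency-list structure, the node names, and the assumption that edges are symmetric (the same length in both driving directions) are all hypothetical.

```python
import heapq


def nearest_facility_distances(graph, facilities):
    """Multi-source Dijkstra over a road network.

    graph: dict mapping node -> list of (neighbor, edge_length) pairs.
           Edges are assumed symmetric, so distance to a facility equals
           distance from it.
    facilities: iterable of nodes that host a facility (e.g. a hospital).
    Returns a dict mapping node -> (distance, nearest_facility).
    """
    best = {}
    # Every facility starts at distance 0 and is its own nearest facility.
    heap = [(0.0, facility, facility) for facility in facilities]
    heapq.heapify(heap)
    while heap:
        dist, node, source = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a shorter or equal distance
        best[node] = (dist, source)
        for neighbor, length in graph.get(node, []):
            if neighbor not in best:
                heapq.heappush(heap, (dist + length, neighbor, source))
    return best


# Toy network with made-up lengths; hospitals sit at nodes "A" and "D".
roads = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("A", 2.0), ("C", 1.0), ("D", 4.0)],
    "C": [("A", 5.0), ("B", 1.0), ("D", 1.5)],
    "D": [("B", 4.0), ("C", 1.5)],
}
result = nearest_facility_distances(roads, ["A", "D"])
```

Because the search runs once for the whole network rather than once per patient, its cost depends on the size of the road graph, not on the number of patient records snapped to it.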

Infogroup US Historical Business Dataset Analysis

This project involved creating geospatial measures for ~2,000 public firms from the Infogroup US Historical Business Dataset. One of the tasks was to calculate the following variables at the census block group level for 23 years of data (1997–2019):

1. Businesses per office size type (Office_Size_Code)
2. Businesses per sales volume (Location_Sales_Volume_Code)
3. Businesses per employee size (Location_Employee_Size_Code)
4. Businesses per Business_Status_Code
5. Number of establishments, to be calculated from Year_Established...
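
The per-category counts in the list above amount to a grouped tally: for each year and census block group, count the business records that fall into each code value. The sketch below shows that pattern using only the standard library. The `year` and `block_group` keys, the block group IDs, and the code values are made up for illustration; only the column name `Office_Size_Code` comes from the list.

```python
from collections import Counter


def count_by_block_group(records, code_field):
    """Count businesses per (year, block group, code value).

    records: iterable of dicts, one per business record, with hypothetical
             keys 'year' and 'block_group' plus the chosen code field.
    code_field: name of the categorical column to tally,
                e.g. 'Office_Size_Code'.
    """
    counts = Counter()
    for record in records:
        key = (record["year"], record["block_group"], record[code_field])
        counts[key] += 1
    return counts


# Made-up sample records; IDs and code values are placeholders.
sample = [
    {"year": 1997, "block_group": "000000000001", "Office_Size_Code": "A"},
    {"year": 1997, "block_group": "000000000001", "Office_Size_Code": "A"},
    {"year": 1997, "block_group": "000000000001", "Office_Size_Code": "B"},
    {"year": 1998, "block_group": "000000000002", "Office_Size_Code": "A"},
]
office_counts = count_by_block_group(sample, "Office_Size_Code")
```

The same function can be reused for the other code columns by changing `code_field`.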

Detroit Zoning Analysis

The list of work order tickets for Detroit provided by the researcher was converted into a GIS polygon dataset containing 355,500 polygons using a Python script in ArcGIS Pro. Polygons were mapped using the string of coordinates found in the “Polygon” field. For the ticket polygons / parcel zoning analysis, PostGIS software was used on...

Use of Social Media data to study Climate Change

Harvard CGA joined forces with MIT SUL in 2021 to use social media data to study the effects of climate change on people’s well-being. To achieve this objective, we developed the Twitter Sentiment Geographical Index (TSGI) dataset, an open dataset for monitoring Subjective Well-Being (SWB) globally. By applying Natural Language Processing techniques to our archive of 10 billion geotagged tweets...

High Performance Computing for Address Level Climate Data Extraction

A key objective of many public health researchers the CGA works with is to find ways to improve the health of cohort members by calculating various social and environmental exposures at the members' address locations. To support this objective, the CGA processed daily precipitation, temperature, and humidity estimates for 4,796 cohort address locations for the years 1999–2017, resulting in over 73 million patient-days of calculations. Input climate data was the 800-meter resolution...

Measurement of partisan segregation for 180 million U.S. voters using advanced geospatial data science

Partisan segregation has important political and social implications. Historically, such measurements have been limited to the county level, but this innovative work enabled Harvard researchers to analyze partisanship down to the level of individuals for the first time. In this work, the CGA, along with Department of Government Professor Ryan Enos and graduate student Jacob Brown, has leveraged advances in geospatial data science to measure partisan segregation down to the...
