High Performance Computing for Address Level Climate Data Extraction

A key objective of multiple public health researchers the CGA works with is to improve the health of cohort members by calculating various social and environmental exposures at cohort member address locations. To support this objective, the CGA processed daily precipitation, temperature, and humidity estimates for 4,796 cohort address locations for the years 1999 to 2017, resulting in over 73 million patient-days of calculations. The input climate data was the 800-meter resolution PRISM Spatial Climate Dataset for the Conterminous United States (PRISM = Parameter-elevation Relationships on Independent Slopes Model, Oregon State University). The PRISM dataset is published in .BIL raster format, with each raster representing one climate variable for one day.

Data extraction was executed on Harvard's High-Performance Computing Cluster running a PostgreSQL database with the PostGIS extension. The solution, called Raster INformation eXtraction (RINX), reduced processing time from months to days and enabled the researchers, for the first time, to analyze this climate data efficiently at a fine-grained level.

Links related to RINX:

The code on GitHub

A journal publication in ISPRS

Presentation at the 2022 FOSS4G conference
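
For readers unfamiliar with address-level raster extraction, the core idea can be illustrated with a small example. RINX itself performs this step in PostgreSQL with PostGIS; the Python below is not the RINX implementation, only a minimal, dependency-free sketch of looking up the raster cell that contains each address point. The Grid class, the function names, and the toy 3 x 3 grid are hypothetical, and the sketch assumes a north-up raster whose upper-left coordinate refers to the center of the upper-left cell.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Georeferencing of a north-up raster (hypothetical helper, not part of RINX)."""
    ulx: float   # x (longitude) of the CENTER of the upper-left cell (assumption)
    uly: float   # y (latitude) of the CENTER of the upper-left cell (assumption)
    xdim: float  # cell width in map units
    ydim: float  # cell height in map units
    ncols: int
    nrows: int


def cell_index(grid, lon, lat):
    """Return (row, col) of the cell containing the point, or None if outside the grid."""
    # Adding 0.5 shifts from cell-center to cell-edge referencing before flooring.
    col = math.floor((lon - grid.ulx) / grid.xdim + 0.5)
    row = math.floor((grid.uly - lat) / grid.ydim + 0.5)
    if 0 <= row < grid.nrows and 0 <= col < grid.ncols:
        return row, col
    return None


def extract(grid, values, points):
    """Sample one raster (a list of rows) at each point; points maps id -> (lon, lat)."""
    out = {}
    for pid, (lon, lat) in points.items():
        idx = cell_index(grid, lon, lat)
        out[pid] = None if idx is None else values[idx[0]][idx[1]]
    return out


# Toy data: one 3 x 3 raster standing in for one climate variable on one day.
grid = Grid(ulx=-125.0, uly=49.0, xdim=0.5, ydim=0.5, ncols=3, nrows=3)
values = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
points = {
    "a": (-124.6, 48.9),  # falls in row 0, col 1
    "b": (-120.0, 48.0),  # outside the grid
    "c": (-124.1, 48.1),  # falls in row 2, col 2
}
result = extract(grid, values, points)
print(result)
```

Repeating this lookup for every address, every climate variable, and every daily raster is what produces the tens of millions of calculations described above, which is why the work benefits from a database and cluster rather than a loop like this one.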

Questions and comments on the project can be sent to Devika Kakkar and Jeff Blossom.