Python for Geospatial Big Data and Data Science Using the FASRC

Lead Instructor: Robert Spang
Co-Instructors: Devika Kakkar and Xiaokang Fu

Objective
This workshop shows how to analyze large data sets using Python on FASRC. It covers tools and techniques commonly used in Data Science and Big Data computation. By the end, attendees will be prepared to work with their own data sets and run their analyses on FASRC.

Topics Covered:
1. Introduction and Fundamentals of High-Performance Computing, with a focus on FASRC
2. Foundations of Data Analysis and Data Science, emphasizing Big Data
3. Concepts of filter/map/reduce, multiprocessing, and Apache Spark in Python
4. Practical application using a large social media data set (The GeoTweets Data Set / Twitter Sentiment Geographical Index) to address a sample research question
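The filter/map/reduce pattern from topic 3 can be sketched in plain Python as below. This is a minimal illustration, not the workshop's own material: the toy records, the `state` and `sentiment` fields, and the helper names are hypothetical stand-ins for GeoTweets rows. The standard-library `multiprocessing.Pool` is used to show how the map step can be spread over worker processes.

```python
from functools import reduce
from multiprocessing import Pool

# Hypothetical toy records standing in for geotagged tweets;
# the field names are illustrative, not the GeoTweets schema.
TWEETS = [
    {"state": "MA", "sentiment": 0.8},
    {"state": "MA", "sentiment": -0.2},
    {"state": "NY", "sentiment": 0.5},
    {"state": "CA", "sentiment": None},  # record with a missing score
]


def has_score(tweet):
    """Filter step: keep only records that carry a sentiment score."""
    return tweet["sentiment"] is not None


def to_pair(tweet):
    """Map step: turn a record into a (state, (score, 1)) pair."""
    return tweet["state"], (tweet["sentiment"], 1)


def combine(acc, pair):
    """Reduce step: accumulate a running (sum, count) per state in a dict."""
    state, (score, n) = pair
    total, count = acc.get(state, (0.0, 0))
    acc[state] = (total + score, count + n)
    return acc


def mean_sentiment_by_state(tweets, mapper=map):
    """Filter -> map -> reduce. `mapper` can be the built-in map or Pool.map."""
    scored = filter(has_score, tweets)
    pairs = mapper(to_pair, scored)
    totals = reduce(combine, pairs, {})
    return {state: total / count for state, (total, count) in totals.items()}


if __name__ == "__main__":
    # Sequential run using the built-in map.
    print(mean_sentiment_by_state(TWEETS))
    # Same pipeline, with the map step run in 2 worker processes.
    with Pool(processes=2) as pool:
        print(mean_sentiment_by_state(TWEETS, mapper=pool.map))
```

Passing the mapper as a parameter keeps the pipeline identical whether it runs serially or in parallel. On a Slurm-managed cluster you would typically size the pool to the cores your job requested (for example, from the `SLURM_CPUS_PER_TASK` environment variable) rather than hard-coding 2. Apache Spark expresses the same filter/map/reduce steps over data distributed across many machines.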

Target Audience:
This workshop is intended for intermediate-level Python users; basic Python development experience is required. Participants should be comfortable using Python on their own machines, loading and inspecting CSV files locally, and connecting to a remote server via SSH. Some experience with NumPy and pandas is recommended but not required. The workshop is suitable for first-time users of HPC systems and for those interested in geospatial analysis.
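As a rough self-check for the "load and inspect CSV files" prerequisite, a snippet like the following should feel familiar. It is a sketch using only the standard-library `csv` module; the column names and rows are hypothetical and are held in an in-memory string so the example runs without a file. With a real file you would pass `open(path, newline="")` instead.

```python
import csv
import io

# Hypothetical sample data; with a real file use open("tweets.csv", newline="").
SAMPLE_CSV = """id,lat,lon,sentiment
1,42.37,-71.11,0.8
2,40.71,-74.01,0.5
3,34.05,-118.24,
"""


def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def summarize(rows):
    """Return (row count, column names, number of rows with a sentiment value)."""
    columns = list(rows[0].keys()) if rows else []
    scored = sum(1 for row in rows if row["sentiment"] != "")
    return len(rows), columns, scored


if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(summarize(rows))
    print(rows[:2])  # peek at the first two rows
```

With pandas, the equivalent inspection would be `pd.read_csv(...)` followed by `.head()` and `.info()`.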

GitHub repository

Tutorial videos (4 parts)
