Center for Geographic Analysis Harvard University

Moving Historical Geodata to the Web
The New York Public Library
Supported by the Alfred P. Sloan Foundation
November 5-7, 2014

[meeting notes by Lex Berman]

PDF: agenda, participants

Hosted by Matt Knutzen, Geospatial Librarian for NYPL, this meeting built on the successful digitization of the NYPL map collection and established common ground with similar institutional projects. It also connected experts from across the U.S. and Europe who are currently working with the same types of historical geographic information in digital form. The workshop participants spent two days brainstorming, sharing experiences, and proposing practical steps for collaboration.

In his opening remarks, Knutzen described the transformation of more than 20,000 historical map sheets in the NYPL collection into digital images that were subsequently made available to the general public. His modular approach involved several pieces of software and the crowd-sourcing of some tasks.

Each piece of the NYPL workflow for turning paper maps into searchable resources on the web yielded important findings. For example, the Map Warper tool was very successful at creating rough-and-ready georectified images of maps. However, a crowd-sourced feature extraction tool, which enabled volunteers to digitize buildings and other features from the historical maps, proved to be much too time-consuming. This discovery subsequently led to the automated Map Vectorizer feature extraction tool.

Automating the "first pass" of creating building footprints saved an enormous amount of time, and enabled a crowd-sourced quality-control and data-entry tool called Building Inspector. Building Inspector asks users to submit only a few very simple pieces of information, enabling the rapid collection of a huge number of data enhancements with a minimal possibility of error.
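The quality-control pattern described above (many volunteers each answering a tiny question about an auto-generated footprint) can be sketched as a majority-vote check. The footprint IDs, votes, and agreement threshold below are illustrative assumptions, not NYPL's actual implementation:

```python
# A minimal sketch of a Building Inspector-style microtask check: volunteers
# answer a simple yes/no question about each auto-vectorized footprint, and
# an answer is accepted only when agreement passes a threshold.
from collections import Counter

def consensus(votes, threshold=0.75):
    """Return the majority answer if agreement meets the threshold, else None."""
    if not votes:
        return None
    answer, count = Counter(votes).most_common(1)[0]
    return answer if count / len(votes) >= threshold else None

# Hypothetical votes collected for two footprints from the automated first pass.
footprints = {
    "fp-001": ["yes", "yes", "yes", "no"],
    "fp-002": ["no", "yes"],  # too little agreement: held back for more review
}
accepted = {fid: consensus(v) for fid, v in footprints.items()}
```

Requiring high agreement before accepting an answer is one way to keep the "minimum possibility of error" property while still letting each individual task stay trivially simple.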

Overall, the important lessons learned in developing the NYPL historical map tools were that some tasks are well suited to automation, while others, though automated, benefit from human inspection and correction. In addition, the working software for Map Warper, Map Vectorizer, and Building Inspector is now open source and available for re-use in other contexts.

By bringing together a large group of domain experts, programmers, and researchers from libraries, museums, and the digital humanities, this workshop advanced the set of common tasks and inspired multi-party collaborations. In Matt Knutzen's words: "making historical spatial data actionable is the part of the Venn diagram where all of our interests overlap."

During the meeting, a metadata exchange repository was established.

Some of the important questions raised at the meeting were:
How do we define and characterize historical geodata (for our projects)?   
Are annotations about historical features on maps to be treated as equal to features themselves?
Is there a shared set of tools and processes that we can agree on?
How can we create a set of well defined tasks to start building the OpenHistoricalMap?
What are best practices for digitizing and analyzing historical features, such as roads and routes?
How can we crawl metadata in order to clean and enhance our own records?  
How can we de-duplicate and merge metadata records when needed?
How can we segment data to steer search results to the correct part of the original source materials?
What is the pie-in-the-sky goal that would be the perfect solution for "historical geodata"?
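The questions about crawling, cleaning, and de-duplicating metadata admit many answers; one simple approach is to merge harvested records that share a normalized title. The record fields, normalization rule, and merge policy below are illustrative assumptions, not a proposal from the meeting:

```python
# A sketch of one de-duplication strategy for harvested map metadata:
# group records by a normalized title key and keep the union of their fields.
import re

def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def merge_records(records):
    """Merge records whose titles normalize to the same key."""
    merged = {}
    for rec in records:
        key = normalize(rec["title"])
        merged.setdefault(key, {}).update(rec)
    return list(merged.values())

# Two hypothetical records for the same map, harvested from different sources.
records = [
    {"title": "Map of New-York, 1857", "scale": "1:6,000"},
    {"title": "map of new york 1857", "publisher": "Perris"},
]
```

Real de-duplication would also compare dates, extents, and identifiers, but even a crude title key like this makes the merge-versus-keep decision explicit and auditable.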

Some highlights of the topics discussed were:

1. Collection / Processing (paper to pixels, feature extraction, geocoding)
2. Description (creation of metadata)
3. Publication & Distribution  (should be in interoperable machine readable formats)   
4. Discovery / Analysis / Visualization
5. Preservation & Archiving   

1.  Building from Tim Berners-Lee's concept of Five Star Open Data
2.  Create a set of stars or levels to characterize how much has been done for each resource
3.  For example:  1 Star (scan map), 2 Star (create metadata for map), 3 Star (georectify map), 4 Star (permanent URIs for the map object and metadata), 5 Star (vectorization and RDF of features from map object).
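The star ladder in the example above can be made concrete as a function that counts how many consecutive processing steps a resource has completed. The step names are taken from the example; treating the ladder as strictly cumulative is an assumption of this sketch:

```python
# A sketch of the proposed star rating: each resource earns one star per
# completed step, counted from the start of the ladder, so a map cannot be
# "georectified" (3 stars) without first being scanned and described.
STEPS = [
    "scanned",         # 1 star: scan map
    "metadata",        # 2 stars: create metadata for map
    "georectified",    # 3 stars: georectify map
    "permanent_uris",  # 4 stars: permanent URIs for map object and metadata
    "vectorized_rdf",  # 5 stars: vectorization and RDF of features
]

def star_level(completed):
    """Count consecutive completed steps from the start of the ladder."""
    stars = 0
    for step in STEPS:
        if step not in completed:
            break
        stars += 1
    return stars
```

For example, `star_level({"scanned", "metadata", "georectified"})` gives a 3-star resource, while a map that was georectified without metadata would still count as 1 star under the cumulative reading.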

* Finding data
* Accessing a map
* Searching spatial data contained within a map
* Creating vector data from raster maps
* Exporting underlying digitized data for research
* Geocoding against historical data
* Linking historical information to locations
* Interrelating historical information of different qualities
* Microtasking with historical (geo)data
* Publishing one's own data in a harvestable way (stakeholder use case)
* Harvesting and indexing data from other sources (stakeholder use case)
* Dealing with non-cartographic elements of maps: symbols / marginalia / colophons
* Text analysis with NLP (how to derive a vector of space and time from free text or other map-derived information?)
* Image analysis with OCR (how to automatically detect variations or differences in historical maps, either between scans of the same map sheet or between editions of the map?)

1. data creation / NLP
2. description tools
3. data delivery publishing (including conversion tools into various formats)
4. narrative (putting data into context)
5. APIs and mashups
6. visualization
7. annotations, such as shared canvas
8. communicating how to use / teaching & documentation

1. data and applications must be easy to find and easy to use
2. how to facilitate the connections from geography to historical info?
3. provenance (where are the raw data and annotations from, how will it be used?)
4. acknowledgements to all contributors (raw data, annotations, and code)
5. persistent URIs
6. must be community driven (to survive with or without the support of any one big institution)

A collection of links from the meeting:
Annotorious Image Annotation -
Boston Public Library Leventhal Map Center -
Cooper Hewitt Labs  -
Data Life Cycle Model -
Data in Project Life Cycle -
Five Star Open Data -
GeoBlacklight Design Document -
GeoHumanities Special Interest Group -
GeoNames  -
Getty Thesaurus of Geographic Names as GeoJSON -
LOC Gazetteer -
Metadata Schema for Resource Discovery Use Cases -
Neatline, Plot your course in space and time -
NYPL Building Inspector -
NYPL Map Vectorizer -
NYPL Map Warper -
OGC Cat Interop -
Old Maps Online -
OpenGeoMetadata repository -
OpenGeoPortal -
OpenHistoricalMap -
OpenStreetMap Map Warper:
Orbis Geospatial Network Model -
Past Place API  -
Pelagios -
PeriodO assertions for linking data -
Simple Open Data -
Temporal Gazetteer API  [TGAZ]  -
Temporal Gazetteers Resource Page -
TopoTime Qualitative Reasoning for Historical Time -
UCSB Spatial Search -
UVA Map Scholar  -
WikiData -