Geocoding

Many options are available for geocoding batches of address data. The CGA has the most expertise with the Esri and Google geocoding services; instructions and guidance for both follow.

Esri: Harvard Key holders have access to Esri's ArcGIS World Geocoding Service. See this tutorial, which includes sample data, to use the service either within ArcGIS Pro (step 2 of the tutorial) or with ArcGIS Online (step 4). Each batch is limited to about 60,000 addresses. For more than 60,000 U.S. addresses, a local copy of the geocoder is recommended; see the instructions below.
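Where a table exceeds the batch limit and a local geocoder is not practical, one possible workaround is to split the table into smaller files before uploading. The following is a minimal sketch using only the Python standard library. The names split_rows and write_batches are hypothetical, and the 60,000 default simply reuses the approximate limit quoted above rather than a documented Esri constant.

```python
import csv
from itertools import islice

BATCH_LIMIT = 60_000  # assumption: the approximate per-batch limit quoted above


def split_rows(rows, batch_size=BATCH_LIMIT):
    """Yield successive lists holding at most batch_size rows each."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_batches(input_csv, output_prefix, batch_size=BATCH_LIMIT):
    """Split input_csv into numbered CSV files, repeating the header in each."""
    paths = []
    with open(input_csv, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to write
            return paths
        for number, batch in enumerate(split_rows(reader, batch_size), start=1):
            path = f"{output_prefix}_{number:03d}.csv"
            with open(path, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(batch)
            paths.append(path)
    return paths
```

For example, write_batches("addresses.csv", "addresses_part") would produce addresses_part_001.csv, addresses_part_002.csv, and so on, each small enough to submit as a separate batch.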

Google Maps Platform: The Geocoding API can geocode worldwide addresses once you have obtained an API key. You will need to enter a credit card to set up a billing account with Google, which provides a $200/month credit for the first 12 months (view their pricing structure). Your card will not be charged until you give permission. See https://cloud.google.com/maps-platform/ to get started. Use this Python script to load batches of addresses into the Google Maps Platform.
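The script linked above is not reproduced here. As a rough illustration of what such a script does, the sketch below sends one request per address to the Geocoding API's JSON endpoint and keeps the first result. The function names, the pause between requests, and the choice to keep only the top result are assumptions made for this example, not the CGA script's behavior.

```python
import json
import time
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_url(address, api_key):
    """Build the request URL for one address (both values are URL-encoded)."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_response(payload):
    """Return (lat, lng, formatted_address) for the first result, or None."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    top = payload["results"][0]
    location = top["geometry"]["location"]
    return location["lat"], location["lng"], top.get("formatted_address", "")


def geocode_batch(addresses, api_key, pause_seconds=0.1):
    """Geocode addresses one at a time; pause_seconds is an assumed throttle."""
    results = {}
    for address in addresses:
        url = build_url(address, api_key)
        with urllib.request.urlopen(url, timeout=30) as response:
            payload = json.load(response)
        results[address] = parse_response(payload)
        time.sleep(pause_seconds)
    return results
```

Calling geocode_batch(["1737 Cambridge St, Cambridge, MA"], api_key="YOUR_KEY") returns a dictionary keyed by input address, with None for any address the service could not match. Each request counts against the billing account, so it is worth testing on a handful of rows first.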

Important Note: If addresses cannot be loaded onto a server due to HIPAA compliance or other restrictions, the online Esri and Google geocoders cannot be used. In that case, a local copy of the 2022 Esri geocoder (USA only) can be downloaded for use on a desktop PC running the Windows operating system with ArcGIS Desktop or ArcGIS Pro. This 12.8 GB file can be accessed via the "T" drive in the HMDC computer lab (T:\Geocoding_Esri\Geocoding Data for ArcGIS Pro 2022.zip), or downloaded at this link after logging in with a Harvard Key. It may be copied to a local or network hard drive, used from a machine in the lab, or mapped using the path \\fas-depts.ad.fas.harvard.edu\cgis\arcgis (this requires the fas_domain account login).

The DeGAUSS geocoder is another good option for geocoding U.S. data on one's local system. DeGAUSS runs on Windows, Linux, and Mac operating systems. See also the very useful address string formatting tips on the DeGAUSS website.

Geocoding Big Data: The Esri local geocoders can process millions of records per hour if optimized. The optimization suggestions below are courtesy of Adam Travis:

  • Make sure the locator is loaded onto a local drive, not a network drive.
  • Use the singular locators such as "USA_PointAddress.loc" instead of the composite locator "USA.loc".
  • Set the number of threads allocated to 4.
  • Limit the number of candidates to 10.
  • Set the minimum match score above 85.

More on Geocoding: The ability to assign specific geographic locations to textual information (the process known as geocoding) is available to anyone with a computer and internet access.

Geocoded locations expressed as latitude and longitude coordinates can be obtained one at a time in web maps such as Bing Maps or Google Maps (right click anywhere on the map and choose "What's here"). The ease of geocoding and the resulting accuracy can vary widely depending on a number of factors. What is the nature of the data? How ‘clean’ is it and what format is it in? What geocoding technique will be used? Determining the geocoding strategy that best suits a particular need is not always straightforward.

The process of geocoding begins with comparing data in text or tabular form to a reference data table in geographic format. The reference table is a dataset that has already been mapped, with established map coordinates. When matches between the input data and the reference data are found, the corresponding map coordinates are assigned from the reference data to the input features, thus geocoding them. A geocoding service (also called an address locator) is a program that lets a user input a batch of data contained in a table, search for matches against a reference table, and output the result in a map or GIS layer format. The key to confidently geocoding data lies in understanding the reference table to which the data is being matched, how a match is found, and the resulting spatial accuracy.
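The matching step described above can be illustrated with a deliberately simplified sketch. It assumes an exact match after light text normalization, whereas production locators also score partial matches and interpolate along street segments. The function names and the sample reference rows are hypothetical, and the coordinates are placeholders rather than real locations.

```python
def normalize(address):
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.upper()
    )
    return " ".join(cleaned.split())


def build_locator(reference_rows):
    """Index reference rows of (address, lat, lon) by normalized address."""
    return {normalize(addr): (lat, lon) for addr, lat, lon in reference_rows}


def geocode(input_addresses, locator):
    """Assign reference coordinates to each input address that matches."""
    matched, unmatched = [], []
    for address in input_addresses:
        coords = locator.get(normalize(address))
        if coords is None:
            unmatched.append(address)
        else:
            matched.append((address, coords[0], coords[1]))
    return matched, unmatched


# Placeholder reference table: the coordinates are made up for illustration.
reference = [("100 Main St.", 1.0, 2.0), ("200 Oak Ave", 3.0, 4.0)]
matched, unmatched = geocode(["100 main st", "999 Elm Rd"], build_locator(reference))
```

Here "100 main st" matches the reference row despite differences in case and punctuation, while "999 Elm Rd" ends up in the unmatched list because the reference table has no such entry. This is why the coverage and quality of the reference table determine what can be geocoded at all.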

See also: Services