GEOG 482
The Nature of Geographic Information

7. Geocoding


Geocoding is the process used to convert location codes, such as street addresses or postal codes, into geographic (or other) coordinates. The terms “address geocoding” and “address mapping” refer to the same process. Geocoding address-referenced population data is one of the Census Bureau’s key responsibilities. However, as you know, it’s also a very popular capability of online mapping and routing services. In addition, geocoding is an essential element of a suite of techniques that are becoming known as “business intelligence.” We’ll look at applications like these later in this chapter, but first let’s consider how the Census Bureau performs address geocoding.

Address Geocoding at the U.S. Census

Prior to the MAF/TIGER modernization project that led up to the decennial census of 2010, the TIGER database did not include a complete set of point locations for U.S. households. Lacking point locations, TIGER was designed to support address geocoding by approximation. As illustrated below in Figure 4.7.1, the pre-modernization TIGER database included address range attributes for the edges that represent streets. Address range attributes were also included in the TIGER/Line files extracted from TIGER. Coupled with the Start and End nodes bounding each edge, address ranges enable users to estimate locations of household addresses.

Diagram: neighborhood map with addresses (top) & address data being recorded in program window (bottom)
Figure 4.7.1 How address range attributes were encoded in TIGER/Line files (U.S. Census Bureau 1997). Address ranges in contemporary TIGER/Line Shapefiles are similar, except that “From” (FR) and “To” nodes are now called “Start” and “End.” Also, changes have been made to field (column) names in the attribute tables. Compare the names of the address range fields that you looked at in the second Try This exercise to those above.

Here’s how it works. The diagram above highlights an edge that represents a one-block segment of Oak Avenue. The edge is bounded by two nodes, labeled "Start" and "End." A corresponding record in an attribute table includes the unique ID number (0007654320) that identifies the edge, along with starting and ending addresses for the left (FRADDL, TOADDL) and right (FRADDR, TOADDR) sides of Oak Avenue. Note also that the address ranges include potential addresses, not just existing ones. This is to make sure that the ranges will remain valid as new buildings are constructed along the street.
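The estimation described above amounts to linear interpolation between the Start and End nodes. The following is a minimal sketch in Python, not the Census Bureau's implementation. The `Edge` class, the `geocode` function, the coordinates, and the address range values are all hypothetical. It assumes that odd and even house numbers fall on opposite sides of the street and that addresses are evenly spaced along the edge.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Edge:
    """A street edge with TIGER-style address range attributes (illustrative)."""
    start: Tuple[float, float]  # (x, y) of the Start node
    end: Tuple[float, float]    # (x, y) of the End node
    fraddl: int                 # first address, left side
    toaddl: int                 # last address, left side
    fraddr: int                 # first address, right side
    toaddr: int                 # last address, right side


def _fraction(number: int, first: int, last: int) -> Optional[float]:
    """Relative position (0 to 1) of a house number within a range, or None."""
    if number % 2 != first % 2:
        return None  # wrong parity, so the number belongs to the other side
    if not min(first, last) <= number <= max(first, last):
        return None  # outside the range
    if first == last:
        return 0.5   # assumption: a one-address range sits at mid-block
    return (number - first) / (last - first)


def geocode(edge: Edge, number: int) -> Optional[Tuple[float, float, str]]:
    """Estimate (x, y, side) for a house number, or None if it does not match."""
    for side, first, last in (("L", edge.fraddl, edge.toaddl),
                              ("R", edge.fraddr, edge.toaddr)):
        t = _fraction(number, first, last)
        if t is not None:
            x = edge.start[0] + t * (edge.end[0] - edge.start[0])
            y = edge.start[1] + t * (edge.end[1] - edge.start[1])
            return (x, y, side)
    return None


# Hypothetical one-block edge: odd numbers on the left, even on the right.
oak = Edge(start=(0.0, 0.0), end=(100.0, 0.0),
           fraddl=101, toaddl=199, fraddr=100, toaddr=198)
print(geocode(oak, 150))  # right side, roughly halfway along the block
```

Because the ranges cover potential as well as existing addresses, the estimated point is only an approximation of where a building actually stands.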

A common geocoding error occurs when Start and End designations are assigned to the wrong connecting nodes. You may have read in Galdi’s (2005) white paper “Spatial Data Storage and Topology in the Redesigned MAF/TIGER System,” that in MAF/TIGER, “an arbitrary direction is assigned to each edge, allowing designation of one of the nodes as the Start Node, and the other as the End Node” (p. 3). If an edge’s “direction” happens not to correspond with its associated address ranges, a household location may be placed on the wrong side of a street.
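To see why the node order matters, consider that "left" and "right" are judged while facing from the Start node toward the End node. The sketch below, with a hypothetical `side_offset` function and made-up coordinates, places a point 10 units to the left of an east-west edge twice: once with the intended node order and once with the nodes swapped.

```python
import math


def side_offset(start, end, side, distance=10.0):
    """Point offset from the edge midpoint toward side 'L' or 'R',
    with left and right judged facing from start to end."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length  # unit normal pointing left
    sign = 1 if side == "L" else -1
    mx, my = (start[0] + end[0]) / 2, (start[1] + end[1]) / 2
    return (mx + sign * distance * nx, my + sign * distance * ny)


west, east = (0.0, 0.0), (100.0, 0.0)

intended = side_offset(west, east, "L")  # Start = west node
flipped = side_offset(east, west, "L")   # Start and End swapped

print(intended)  # north of the street (positive y)
print(flipped)   # south of the street (negative y)
```

The same "left side" address lands on opposite sides of the street depending only on which node is designated as Start.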

Although many local governments in the U.S. have developed their own GIS “land bases” with greater geometric accuracy than pre-modernization TIGER/Line files, similar address geocoding errors still occur. Kathryn Robertson, a GIS Technician with the City of Independence, Missouri (and a student in the Fall 2000 offering of this course) pointed out how important it is that Start (or "From") nodes and End (or "To") nodes correspond with the low and high addresses in address ranges. "I learned this the hard way," she wrote, "geocoding all 5,768 segments for the city of Independence and getting some segments backward. When address matching was done, the locations were not correct. Therefore, I had to go back and look at the direction of my segments. I had a rule of thumb, all east-west streets were to start from west and go east; all north-south streets were to start from the south and go north" (personal communication).

Although this may have been a sensible strategy for the City of Independence, can you imagine a situation in which Kathryn’s rule-of-thumb might not work for another municipality? If so, and if you’re a registered student, please add a comment to Canvas.
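A rule of thumb like Kathryn's can be checked automatically. The sketch below is one hypothetical way to do so in Python. The function names and the test used to classify a segment as east-west or north-south (comparing the x and y extents) are assumptions, not part of her workflow. It also assumes coordinates in which x increases eastward and y increases northward.

```python
def follows_rule(start, end):
    """True if the segment runs west-to-east (for mostly east-west segments)
    or south-to-north (for mostly north-south segments)."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if abs(dx) >= abs(dy):  # treated as an east-west street
        return dx > 0
    return dy > 0           # treated as a north-south street


def normalize(segment):
    """Return the segment with its nodes swapped if it breaks the rule."""
    start, end = segment
    return (start, end) if follows_rule(start, end) else (end, start)


segments = [
    ((100.0, 0.0), (0.0, 0.0)),  # digitized east-to-west: breaks the rule
    ((0.0, 0.0), (0.0, 50.0)),   # digitized south-to-north: follows the rule
]
fixed = [normalize(s) for s in segments]
print(fixed)
```

Swapping the nodes only helps if the address ranges really do increase in the direction the rule assumes, so a check like this supplements rather than replaces inspection of the data.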

After MAF/TIGER Modernization

If TIGER had included accurate coordinate locations for every household, and correspondingly accurate streets and administrative boundaries, geocoding census data would have been simpler and less error-prone. Many local governments digitize locations of individual housing units when they build GIS land bases for property tax assessment, E-911 dispatch and other purposes. The MAF/TIGER modernization project begun in 2002 aimed to accomplish this for the entire nationwide TIGER database in time for the 2010 census. The illustration below in Figure 4.7.2 shows the intended result of the modernization project, including properly aligned streets, shorelines, and individual household locations, shown here in relation to an orthorectified aerial image.

Image showing modernized TIGER household locations and aligned streets
Figure 4.7.2 Intended accuracy and completeness of modernized TIGER data in relation to the real world. TIGER streets (yellow), shorelines (blue), and housing unit locations (red) are superimposed over an orthorectified aerial image (U.S. Census Bureau n.d.). National coverage of housing unit locations, geometrically accurate streets, and other features was not available in 2000 or before.

The modernized MAF/TIGER database described by Galdi (2005) is now in use, including precise geographic locations of over 100 million household units. However, because household locations are considered confidential, users of TIGER/Line Shapefiles extracted from the MAF/TIGER database still must rely upon address geocoding using address ranges.

Leveraging TIGER/Line Data for Private Enterprise

Launched in 1996, MapQuest was one of the earliest online mapping, geocoding and routing services. MapQuest combined the capabilities of two companies: a cartographic design firm with long experience in producing road atlases, “TripTiks” for the American Automobile Association, and other map products, and a start-up company that specialized in custom geocoding applications for business. Initially, MapQuest relied in part on TIGER/Line street data extracted from the pre-modernization TIGER database. MapQuest and other commercial firms were able to build their businesses on TIGER data because of the U.S. government’s wise decision not to restrict its reuse. It’s been said that this decision triggered the rapid growth of the U.S. geospatial industry.

Later on in this chapter, we’ll visit MapQuest and some of its more recent competitors. Next, however, you'll have a chance to see how geocoding is performed using TIGER/Line data in a GIS.