The Nature of Geographic Information

Chapter 1: Data and Information

1. Overview

When I started writing this text in 1997, my office was across the street (and, fortunately, upwind) from Penn State's power plant. The energy used to heat and cool my office is still produced there by burning natural gas extracted from wells in nearby counties. Combustion transforms the potential energy stored in the gas into electricity, which solves the problem of an office that would otherwise be too cold or too warm. Unfortunately, the solution itself causes another problem, namely emissions of carbon dioxide and other more noxious substances into the atmosphere. Cleaner means of generating electricity exist, of course, but they, too, involve transforming energy from one form to another. And cleaner methods cost more than most of us are willing or able to pay.

It seems to me that a coal-fired power plant is a pretty good analogy for a geographic information system. For that matter, GIS is comparable to any factory or machine that transforms a raw material into something more valuable. Data is grist for the GIS mill. GIS is like the machinery that transforms the data into the commodity--information--that is needed to solve problems or create opportunities. And the problems that the manufacturing process itself creates include uncertainties resulting from imperfections in the data, intentional or unintentional misuse of the machinery, and ethical issues related to what the information is used for, and who has access to it.

This text explores the nature of geographic information. To study the nature of something is to investigate its essential characteristics and qualities. To understand the nature of the energy produced in a coal-fired power plant, one should study the properties, morphology, and geographic distribution of coal. By the same reasoning, I believe that a good approach to understanding the information produced by GIS is to investigate the properties of geographic data and the technologies and institutions that produce it.

Objectives

The goal of Chapter 1 is to situate GIS in a larger enterprise known as Geographic Information Science and Technology (GIS&T), and in what the U.S. Department of Labor calls the "geospatial industry." In particular, students who successfully complete Chapter 1 should be able to:

define a geographic information system;
recognize and name basic database operations from verbal descriptions;
recognize and name basic approaches to geographic representation from verbal descriptions;
identify and explain at least three distinguishing properties of geographic data; and
outline the kinds of questions that GIS can help answer.

"Try This!" Activities

Take a minute to complete any of the Try This activities that you encounter throughout the chapter. These are fun, thought-provoking exercises to help you better understand the ideas presented in the chapter.

This textbook is used as a resource in Penn State's Online Geospatial Education degree and certificate programs. If this topic interests you and you want to learn more about online GIS and GEOINT education at Penn State, check out our Geospatial Education Program Office.

2. Data

"After more than 30 years, we're still confronted by the same major challenge that GIS professionals have always faced: You must have good data. And good data are expensive and difficult to create." (Wilson, 2001, p. 54)

Data consist of symbols that represent measurements of phenomena. People create and study data as a means to help understand how natural and social systems work. Such systems can be hard to study because they're made up of many interacting phenomena that are often difficult to observe directly and because they tend to change over time. We attempt to make systems and phenomena easier to study by measuring their characteristics at certain times. Because it's not practical to measure everything, everywhere, at all times, we measure selectively. How accurately data reflect the phenomena they represent depends on how, when, where, and what aspects of the phenomena were measured. All measurements, however, contain a certain amount of error.

Measurements of the locations and characteristics of phenomena can be represented with several different kinds of symbols. For example, pictures of the land surface, including photographs and maps, are made up of graphic symbols. Verbal descriptions of property boundaries are recorded on deeds using alphanumeric symbols. Locations determined by satellite positioning systems are reported as pairs of numbers called coordinates. As you probably know, all of these different types of data--pictures, words, and numbers--can be represented in computers in digital form. Obviously, digital data can be stored, transmitted, and processed much more efficiently than their physical counterparts that are printed on paper. These advantages set the stage for the development and widespread adoption of GIS.

3. Information

Information is data that has been selected or created in response to a question. For example, the location of a building or a route is data, until it is needed to dispatch an ambulance in response to an emergency. When used to inform those who need to know, "Where is the emergency, and what's the fastest route between here and there?" the data are transformed into information. The transformation involves the ability to ask the right kind of question, and the ability to retrieve existing data--or to generate new data from the old--that help people answer the question. The more complex the question and the more locations involved, the harder it becomes to produce timely information with paper maps alone.

Interestingly, the potential value of data is not necessarily lost when they are used. Data can be transformed into information again and again, provided that the data are kept up to date. Given the rapidly increasing accessibility of computers and communications networks in the U.S. and abroad, it's not surprising that information has become a commodity, and that the ability to produce it has become a major growth industry.

4. Information Systems

Information systems are computer-based tools that help people transform data into information.

As you know, many of the problems and opportunities faced by government agencies, businesses, and other organizations are so complex, and involve so many locations, that the organizations need assistance in creating useful and timely information. That's what information systems are for.

Allow me a fanciful example. Suppose that you've launched a new business that manufactures solar-powered lawn mowers. You're planning a direct mail campaign to bring this revolutionary new product to the attention of prospective buyers. But, since it's a small business, you can't afford to sponsor coast-to-coast television commercials or to send brochures by mail to more than 100 million U.S. households. Instead, you plan to target the most likely customers - those who are environmentally conscious, have higher than average family incomes, and who live in areas where there is enough water and sunshine to support lawns and solar power.

Fortunately, lots of data are available to help you define your mailing list. Household incomes are routinely reported to banks and other financial institutions when families apply for mortgages, loans, and credit cards. Personal tastes related to issues like the environment are reflected in behaviors such as magazine subscriptions and credit card purchases. Firms like Claritas amass such data and transform it into information by creating "lifestyle segments" - categories of households that have similar incomes and tastes. Your solar lawnmower company can purchase lifestyle segment information by 5-digit ZIP code, or even by ZIP+4 codes, which designate small clusters of delivery addresses such as one side of a city block.

It's astonishing how companies like Claritas, Experian, and Esri can create valuable information from the millions upon millions of transactions that are recorded every day. Their "lifestyle segmentation" data products are made possible by the fact that the original data exist in digital form, and because the companies have developed information systems that enable them to transform the data into information that marketers value. The fact that lifestyle information products are often delivered by geographic areas, such as ZIP codes, speaks to the appeal of geographic information systems.

Try This!

How does your ZIP code look to marketers?

Lifestyle segmentation data cluster similar households into lifestyle categories - “segments” - that marketers can use to target advertising. Lifestyle segments have evocative names like “Gen X Urban,” “Senior Styles,” and “Rustic Outposts.” For example, according to Esri’s Tapestry Segmentation, the predominant lifestyle groups in my ZIP code are Down the Road, Soccer Moms, and Exurbanites.

You can use Esri’s ZIP Lookup to see how your ZIP code is segmented. Do the lifestyle segments seem accurate for your community? If you don't live in the United States, try Penn State's ZIP code, 16802.

5. Databases, Mapping, and GIS

One of our objectives in this first chapter is to be able to define a geographic information system. Here's a tentative definition: A GIS is a computer-based tool used to help people transform geographic data into geographic information.

The definition implies that a GIS is somehow different from other information systems, and that geographic data are different from non-geographic data. Let's consider the differences next.

6. Database Management Systems

Claritas and similar companies use database management systems (DBMS) to create the "lifestyle segments" that I referred to in the previous section. Basic database concepts are important since GIS incorporates much of the functionality of DBMS.

Digital data are stored in computers as files. Often, data are arrayed in tabular form. For this reason, data files are often called tables. A database is a collection of tables. Businesses and government agencies that serve large clienteles, such as telecommunications companies, airlines, credit card firms, and banks, rely on extensive databases for their billing, payroll, inventory, and marketing operations. Database management systems are information systems that people use to store, update, and analyze non-geographic databases.

Tabular data files are composed of rows and columns. Rows, also known as records, correspond with individual entities, such as customer accounts. Columns correspond with the various attributes associated with each entity. The attributes stored in the accounts database of a telecommunications company, for example, might include customer names, telephone numbers, addresses, current charges for local calls, long distance calls, taxes, etc.

Geographic data are a special case: records correspond with places, not people or accounts. Columns represent the attributes of places. The data in the following table, for example, consist of records for Pennsylvania counties. Columns contain selected attributes of each county, including the county's ID code, name, and 1980 population.

1980 Population Data for PA Counties
FIPS Code	County	1980 Pop
42001	Adams County	78274
42003	Allegheny County	1336449
42005	Armstrong County	73478
42007	Beaver County	186093
42009	Bedford County	47919
42011	Berks County	336523
42013	Blair County	130542
42015	Bradford County	60967
42017	Bucks County	541174
42019	Butler County	152013
42021	Cambria County	163062
42023	Cameron County	5913
42025	Carbon County	56846
42027	Centre County	124812

Table 1.1: The contents of one file in a database.

The example is a very simple file, but many geographic attribute databases are in fact very large (the U.S. is made up of over 3,000 counties, almost 50,000 census tracts, about 43,000 five-digit ZIP code areas and many tens of thousands more ZIP+4 code areas). Large databases consist not only of lots of data, but also lots of files. Unlike a spreadsheet, which performs calculations only on data that are present in a single document, database management systems allow users to store data in, and retrieve data from, many separate files. For example, suppose an analyst wished to calculate population change for Pennsylvania counties between the 1980 and 1990 censuses. More than likely, 1990 population data would exist in a separate file, like so:

1990 Population Data for PA Counties
FIPS Code	1990 Pop
42001	84921
42003	1296037
42005	73872
42007	187009
42009	49322
42011	352353
42013	131450
42015	62352
42017	578715
42019	167732
42021	158500
42023	5745
42025	58783
42027	131489

Table 1.2: Another file in a database. A database management system (DBMS) can relate this file to the prior one illustrated above because they share the attribute called "FIPS Code."

If two data files have at least one common attribute, a DBMS can combine them in a single new file. The common attribute is called a key. In this example, the key was the county FIPS code (FIPS stands for Federal Information Processing Standard). The DBMS allows users to produce new data as well as to retrieve existing data, as suggested by the new "% Change" attribute in the table below.

Percent Change in Populations for PA Counties 1980-1990
FIPS	County	1980	1990	% Change
42001	Adams	78274	84921	8.5
42003	Allegheny	1336449	1296037	-3.0
42005	Armstrong	73478	73872	0.5
42007	Beaver	186093	187009	0.5
42009	Bedford	47919	49322	2.9
42011	Berks	336523	352353	4.7
42013	Blair	130542	131450	0.7
42015	Bradford	60967	62352	2.3
42017	Bucks	541174	578715	6.9
42019	Butler	152013	167732	10.3
42021	Cambria	163062	158500	-2.8
42023	Cameron	5913	5745	-2.8
42025	Carbon	56846	58783	3.4
42027	Centre	124812	131489	5.3

Table 1.3: A new file produced from the prior two files as a result of two database operations. One operation merged the contents of the two files without redundancy. A second operation produced a new attribute--"% Change"--by dividing the difference between "1990 Pop" and "1980 Pop" by "1980 Pop" and expressing the result as a percentage.
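The two operations behind Table 1.3 can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS works internally: the dictionaries below are hypothetical stand-ins for the two data files, they hold just four of the fourteen records, and the function and field names are my own.

```python
# Hypothetical in-memory stand-ins for the 1980 and 1990 files (four records only).
# Each is keyed by FIPS code, the attribute the two files have in common.
pop_1980 = {
    "42001": {"county": "Adams", "pop": 78274},
    "42003": {"county": "Allegheny", "pop": 1336449},
    "42005": {"county": "Armstrong", "pop": 73478},
    "42019": {"county": "Butler", "pop": 152013},
}
pop_1990 = {"42001": 84921, "42003": 1296037, "42005": 73872, "42019": 167732}


def join_and_compute_change(file_1980, file_1990):
    """Merge two tables on their shared key and derive a percent-change attribute."""
    merged = []
    # Operation 1: relate the files through the key (only keys found in both).
    for fips in sorted(file_1980.keys() & file_1990.keys()):
        p80 = file_1980[fips]["pop"]
        p90 = file_1990[fips]
        # Operation 2: produce a new attribute from the existing ones.
        pct_change = round((p90 - p80) / p80 * 100, 1)
        merged.append(
            {
                "fips": fips,
                "county": file_1980[fips]["county"],
                "pop_1980": p80,
                "pop_1990": p90,
                "pct_change": pct_change,
            }
        )
    return merged


table_1_3 = join_and_compute_change(pop_1980, pop_1990)
for row in table_1_3:
    print(row["fips"], row["county"], row["pop_1980"], row["pop_1990"], row["pct_change"])
```

The percentages printed for these four counties should agree with the corresponding rows of Table 1.3.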

Database management systems are valuable because they provide secure means of storing and updating data. Database administrators can protect files so that only authorized users can make changes. DBMS provide transaction management functions that allow multiple users to edit the database simultaneously. In addition, DBMS also provide sophisticated means to retrieve data that meet user specified criteria. In other words, they enable users to select data in response to particular questions. A question that is addressed to a database through a DBMS is called a query.

Database queries include basic set operations, including union, intersection, and difference. The product of a union of two or more data files is a single file that includes all records and attributes, without redundancy. An intersection produces a data file that contains only records present in all files. A difference operation produces a data file that eliminates records that appear in both original files. (Try drawing Venn diagrams--intersecting circles that show relationships between two or more entities--to illustrate the three operations.) All operations that involve multiple data files rely on the fact that all files contain a common key. The key allows the database system to relate the separate files. Databases that contain numerous files that share one or more keys are called relational databases. Database systems that enable users to produce information from relational databases are called relational database management systems.
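If Venn diagrams aren't your thing, the same three operations can be tried with Python's built-in set type. In this minimal sketch the two "files" are reduced to hypothetical sets of FIPS keys, and difference is read as "records in the first file that are not in the second," which is one common interpretation.

```python
# Two hypothetical data files, reduced to their key values (FIPS codes).
file_a = {"42001", "42003", "42005", "42007"}
file_b = {"42005", "42007", "42009"}

# Union: every record from either file, with no duplicates.
union = file_a | file_b

# Intersection: only the records present in both files.
intersection = file_a & file_b

# Difference: records in file_a that do not also appear in file_b.
difference = file_a - file_b

print("union:       ", sorted(union))
print("intersection:", sorted(intersection))
print("difference:  ", sorted(difference))
```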

A common use of database queries is to identify subsets of records that meet criteria established by the user. For example, a credit card company may wish to identify all accounts that are 30 days or more past due. A county tax assessor may need to list all properties not assessed within the past 10 years. Or the U.S. Census Bureau may wish to identify all addresses that need to be visited by census takers, because census questionnaires were not returned by mail. DBMS software vendors have adopted a standardized language called SQL (Structured Query Language) to pose such queries.
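To give a feel for what such a query looks like, here is a small sketch that uses the SQLite engine bundled with Python. The table name, column names, and sample accounts are all invented for illustration; a real credit card company's database would of course look different.

```python
import sqlite3


def past_due_accounts(rows, min_days=30):
    """Return (account_id, holder) pairs that are at least min_days past due."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE accounts ("
        "account_id TEXT PRIMARY KEY, holder TEXT, days_past_due INTEGER)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    # The SQL query itself: select the subset of records that meets the criterion.
    cursor = conn.execute(
        "SELECT account_id, holder FROM accounts "
        "WHERE days_past_due >= ? ORDER BY account_id",
        (min_days,),
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Hypothetical sample records: (account_id, holder, days_past_due).
sample = [
    ("A-100", "Rivera", 0),
    ("A-101", "Chen", 45),
    ("A-102", "Okafor", 30),
    ("A-103", "Novak", 12),
]
overdue = past_due_accounts(sample)
print(overdue)
```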

7. Mapping Systems

GIS (geographic information systems) arose out of the need to perform spatial queries on geographic data. A spatial query requires knowledge of locations as well as attributes. For example, an environmental analyst might want to know which public drinking water sources are located within one mile of a known toxic chemical spill. Or, a planner might be called upon to identify property parcels located in areas that are subject to flooding. To accommodate geographic data and spatial queries, database management systems need to be integrated with mapping systems. Until about 1990, most maps were printed from handmade drawings or engravings. Geographic data produced by draftspersons consisted of graphic marks inscribed on paper or film. To this day, most of the lines that appear on topographic maps published by the U.S. Geological Survey were originally engraved by hand. The place names shown on the maps were affixed with tweezers, one word at a time. Needless to say, such maps were expensive to create and to keep up to date. Computerization of the mapmaking process had obvious appeal.
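The drinking water example at the start of this section shows why a spatial query needs locations as well as attributes. The sketch below is a simplified illustration under several assumptions: the spill and water sources are single points with made-up latitude and longitude values, distances are great-circle distances computed with the haversine formula on a spherical Earth, and the "public" flag is a hypothetical attribute.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius of a spherical Earth (an approximation)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def public_sources_near(spill, sources, max_miles=1.0):
    """Names of public sources within max_miles of the spill location."""
    return [
        s["name"]
        for s in sources
        if s["public"]  # attribute criterion
        and haversine_miles(spill[0], spill[1], s["lat"], s["lon"]) <= max_miles  # location criterion
    ]


# Hypothetical data: a spill location and four water sources.
spill = (40.80, -77.86)
sources = [
    {"name": "Well A", "lat": 40.805, "lon": -77.86, "public": True},
    {"name": "Well B", "lat": 40.85, "lon": -77.86, "public": True},
    {"name": "Well C", "lat": 40.80, "lon": -77.87, "public": True},
    {"name": "Well D", "lat": 40.801, "lon": -77.861, "public": False},
]
at_risk = public_sources_near(spill, sources)
print(at_risk)
```

Well B is public but too far away, and Well D is close but not public, so neither is selected.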

Computer-aided design (CAD)

CAD systems were originally developed for engineers, architects, and other design professionals who needed more efficient means to create and revise precise drawings of machine parts, construction plans, and the like. In the 1980s, mapmakers began to adopt CAD in place of traditional map drafting. CAD operators encode the locations and extents of roads, streams, boundaries, and other entities by tracing maps mounted on electronic drafting tables, or by key-entering location coordinates, angles, and distances. Instead of graphic features, CAD data consist of digital features, each of which is composed of a set of point locations. Calculations of distances, areas, and volumes can easily be automated once features are digitized. Unfortunately, CAD systems typically do not encode data in forms that support spatial queries. In 1988, a geographer named David Cowen illustrated the benefits and shortcomings of CAD for spatial decision making. He pointed out that a CAD system would be useful for depicting the streets, property parcel boundaries, and building footprints of a residential subdevelopment. A CAD operator could point to a particular parcel, and highlight it with a selected color or pattern. "A typical CAD system," Cowen observed, "could not automatically shade each parcel based on values in an assessor's database containing information regarding ownership, usage, or value, however." A CAD system would be of limited use to someone who had to make decisions about land use policy or tax assessment.

Desktop mapping

An evolutionary stage in the development of GIS, desktop mapping systems like Atlas*GIS combined some of the capabilities of CAD systems with rudimentary linkages between location data and attribute data. A desktop mapping system user could produce a map in which property parcels are automatically colored according to various categories of property values, for example. Furthermore, if property value categories were redefined, the map's appearance could be updated automatically. Some desktop mapping systems even supported simple queries that allowed users to retrieve records from a single attribute file. Most real-world decisions require more sophisticated queries involving multiple data files. That's where real GIS comes in.

Geographic information systems (GIS)

As stated earlier, information systems assist decision makers by enabling them to transform data into useful information. GIS specializes in helping users transform geographic data into geographic information. David Cowen (1988) defined GIS as a decision support tool that combines the attribute data handling capabilities of relational database management systems with the spatial data handling capabilities of CAD and desktop mapping systems. In particular, GIS enables decision makers to identify locations or routes whose attributes match multiple criteria, even though entities and attributes may be encoded in many different data files.

Innovators in many fields, including engineers, computer scientists, geographers, and others, started developing digital mapping and CAD systems in the 1950s and 60s. One of the first challenges they faced was to convert the graphical data stored on paper maps into digital data that could be stored in, and processed by, digital computers. Several different approaches to representing locations and extents in digital form were developed. The two predominant representation strategies are known as "vector" and "raster."

8. Representation Strategies for Mapping

Recall that data consist of symbols that represent measurements. Digital geographic data are encoded as alphanumeric symbols that represent locations and attributes of locations measured at or near Earth's surface. No geographic data set represents every possible location, of course. The Earth is too big, and the number of unique locations is too great. In much the same way that public opinion is measured through polls, geographic data are constructed by measuring representative samples of locations. And just as serious opinion polls are based on sound principles of statistical sampling, so, too, do geographic data represent reality by measuring carefully chosen samples of locations. Vector and raster data are, in essence, two distinct sampling strategies.

The vector approach involves sampling locations at intervals along the length of linear entities (like roads), or around the perimeter of areal entities (like property parcels). When they are connected by lines, the sampled points form line features and polygon features that approximate the shapes of their real-world counterparts.

Illustration of vector encoding of a reservoir and highway

Figure 1.9.1 Two frames (the first and last) of an animation showing the construction of a vector representation of a reservoir and highway.

The aerial photograph above (Figure 1.9.1) shows two entities, a reservoir and a highway. The graphic above right illustrates how the entities might be represented with vector data. The small squares are nodes: point locations specified by latitude and longitude coordinates. Line segments connect nodes to form line features. In this case, the line feature colored red represents the highway. Series of line segments that begin and end at the same node form polygon features. In this case, two polygons (filled with blue) represent the reservoir.

The vector data model is consistent with how surveyors measure locations at intervals as they traverse a property boundary. Computer-aided design (CAD) software used by surveyors, engineers, and others stores data in vector form. CAD operators encode the locations and extents of entities by tracing maps mounted on electronic drafting tables, or by key-entering location coordinates, angles, and distances. Instead of graphic features, CAD data consist of digital features, each of which is composed of a set of point locations.

The vector strategy is well suited to mapping entities with well-defined edges, such as highways or pipelines or property parcels. Many of the features shown on paper maps, including contour lines, transportation routes, and political boundaries, can be represented effectively in digital form using the vector data model.
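A bare-bones sketch may help make the vector idea concrete. Here each feature is nothing more than an ordered list of nodes, and a polygon is a list whose first and last nodes coincide. The coordinates are made up, and for simplicity they are planar (x, y) values in meters rather than latitude and longitude; real vector formats carry much more structure than this.

```python
import math

# Hypothetical nodes, given as planar (x, y) coordinates in meters.
highway = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # a line feature
reservoir = [(10.0, 10.0), (14.0, 10.0), (14.0, 13.0), (10.0, 13.0), (10.0, 10.0)]  # a polygon feature


def is_polygon(nodes):
    """A polygon is a series of segments that begins and ends at the same node."""
    return len(nodes) >= 4 and nodes[0] == nodes[-1]


def feature_length(nodes):
    """Sum of the straight-line segments connecting consecutive nodes."""
    return sum(math.dist(a, b) for a, b in zip(nodes, nodes[1:]))


print("highway is polygon:  ", is_polygon(highway))
print("reservoir is polygon:", is_polygon(reservoir))
print("highway length (m):     ", feature_length(highway))
print("reservoir perimeter (m):", feature_length(reservoir))
```

Notice that the shapes are only as faithful as the sampled nodes allow: adding more nodes along a curving road yields a closer approximation.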

The raster approach involves sampling attributes at fixed intervals. Each sample represents one cell in a checkerboard-shaped grid.

Illustration of raster encoding of a reservoir and highway

Figure 1.9.2 Two frames (the first and last) of an animation showing the construction of a raster representation of a reservoir and highway.

The graphic above (Figure 1.9.2) illustrates a raster representation of the same reservoir and highway as shown in the vector representation. The area covered by the aerial photograph has been divided into a grid. Every grid cell that overlaps one of the two selected entities is encoded with an attribute that associates it with the entity it represents. Actual raster data would not consist of a picture of red and blue grid cells, of course; they would consist of a list of numbers, one number for each grid cell, each number representing an entity. For example, grid cells that represent the highway might be coded with the number "1" and grid cells representing the reservoir might be coded with the number "2."
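The "list of numbers" idea can be sketched directly. In this toy example the grid values, the code for empty cells (0), and the 30-meter cell size are all assumptions made for illustration; the codes 1 and 2 follow the highway and reservoir example above.

```python
EMPTY, HIGHWAY, RESERVOIR = 0, 1, 2
CELL_SIZE_M = 30  # hypothetical ground distance covered by one side of a cell

# A tiny raster: one number per grid cell, stored row by row.
grid = [
    [0, 0, 0, 1, 0, 0],
    [0, 2, 2, 1, 0, 0],
    [2, 2, 2, 0, 1, 0],
    [0, 2, 0, 0, 0, 1],
]


def count_cells(raster, code):
    """Number of cells encoded with the given entity code."""
    return sum(row.count(code) for row in raster)


def area_m2(raster, code, cell_size):
    """Approximate area of an entity: cell count times the area of one cell."""
    return count_cells(raster, code) * cell_size * cell_size


# The same data as a flat list of numbers, as it might be written to a file.
flat = [value for row in grid for value in row]

print("highway cells:  ", count_cells(grid, HIGHWAY))
print("reservoir cells:", count_cells(grid, RESERVOIR))
print("reservoir area (sq m):", area_m2(grid, RESERVOIR, CELL_SIZE_M))
```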

The raster strategy is a smart choice for representing phenomena that lack clear-cut boundaries, such as terrain elevation, vegetation, and precipitation. Digital airborne imaging systems, which are replacing photographic cameras as primary sources of detailed geographic data, produce raster data by scanning the Earth's surface pixel by pixel and row by row.

Both the vector and raster approaches accomplish the same thing: they allow us to caricature the Earth's surface with a limited number of locations. What distinguishes the two is the sampling strategies they embody. The vector approach is like creating a picture of a landscape with shards of stained glass cut to various shapes and sizes. The raster approach, by contrast, is more like creating a mosaic with tiles of uniform size. Neither is well suited to all applications, however. Several variations on the vector and raster themes are in use for specialized applications, and the development of new object-oriented approaches is underway.

9. Automated Map Analysis

As I mentioned earlier, the original motivation for developing computer mapping systems was to automate the map making process. Computerization has not only made map making more efficient, it has also removed some of the technological barriers that used to prevent people from making maps themselves. What used to be an arcane craft practiced by a few specialists has become a "cloud" application available to any networked computer user. When I first started writing this text in 1997, my example was the mapping extension included in Microsoft Excel 97, which made creating a simple map as easy as creating a graph. Seventeen years later, who hasn't used Google Maps or MapQuest?

As much as computerization has changed the way maps are made, it has had an even greater impact on how maps can be used. Calculations of distance, direction, and area, for example, are tedious and error-prone operations with paper maps. Given a digital map, such calculations can easily be automated. Those who are familiar with CAD systems know this from first-hand experience. Highway engineers, for example, rely on aerial imagery and digital mapping systems to estimate project costs by calculating the volumes of rock that need to be excavated from hillsides and filled into valleys.

The ability to automate analytical tasks not only relieves tedium and reduces errors; it also allows us to perform tasks that would otherwise seem impractical. Suppose, for example, that you were asked to plot on a map a 100-meter-wide buffer zone surrounding a protected stream. If all you had to work with was a paper map, a ruler, and a pencil, you might have a lengthy job on your hands. You might draw lines scaled to represent 100 meters, perpendicular to the stream on both sides, at intervals that vary in frequency with the sinuosity of the stream. Then you might plot a perimeter that connects the end points of the perpendicular lines. If your task was to create hundreds of such buffer zones, you might conclude that automation is a necessity, not just a luxury.

Illustration showing construction of a 100-meter buffer polygon around a stream

Figure 1.10.1 Surrounding a protected stream with a buffer polygon.

Some tasks can be implemented equally well in either vector- or raster-oriented mapping systems. Other tasks are better suited to one representation strategy or another. The calculation of slope, for example, or of gradient--the direction of the maximum slope along a surface--is more efficiently accomplished with raster data. The slope of one raster grid cell may be calculated by comparing its elevation to the elevations of the eight cells that surround it. Raster data are also preferred for a procedure called viewshed analysis that predicts which portions of a landscape will be in view, or hidden from view, from a particular perspective.
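Here is one way the eight-neighbor comparison might be carried out. The sketch uses a weighting scheme often attributed to Horn (1981); I chose it for illustration, and other formulas exist, so don't assume any particular GIS product computes slope exactly this way. The elevation values and cell size are hypothetical.

```python
import math


def slope_degrees(window, cell_size):
    """Slope of the center cell of a 3x3 elevation window, in degrees.

    window lists three rows from north to south; each row runs west to east.
    cell_size is the ground distance across one cell, in elevation units.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window
    # Weighted elevation change per unit distance, west to east.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    # Weighted elevation change per unit distance, north to south.
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical elevations in meters, with 10-meter cells.
flat = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
tilted = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]  # rises 10 m per cell toward the east

print("flat terrain:  ", slope_degrees(flat, 10.0))
print("tilted terrain:", slope_degrees(tilted, 10.0))
```

The tilted surface climbs 10 meters for every 10 meters traveled, so its slope works out to about 45 degrees.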

Some mapping systems provide ways to analyze attribute data as well as locational data. For example, the Excel mapping extension I mentioned above links the geographic data display capabilities of a mapping system with the data analysis capabilities of a spreadsheet. As you probably know, spreadsheets like Excel let users perform calculations on individual fields, columns, or entire files. A value changed in one field automatically changes values throughout the spreadsheet. Arithmetic, financial, statistical, and even certain database functions are supported. But as useful as spreadsheets are, they were not engineered to provide secure means of managing and analyzing large databases that consist of many related files, each of which is the responsibility of a different part of an organization. A spreadsheet is not a DBMS. And, by the same token, a mapping system is not a GIS.

10. Geographic Information Systems

The preceding discussion leads me to revise my working definition:

As I mentioned earlier, a geographer named David Cowen defined GIS as a decision-support tool that combines the capabilities of a relational database management system with the capabilities of a mapping system (1988). Cowen cited an earlier study by William Carstensen (1986), who sought to establish criteria by which local governments might choose among competing GIS products. Carstensen chose site selection as an example of the kind of complex task that many organizations seek to accomplish with GIS. Given the necessary database, he advised local governments to expect that a fully functional GIS should be able to identify property parcels that are:

at least five acres in size;
vacant or for sale;
zoned commercial;
not subject to flooding;
located not more than one mile from a heavy duty road; and
situated on terrain whose maximum slope is less than ten percent.

The first criterion--identifying parcels five acres or more in size--might require two operations. As described earlier, a mapping system ought to be able to calculate automatically the area of a parcel. Once the area is calculated and added as a new attribute into the database, an ordinary database query could produce a list of parcels that satisfy the size criterion. The parcels on the list might also be highlighted on a map, as in Figure 1.11.1, below.
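The two operations might look something like the following sketch. It assumes parcel boundaries are stored as lists of planar (x, y) vertices measured in feet, computes area with the standard "shoelace" formula, and then applies the size criterion. The parcel IDs and coordinates are invented for illustration.

```python
SQ_FEET_PER_ACRE = 43560


def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical parcels: ID -> boundary vertices in feet.
parcels = {
    "P-1": [(0, 0), (500, 0), (500, 500), (0, 500)],
    "P-2": [(0, 0), (300, 0), (300, 400), (0, 400)],
    "P-3": [(0, 0), (660, 0), (660, 330), (0, 330)],
}

# Operation 1: calculate each parcel's area and store it as a new attribute.
acres = {pid: polygon_area(ring) / SQ_FEET_PER_ACRE for pid, ring in parcels.items()}

# Operation 2: an ordinary attribute query on the new "acres" attribute.
large_parcels = sorted(pid for pid, size in acres.items() if size >= 5.0)

print(acres)
print(large_parcels)
```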

Map of property parcels five acres or larger in Ontario, California

Figure 1.11.1 The cartographic result of a database query identifying all property parcels greater than or equal to five acres in size.

Credit: City of Ontario, CA, GIS Department. Used by permission.

The ownership status of individual parcels would be an attribute of a property database maintained by a local tax assessor's office. Parcels whose ownership status attribute value matched the criteria "vacant" or "for sale" could be identified through another ordinary database query.

Map of property parcels zoned commercial in Ontario, California

Figure 1.11.2 The cartographic result of a spatial intersection (or map overlay) operation identifying all property parcels zoned for commercial (C-1) development.

Credit: City of Ontario, CA, GIS Department. Used by permission.

Carstensen's third criterion was to determine which parcels were situated within areas zoned for commercial development. This would be simple if authorized land uses were included as an attribute in the community's property parcel database. This is unlikely to be the case, however, since zoning and taxation are the responsibilities of different agencies. Typically, parcels and land use zones exist as separate paper maps. If the maps were prepared at the same scale, and if they accounted for the shape of the Earth in the same manner, then they could be superimposed one over another on a light table. If the maps let enough light through, parcels located within commercial zones could be identified.

The GIS approach to a task like this begins by digitizing the paper maps, and by producing corresponding attribute data files. Each digital map and attribute data file is stored in the GIS separately, like separate map layers. A fully functional GIS would then be used to perform a spatial intersection that is analogous to the overlay of the paper maps. Spatial intersection, otherwise known as map overlay, is one of the defining capabilities of GIS.
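
To make the overlay idea concrete, here is a simplified sketch in Python. It assumes two hypothetical layers that share one plane coordinate system: a zoning layer made of polygons, and a parcel layer reduced to one representative point per parcel. A full overlay would intersect parcel polygons with zone polygons; the point-in-polygon test used here (a ray-casting approach) is a stand-in chosen only to keep the example short.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Layer 1 (hypothetical): zoning polygons with a zone-code attribute.
zones = [
    {"zone": "C-1", "polygon": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"zone": "R-1", "polygon": [(10, 0), (20, 0), (20, 10), (10, 10)]},
]

# Layer 2 (hypothetical): one representative point per parcel.
parcel_points = {"P1": (2, 3), "P2": (15, 5), "P3": (8, 9)}

# Overlay: attach to each parcel the code of the zone it falls in.
parcel_zone = {}
for pid, pt in parcel_points.items():
    for z in zones:
        if point_in_polygon(pt, z["polygon"]):
            parcel_zone[pid] = z["zone"]
            break

commercial = sorted(pid for pid, code in parcel_zone.items() if code == "C-1")
print(commercial)
```

Once the zone code has been transferred to the parcels, the third criterion reduces to another ordinary attribute query.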

Map of property parcels within one mile buffer of a highway in Ontario, California

Figure 1.11.3 The cartographic result of a buffer operation identifying all property parcels located within a specified distance of a specified type of highway.

Credit: City of Ontario, CA, GIS Department. Used by permission.

Another of Carstensen's criteria was to identify parcels located within one mile of a heavy-duty highway. Such a task requires a digital map and associated attributes produced in such a way as to allow heavy-duty highways to be differentiated from other geographic entities. Once the necessary database is in place, a buffer operation can be used to create a polygon feature whose perimeter surrounds all "heavy duty highway" features at the specified distance. A spatial intersection is then performed, isolating the parcels within the buffer from those outside the buffer.
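
The effect of the buffer-then-intersect sequence can be approximated with a simple distance test, sketched below in Python. The highway centerline and parcel points are hypothetical, and coordinates are assumed to be in feet on a plane. Rather than constructing the buffer polygon itself, this sketch asks the equivalent question for point features: is the parcel within one mile of any segment of the highway?

```python
import math

FEET_PER_MILE = 5280.0

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to the segment's endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, polyline, distance):
    """True if p lies within `distance` of any segment of the polyline."""
    return any(
        point_segment_distance(p, a, b) <= distance
        for a, b in zip(polyline, polyline[1:])
    )

# Hypothetical heavy-duty highway centerline and parcel points (feet).
highway = [(0, 0), (10000, 0), (20000, 5000)]
parcel_points = {"P1": (5000, 3000), "P2": (5000, 6000), "P3": (15000, 4000)}

near_highway = sorted(
    pid for pid, pt in parcel_points.items()
    if within_buffer(pt, highway, FEET_PER_MILE)
)
print(near_highway)
```

A real buffer operation produces a polygon that can be stored, mapped, and reused in later overlays, which this shortcut does not.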

To produce a final list of parcels that meet all the site selection criteria, the GIS analyst might perform an intersection operation that creates a new file containing only those records that are present in all the other intermediate results.

Map showing parcels that meet all search criteria in Ontario, California

Figure 1.11.4 The cartographic result of the intersection of Figures 1.11.1, 1.11.2 and 1.11.3. Only the parcels shown in this map satisfy all of the site selection criteria.

Credit: City of Ontario, CA, GIS Department. Used by permission.
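
If each intermediate result is thought of as a set of parcel identifiers, the final step amounts to a set intersection. The sketch below uses hypothetical parcel IDs and hypothetical intermediate results; it illustrates only the logic of keeping the records that appear in every list.

```python
# Hypothetical intermediate results, one set of parcel IDs per criterion.
size_ok = {"P1", "P3", "P5", "P7"}
commercial = {"P1", "P2", "P3", "P7"}
near_highway = {"P1", "P3", "P4"}

# Keep only the parcels present in every intermediate result.
candidates = sorted(set.intersection(size_ok, commercial, near_highway))
print(candidates)
```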

I created the maps shown above in 1998 using the Geographic Information Web Server of the City of Ontario, California. Although it is no longer supported, the City of Ontario's web server was one of the first of its kind to provide much of the functionality required to perform a site suitability analysis online. Today, many local governments offer similar Internet map services to current and prospective taxpayers.

Try This!

Find an online site selection utility similar to the one formerly provided by the City of Ontario.

11. Geographic Information Science and Technology

So far in this chapter, I've tried to make sense of GIS in relation to several information technologies, including database management, computer-aided design, and mapping systems. At this point, I'd like to expand the discussion to consider GIS as one element in a much larger field of study called "Geographic Information Science and Technology" (GIS&T). As shown in the following illustration, GIS&T encompasses three subfields including:

Geographic Information Science, the multidisciplinary research enterprise that addresses the nature of geographic information and the application of geospatial technologies to basic scientific questions;
Geospatial Technology, the specialized set of information technologies that support acquisition, management, analysis, and visualization of geo-referenced data, including the Global Navigation Satellite System (GPS and others); satellite, airborne, and shipboard remote sensing systems; and GIS and image analysis software tools; and
Applications of GIS&T, the increasingly diverse uses of geospatial technology in government, industry, and academia. This is the subfield in which most GIS professionals work.

Arrows in the diagram below (Figure 1.12.1) reflect relationships among the three subfields, as well as to numerous other fields, including Geography, Landscape Architecture, Computer Science, Statistics, Engineering, and many others. Each of these fields has influenced, and some have been influenced by, the development of GIS&T. It is important to note that these fields and subfields do not neatly correspond with professions like GIS analyst, photogrammetrist, or land surveyor. Rather, GIS&T is a nexus of overlapping professions that differ in backgrounds, disciplinary allegiances, and regulatory status.

Diagram shows components of the field of GIS&T and its relations to other fields; see text description below

Figure 1.12.1 The field of Geographic Information Science and Technology (GIS&T) and its relations to other fields. Two-way relations that are half-dashed represent asymmetrical contributions between allied fields.

Text Alternative for Figure 1.12.1

Related Disciplines to Geographic Information Science and Technology:

Philosophy
Psychology
Mathematics
Statistics
Computer Science
Geography
Information Science & Technology
Geospatial Technology
Engineering
Landscape Architecture
Various Application Domains

The illustration in Figure 1.12.1, above, first appeared in the Geographic Information Science and Technology Body of Knowledge (DiBiase, DeMers, Johnson, Kemp, Luck, Plewe, and Wentz, 2006), published by the University Consortium for Geographic Information Science (UCGIS) and the Association of American Geographers (AAG). The Body of Knowledge is a community-developed inventory of the knowledge and skills that define the GIS&T field. Like the bodies of knowledge developed in Computer Science and other fields, the GIS&T BoK represents the GIS&T knowledge domain as a hierarchical list of knowledge areas, units, topics, and educational objectives. The ten knowledge areas and 73 units that make up the first edition are shown in the table below. Twenty-six “core” units (those in which all graduates of a degree or certificate program should be able to demonstrate some level of mastery) are shown in bold type. Not shown are the 329 topics that make up the units, or the 1,660 educational objectives by which topics are defined. These appear in the full text of the GIS&T BoK. The full text of the first edition can be found here: GIST Body of Knowledge PDF. An important related work is the Geospatial Technology Competency Model produced by the U.S. Department of Labor. We'll take a look at that shortly.

Knowledge Areas and Units Comprising the 1st Edition of the GIS&T BoK

Knowledge Area AM. Analytical Methods
- Unit AM1 Academic and analytical origins
- Unit AM2 Query operations and query languages
- Unit AM3 Geometric measures
- Unit AM4 Basic analytical operations
- Unit AM5 Basic analytical methods
- Unit AM6 Analysis of surfaces
- Unit AM7 Spatial statistics
- Unit AM8 Geostatistics
- Unit AM9 Spatial regression and econometrics
- Unit AM10 Data mining
- Unit AM11 Network analysis
- Unit AM12 Optimization and location-allocation modeling
Knowledge Area CF. Conceptual Foundations
- Unit CF1 Philosophical foundations
- Unit CF2 Cognitive and social foundations
- Unit CF3 Domains of geographic information
- Unit CF4 Elements of geographic information
- Unit CF5 Relationships
- Unit CF6 Imperfections in geographic information
Knowledge Area CV. Cartography and Visualization
- Unit CV1 History and trends
- Unit CV2 Data considerations
- Unit CV3 Principles of map design
- Unit CV4 Graphic representation techniques
- Unit CV5 Map production
- Unit CV6 Map use and evaluation
Knowledge Area DA. Design Aspects
- Unit DA1 The scope of GI S&T system design
- Unit DA2 Project definition
- Unit DA3 Resource planning
- Unit DA4 Database design
- Unit DA5 Analysis design
- Unit DA6 Application design
- Unit DA7 System implementation
Knowledge Area DM. Data Modeling
- Unit DM1 Basic storage and retrieval structures
- Unit DM2 Database management systems
- Unit DM3 Tessellation data models
- Unit DM4 Vector and object data models
- Unit DM5 Modeling 3D, temporal, and uncertain phenomena
Knowledge Area DN. Data Manipulation
- Unit DN1 Representation transformation
- Unit DN2 Generalization and aggregation
- Unit DN3 Transaction management of geospatial data
Knowledge Area GC. Geocomputation
- Unit GC1 Emergence of geocomputation
- Unit GC2 Computational aspects and neurocomputing
- Unit GC3 Cellular Automata (CA) models
- Unit GC4 Heuristics
- Unit GC5 Genetic algorithms (GA)
- Unit GC6 Agent-based models
- Unit GC7 Simulation modeling
- Unit GC8 Uncertainty
- Unit GC9 Fuzzy sets
Knowledge Area GD. Geospatial Data
- Unit GD1 Earth geometry
- Unit GD2 Land partitioning systems
- Unit GD3 Georeferencing systems
- Unit GD4 Datums
- Unit GD5 Map projections
- Unit GD6 Data quality
- Unit GD7 Land surveying and GPS
- Unit GD8 Digitizing
- Unit GD9 Field data collection
- Unit GD10 Aerial imaging and photogrammetry
- Unit GD11 Satellite and shipboard remote sensing
- Unit GD12 Metadata, standards, and infrastructures
Knowledge Area GS. GIS&T and Society
- Unit GS1 Legal aspects
- Unit GS2 Economic aspects
- Unit GS3 Use of geospatial information in the public sector
- Unit GS4 Geospatial information as property
- Unit GS5 Dissemination of geospatial information
- Unit GS6 Ethical aspects of geospatial information and technology
- Unit GS7 Critical GIS
Knowledge Area OI. Organizational and Institutional Aspects
- Unit OI1 Origins of GI S&T
- Unit OI2 Managing the GI system operations and infrastructure
- Unit OI3 Organizational structures and procedures
- Unit OI4 GI S&T workforce themes
- Unit OI5 Institutional and inter-institutional aspects
- Unit OI6 Coordinating organizations (national and international)

Ten knowledge areas and 73 units comprising the 1st edition of the GIS&T BoK. Core units are indicated with bold type. (©2006 Association of American Geographers and University Consortium for Geographic Information Science. Used by permission. All rights reserved.)

Notice that the knowledge area that includes the most core units is GD: Geospatial Data. This text focuses on the sources and distinctive characteristics of geographic data. This is one part of the knowledge base that most successful geospatial professionals possess. The Department of Labor's Geospatial Technology Competency Model (GTCM) highlights this and other essential elements of the geospatial knowledge base. We'll consider it next.

12. Geospatial Competencies and Our Curriculum

A body of knowledge is one way to think about the GIS&T field. Another way is as an industry made up of agencies and firms that produce and consume goods and services, generate sales and (sometimes) profits, and employ people. In 2003, the U.S. Department of Labor (DoL) identified "geospatial technology" as one of 14 "high growth" technology industries, along with biotech, nanotech, and others. However, the DoL also observed that the geospatial technology industry was ill-defined, and poorly understood by the public.

Subsequent efforts by the DoL and other organizations helped to clarify the industry's nature and scope. Following a series of "roundtable" discussions involving industry thought leaders, the Geospatial Information Technology Association (GITA) and the Association of American Geographers (AAG) submitted the following "consensus" definition to DoL in 2006:

The geospatial industry acquires, integrates, manages, analyzes, maps, distributes, and uses geographic, temporal, and spatial information and knowledge. The industry includes basic and applied research, technology development, education, and applications to address the planning, decision making, and operational needs of people and organizations of all types.

In addition to the proposed industry definition, the GITA and AAG report recommended that DoL establish additional occupations in recognition of geospatial industry workforce activities and needs. At the time, the existing geospatial occupations included only Surveyors, Surveying Technicians, Mapping Technicians, and Cartographers and Photogrammetrists. Late in 2009, with input from the GITA, AAG, and other stakeholders, the DoL established six new geospatial occupations: Geospatial Information Scientists and Technologists, Geographic Information Systems Technicians, Remote Sensing Scientists and Technologists, Remote Sensing Technicians, Precision Agriculture Technicians, and Geodetic Surveyors.

Try This!

Investigate the geospatial occupations at the U.S. Department of Labor's "O*Net" database. Enter "geospatial" in the search field named "Occupation Quick Search." Follow links to occupation descriptions. Note the estimates for 2008 employment and employment growth through 2018. Also note that, for some anomalous reason, the keyword "geospatial" is not associated with the occupation "Geodetic Surveyor."

Screen capture of Department of Labor's O-Net site

Figure 1.13.1: Screenshot showing quick search results for occupations that include the word Geospatial

Credit: Onetonline.org

Meanwhile, DoL commenced a "competency modeling" initiative for high-growth industries in 2005. Their goal was to help educational institutions like ours meet the demand for qualified technology workers by identifying what workers need to know and be able to do. At DoL, a competency is "the capability to apply or use a set of related knowledge, skills, and abilities required to successfully perform 'critical work functions' or tasks in a defined work setting" (Ennis 2008). A competency model is "a collection of competencies that together define successful performance in a particular work setting."

Workforce analysts at DoL began work on a Geospatial Technology Competency Model (GTCM) in 2005. Building on their research, a panel of accomplished practitioners and educators produced a complete draft of the GTCM, which they subsequently revised in response to public comments. Published in June 2010, the GTCM identifies the competencies that characterize successful workers in the geospatial industry. In contrast to the GIS&T Body of Knowledge, an academic project meant to define the nature and scope of the field, the GTCM is an industry specification that defines what individual workers and students should aspire to know and learn.

Try This!

Explore the Geospatial Technology Competency Model (GTCM) at the U.S. Department of Labor's Competency Model Clearinghouse. Under "Industry Competency Models," follow the link "Geospatial Technology." There, the pyramid (shown in Figure 1.13.2, below) is an image map which you can click to reveal the various competencies. The complete GTCM is also available as a Word doc and PDF file.

The GTCM specifies several "tiers" of competencies, progressing from general to occupationally specific. Tiers 1 through 3 (the gray and red layers), called Foundation Competencies, specify general workplace behaviors and knowledge that successful workers in most industries exhibit. Tiers 4 and 5 (yellow) include the distinctive technical competencies that characterize a given industry and its three sectors: Positioning and Data Acquisition, Analysis and Modeling, and Programming and Application Development. Above Tier 5 are additional Tiers corresponding to the occupation-specific competencies and requirements that are specified in the occupation descriptions published at O*NET Online and in a Geospatial Management Competency Model that is in development as of January, 2012.

Screen capture of the Department of Labor's Geospatial Technology Competency Model site explained in attached spreadsheet

Figure 1.13.2: Geospatial Technology Competency Model which is a guideline for what is needed in the workforce.

Credit: Careeronestop.org

One way educational institutions and students can use the GTCM is as a guideline for assessing how well curricula align with workforce needs. The Penn State Online GIS program conducted such an assessment in 2011. Results appear in the spreadsheet linked below.

Try This!

Open the attached Excel spreadsheet to see how our Penn State Online GIS curricula address workforce needs identified in the GTCM.

The sheet will open on a cover page. At the bottom of the sheet are tabs that correspond to Tiers 1-5 of the GTCM. Click the tabs to view the worksheet associated with the Tier you want to see.

In each Tier worksheet, rows correspond to the GTCM competencies. Columns correspond to the Penn State Online courses included in the assessment. Courses that are required for most students are highlighted in light blue. Course authors and instructors were asked to state what students actually do in relation to each of the GTCM competencies. Use the scroll bar at the bottom right edge of the sheet to reveal more courses.

By studying this spreadsheet, you'll gain insight about how individual courses, and how the Penn State Online curriculum as a whole, relate to geospatial workforce needs. If you're interested in comparing ours to curricula at other institutions, ask if they've conducted a similar assessment. If they haven't, ask why not.

Finally, don't forget that you can preview much of our online courseware through our Open Educational Resources initiative.

13. Distinguishing Properties of Geographic Data

The claim that geographic information science is a distinct field of study implies that spatial data are somehow special data. Goodchild (1992) points out several distinguishing properties of geographic information. I have paraphrased four such properties below. Understanding them, and their implications for the practice of geographic information science, is a key objective of this text.

Geographic data represent spatial locations and non-spatial attributes measured at certain times.
Geographic space is continuous.
Geographic space is nearly spherical.
Geographic data tend to be spatially dependent.

Let's consider each of these properties next.

14. Locations and Attributes

Geographic data represent spatial locations and non-spatial attributes measured at certain times. Goodchild (1992, p. 33) observes that "a spatial database has dual keys, allowing records to be accessed either by attributes or by locations." Dual keys are not unique to geographic data, but "the spatial key is distinct, as it allows operations to be defined which are not included in standard query languages." In the intervening years, software developers have created variations on SQL that incorporate spatial queries. The dynamic nature of geographic phenomena complicates the issue further, however. The need to pose spatio-temporal queries challenges geographic information scientists (GIScientists) to develop ever more sophisticated ways to represent geographic phenomena, thereby enabling analysts to interrogate their data in ever more sophisticated ways.
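
The "dual keys" idea can be illustrated with a small sketch in Python. The records, field names, and query window below are hypothetical, and the rectangular window test is a deliberately crude stand-in for the spatial operators that spatially enabled variations on SQL provide. The point is simply that the same records can be reached by attribute, by location, or by location and time together.

```python
from collections import defaultdict

# Hypothetical records: a location (x, y), an attribute, and a measurement year.
records = [
    {"id": "well-1", "kind": "well", "x": 2.0, "y": 3.0, "year": 1998},
    {"id": "well-2", "kind": "well", "x": 8.5, "y": 1.0, "year": 2005},
    {"id": "school-1", "kind": "school", "x": 2.5, "y": 3.5, "year": 2005},
]

# Attribute key: an index from attribute value to record IDs.
by_kind = defaultdict(list)
for rec in records:
    by_kind[rec["kind"]].append(rec["id"])

# Spatial key: select records whose location falls inside a rectangular window.
def in_window(rec, xmin, ymin, xmax, ymax):
    return xmin <= rec["x"] <= xmax and ymin <= rec["y"] <= ymax

wells = by_kind["well"]                                    # access by attribute
in_area = [r["id"] for r in records
           if in_window(r, 2.0, 3.0, 3.0, 4.0)]            # access by location
in_area_since_2000 = [r["id"] for r in records
                      if in_window(r, 2.0, 3.0, 3.0, 4.0)
                      and r["year"] >= 2000]               # location and time
print(wells, in_area, in_area_since_2000)
```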

15. Continuity

Geographic space is continuous. Although dual keys are not unique to geographic data, one property of the spatial key is. "What distinguishes spatial data is the fact that the spatial key is based on two continuous dimensions" (Goodchild, 1992, p.33). "Continuous" refers to the fact that there are no gaps in the Earth's surface. Canyons, crevasses, and even caverns notwithstanding, there is no position on or near the surface of the Earth that cannot be fixed within some sort of coordinate system grid. Nor is there any theoretical limit to how exactly a position can be specified. Given the precision of modern positioning technologies, the number of unique point positions that could be used to define a geographic entity is practically infinite. Because it's not possible to measure, let alone to store, manage, and process, an infinite amount of data, all geographic data is selective, generalized, approximate. Furthermore, the larger the territory covered by a geographic database, the more generalized the database tends to be.

Image shows the town of Gorham at three different scales, see text below

Figure 1.16.1 Geographic data are generalized according to scale.

Credit: U.S. Geological Survey

For example, the illustration in Figure 1.16.1, above, shows a town called Gorham depicted on three different topographic maps produced by the United States Geological Survey. Gorham occupies a smaller space on the small-scale (1:250,000) map than it does at 1:62,500 or at 1:24,000. But the relative size of the feature isn't the only thing that changes. Notice that the shape of the feature that represents the town changes also, as do the number of features and the amount of detail shown within the town boundary and in the surrounding area. The name for this characteristically parallel decline in map detail and map scale is generalization.

It is important to realize that generalization occurs not only on printed maps, but in digital databases as well. It is possible to represent phenomena with highly detailed features (whether they be made up of high-resolution raster grid cells or very many point locations) in a single scale-independent database. In practice, however, highly detailed databases are not only extremely expensive to create and maintain, but they also bog down information systems when used in analyses of large areas. For this reason, geographic databases are usually created at several scales, with different levels of detail captured for different intended uses.
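
One way to see what generalization does to digital data is to simplify a line by discarding points that contribute little to its shape. The sketch below implements the Douglas-Peucker algorithm, a widely used line simplification method that I've chosen only as an example; the text above does not prescribe any particular technique. The sample line and tolerance values are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def simplify(points, tolerance):
    """Douglas-Peucker: keep only points farther than tolerance from the trend."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every intermediate point is negligible
    # Otherwise keep the farthest point and simplify each half recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

# A hypothetical detailed line, as it might be stored for large-scale use.
line = [(0, 0), (1, 0.2), (2, 0), (3, 3), (4, 0), (5, 0.1), (6, 0)]

moderate = simplify(line, 0.5)  # small tolerance: minor wiggles are dropped
coarse = simplify(line, 5.0)    # large tolerance: only the endpoints survive
print(len(line), len(moderate), len(coarse))
```

Increasing the tolerance plays a role similar to decreasing map scale: fewer points are stored, and the shape of the feature changes.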

16. Nearly Spherical

Geographic space is nearly spherical. The fact that the Earth is nearly, but not quite, a sphere poses some surprisingly complex problems for those who wish to specify locations precisely.

World map showing the differences in elevation between a geoid and a reference ellipsoid.

Figure 1.17.1 Differences in elevation between a geoid model and a reference ellipsoid. Deviations range from a high of 75 meters (colored red, over New Guinea) to a low of -104 meters (colored purple, in the Indian Ocean).

Credit: National Geodetic Survey, n. d.

The geographic coordinate system of latitude and longitude coordinates provides a means to define positions on a sphere. Inaccuracies that are unacceptable for some applications creep in, however, when we confront the Earth's "actual" irregular shape, which is called the geoid. Furthermore, the calculations of angles and distance that surveyors and others need to perform routinely are cumbersome with spherical coordinates.
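
To get a feel for why distance calculations are more cumbersome with spherical coordinates, compare the familiar planar distance formula with the haversine formula for great-circle distance, sketched below in Python. The sketch assumes a perfectly spherical Earth with a mean radius of 6,371 km, which is itself one of the simplifications this section warns about.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def planar_distance(x1, y1, x2, y2):
    """Distance on a plane: a single application of the Pythagorean theorem."""
    return math.hypot(x2 - x1, y2 - y1)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two latitude/longitude positions (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude spans a different ground distance at each latitude.
one_degree_at_equator = great_circle_km(0, 0, 0, 1)
one_degree_at_60_north = great_circle_km(60, 0, 60, 1)
print(round(one_degree_at_equator, 1), round(one_degree_at_60_north, 1))
```

On this spherical model, a degree of longitude at 60 degrees north covers about half the ground distance it covers at the equator, so latitude and longitude values cannot be treated like plane coordinates.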

That consideration, along with the need to depict the Earth on flat pieces of paper, compels us to transform the globe into a plane, and to specify locations in plane coordinates instead of spherical coordinates. The set of mathematical transformations by which spherical locations are converted to locations on a plane--called map projections--all lead inevitably to one or another form of inaccuracy.
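
As one concrete example of such a transformation, the sketch below implements the forward equations of the spherical Mercator projection in Python. Mercator is used here only because it is familiar; the text does not single out any projection. The sketch assumes a spherical Earth of radius 6,371 km. It also computes the projection's scale factor, which shows how the distortion grows with latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed radius of a spherical Earth

def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude (degrees) to spherical Mercator x, y (km)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Ratio of map distance to ground distance at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (0, 30, 60):
    x, y = mercator(lat, 10)
    print(lat, round(x, 1), round(y, 1), round(mercator_scale_factor(lat), 3))
```

At the equator the scale factor is 1, meaning distances are preserved there. At 60 degrees latitude it is 2, meaning features appear twice as long as they should relative to the equator.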

All this is trouble enough, but we encounter even more difficulties when we seek to define "vertical" positions (elevations) in addition to "horizontal" positions. Perhaps it goes without saying that an elevation is the height of a location above some datum, such as mean sea level. Unfortunately, to be suitable for precise positioning, a datum must correspond closely with the Earth's actual shape. Which brings us back again to the problem of the geoid.

We will consider these issues in greater depth in Chapter 2. For now, suffice it to say that geographic data are unique in having to represent phenomena that are distributed on a continuous and nearly spherical surface.

17. Spatial Dependency

Geographic data tend to be spatially dependent. Spatial dependence is "the propensity for nearby locations to influence each other and to possess similar attributes" (Goodchild, 1992, p.33). In other words, to paraphrase a famous geographer named Waldo Tobler, while everything is related to everything else, things that are close together tend to be more related than things that are far apart. Terrain elevations, soil types, and surface air temperatures, for instance, are more likely to be similar at points two meters apart than at points two kilometers apart. A statistical measure of the similarity of attributes of point locations is called spatial autocorrelation.
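
One widely used measure of spatial autocorrelation is Moran's I, which I introduce here only as an example; the text does not commit to a particular statistic. The sketch below computes it in Python for hypothetical elevations sampled along a straight transect, treating each sample's immediate neighbors as "nearby" (a binary weighting scheme that is an assumption of this sketch).

```python
def transect_weights(n):
    """Binary weights: 1 for immediate neighbors along a line, 0 otherwise."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Moran's I: positive when neighbors are alike, negative when unlike."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / w_sum) * (numerator / denominator)

# Hypothetical elevations (meters) at six evenly spaced points on a transect.
smooth_slope = [1, 2, 3, 4, 5, 6]  # neighbors resemble each other
sawtooth = [1, 6, 1, 6, 1, 6]      # neighbors are as different as possible

w = transect_weights(6)
print(morans_i(smooth_slope, w), morans_i(sawtooth, w))
```

The gradually sloping transect yields a positive value, which is what spatial dependence looks like in numbers. The sawtooth transect yields a negative value, a pattern that is comparatively rare in terrain and most other geographic phenomena.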

Given that geographic data are expensive to create, spatial dependence turns out to be a very useful property. We can sample attributes at a limited number of locations, then estimate the attributes of intermediate locations. The process of estimating unknown values from nearby known values is called interpolation. Interpolated values are reliable only to the extent that the spatial dependence of the phenomenon can be assumed. If we were unable to assume some degree of spatial dependence, it would be impossible to represent continuous geographic phenomena in digital form.
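
Inverse distance weighting (IDW) is one simple interpolation method that builds spatial dependence directly into the estimate: nearer samples count for more than distant ones. The Python sketch below is offered as an illustration, not as the method any particular GIS uses. The sample locations and elevations are hypothetical, and the power of 2 is a common but arbitrary choice.

```python
import math

def idw(samples, x, y, power=2):
    """Estimate a value at (x, y) as a distance-weighted mean of the samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # the location coincides with a sample point
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical elevation samples: (x, y, elevation in meters).
samples = [(0, 0, 100.0), (10, 0, 120.0), (0, 10, 110.0), (10, 10, 130.0)]

center = idw(samples, 5, 5)        # equidistant from all four samples
near_corner = idw(samples, 1, 1)   # dominated by the sample at (0, 0)
print(round(center, 2), round(near_corner, 2))
```

If elevations were not spatially dependent, there would be no reason to expect estimates like these to resemble the true values.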

18. Geographic Data and Geographic Questions

The ultimate objective of all geospatial data and technologies, after all, is to produce knowledge. Most of us are interested in data only to the extent that they can be used to help understand the world around us and to make better decisions. Decision-making processes vary a lot from one organization to another. In general, however, the first steps in making a decision are to articulate the questions that need to be answered and to gather and organize the data needed to answer the questions (Nyerges & Golledge, 1997).

Geographic data and information technologies can be very effective in helping to answer certain kinds of questions. The expensive, long-term investments required to build and sustain GIS infrastructures can be justified only if the questions that confront an organization can be stated in terms that GIS is equipped to answer. As a specialist in the field, you may be expected to advise clients and colleagues on the strengths and weaknesses of GIS as a decision support tool. To follow are examples of the kinds of questions that are amenable to GIS analyses, along with questions that GIS is not so well suited to help answer.

Questions concerning individual geographic entities

The simplest geographic questions pertain to individual entities. Such questions include:

Questions about space

Where is the entity located?
What is its extent?

Questions about attributes

What are the attributes of the entity located there?
Do its attributes match one or more criteria?

Questions about time

When were the entity's location, extent or attributes measured?
Has the entity's location, extent, or attributes changed over time?

Simple questions like these can be answered effectively with a good printed map, of course. GIS becomes increasingly attractive as the number of people asking the questions grows, especially if they lack access to the required paper maps.

Questions concerning multiple geographic entities

Harder questions arise when we consider relationships among two or more entities. For instance, we can ask:

Questions about spatial relationships

Do the entities contain one another?
Do they overlap?
Are they connected?
Are they situated within a certain distance of one another?
What is the best route from one entity to the others?
Where are entities with similar attributes located?

Questions about attribute relationships

Do the entities share attributes that match one or more criteria?
Are the attributes of one entity influenced by changes in another entity?

Questions about temporal relationships

Have the entities' locations, extents, or attributes changed over time?

Geographic data and information technologies are very well suited to answering moderately complex questions like these. GIS is most valuable to large organizations that need to answer such questions often.

Questions that GIS is not particularly good at answering

Harder still, however, are explanatory questions--such as why entities are located where they are, why they have the attributes they do, and why they have changed as they have. In addition, organizations are often concerned with predictive questions--such as what will happen at this location if thus-and-so happens at that location? In general, commercial GIS software packages cannot be expected to provide clear-cut answers to explanatory and predictive questions right out of the box. Typically, analysts must turn to specialized statistical packages and simulation routines. Information produced by these analytical tools may then be re-introduced into the GIS database, if necessary. Research and development efforts intended to more tightly couple analytical software with GIS software are underway within the GIScience community. It is important to keep in mind that decision support tools like GIS are no substitutes for human experience, insight, and judgment.

At the outset of the chapter, I suggested that producing information by analyzing data is something like producing energy by burning coal. In both cases, technology is used to realize the potential value of a raw material. Also, in both cases, the production process yields some undesirable by-products. Similarly, in the process of answering certain geographic questions, GIS tends to raise others, such as:

Given the intrinsic imperfections of the data, how reliable are the results of the GIS analysis?
Does the information produced through GIS analysis tend to systematically benefit some constituent groups at the expense of others?
Should the data used to make the decision be made public?
Does the use of GIS affect the organization's decision-making processes in ways that are beneficial to its management, its employees, and its customers?

As is the case in so many endeavors, the answer to a geographic question usually includes more questions.

Try This!

Can you cite an example of a "hard" question that you and your GIS have been called upon to address?

19. Summary

It's a truism among specialists in geographic information that the lion's share of the cost of most GIS projects is associated with the development and maintenance of a suitable database. It seems appropriate, therefore, that our first course in geographic information systems should focus upon the properties of geographic data.

I began this first chapter by defining data in a generic sense, as sets of symbols that represent measurements of phenomena. I suggested that data are the raw materials from which information is produced. Information systems, such as database management systems, are technologies that people use to transform data into the information needed to answer questions and to make decisions.

Spatial data are special data. They represent the locations, extents, and attributes of objects and phenomena that make up the Earth's surface at particular times. Geographic data differ from other kinds of data in that they are distributed along a continuous, nearly spherical globe. They also have the unique property that the closer two entities are located, the more likely they are to share similar attributes.

GIS is a special kind of information system that combines the capabilities of database management systems with those of mapping systems. GIS is one object of study of the loosely-knit, multidisciplinary field called Geographic Information Science and Technology. GIS is also a profession--one of several that make up the geospatial industry. As Yogi Berra said, "In theory, there's no difference between theory and practice. In practice there is." In the chapters and projects that follow, we'll investigate the nature of geographic information from both conceptual and practical points of view.

20. Bibliography

Carstensen, L. W. (1986). Regional land information systems development using relational databases and geographic information systems. Proceedings of the AutoCarto, London, 507-516.

City of Ontario, California. (n.d.). Geographic information web server. Retrieved on July 6, 1999, from http://www.ci.ontario.ca.us/gis/index.asp (since retired).

Cowen, D. J. (1988). GIS versus CAD versus DBMS: What are the differences? Photogrammetric Engineering and Remote Sensing 54:11, 1551-1555.

DiBiase, D. and twelve others (2010). The New Geospatial Technology Competency Model: Bringing workforce needs into focus. URISA Journal 22:2, 55-72.

DiBiase, D, M. DeMers, A. Johnson, K. Kemp, A. Luck, B. Plewe, and E. Wentz (2007). Introducing the First Edition of the GIS&T Body of Knowledge. Cartography and Geographic Information Science, 34(2), pp. 113-120. U.S. National Report to the International Cartographic Association.

DiBiase, D, M. DeMers, A. Johnson, K. Kemp, A. Luck, B. Plewe, and E. Wentz (2006). Geographic Information Science and Technology Body of Knowledge. Association of American Geographers.

Ennis, M. R. (2008). Competency models: A review of the literature and the role of the employment and training administration (ETA). http://www.careeronestop.org/COMPETENCYMODEL/info_documents/OPDRLiteratureReview.pdf.

GITA and AAG (2006). Defining and communicating geospatial industry workforce demand: Phase I report.

Goodchild, M. (1992). Geographical information science. International Journal of Geographic Information Systems 6:1, 31-45.

Goodchild, M. (1995). GIS and geographic research. In J. Pickles (Ed.), Ground truth: the social implications of geographic information systems (pp. of chapter). New York: Guilford.

National Decision Systems. A zip code can make your company lots of money! Retrieved on July 6, 1999, from http://laguna.natdecsys.com/lifequiz (since retired).

National Geodetic Survey. (1997). Image generated from 15'x15' geoid undulations covering the planet Earth. Retrieved 1999, from http://www.ngs.noaa.gov/GEOID/geo-index.html (since retired).

Nyerges, T. L. & Golledge, R. G. (n.d.) NCGIA core curriculum in GIS, National Center for Geographic Information and Analysis, University of California, Santa Barbara, Unit 007. Retrieved November 12, 1997, from http://www.ncgia.ucsb.edu/giscc/units/u007/u007.html

United States Department of the Interior Geological Survey. (1977). [map]. 1:24 000. 7.5 minute series. Washington, D.C.: USDI.

United States Geological Survey. "Bellefonte, PA Quadrangle" (1971). [map]. 1:24 000. 7.5 minute series. Washington, D.C.: USGS.

University Consortium for Geographic Information Science. Retrieved April 26, 2006, from http://www.ucgis.org

Wilson, J. D. (2001). Attention data providers: A billion-dollar application awaits. GEOWorld, February, 54.

Worboys, M. F. (1995). GIS: A computing perspective. London: Taylor and Francis.

Chapter 2: Scales and Transformations

1. Overview

Chapter 1 outlined several of the distinguishing properties of geographic data. One is that geographic data are necessarily generalized, and that generalization tends to vary with scale. A second distinguishing property is that the Earth's complex, nearly-spherical shape complicates efforts to specify exact positions on Earth's surface. This chapter explores implications of these properties by illuminating concepts of scale, Earth geometry, coordinate systems, the "horizontal datums" that define the relationship between coordinate systems and the Earth's shape, and the various methods for transforming coordinate data between 3D and 2D grids, and from one datum to another.

Compared to Chapter 1, Chapter 2 may seem long, technical, and abstract, particularly to those for whom these concepts are new.

Objectives

Students who successfully complete Chapter 2 should be able to:

specify geospatial locations using geographic coordinates;
convert geographic coordinates between two different formats;
explain the concept of a horizontal datum;
calculate the change in a coordinate location due to a change from one horizontal datum to another;
estimate the magnitude of "datum shift" associated with the adjustment from NAD 27 to NAD 83;
recognize the kind of transformation that is appropriate to georegister two or more data sets;
describe the characteristics of the UTM coordinate system, including its basis in the Transverse Mercator map projection;
plot UTM coordinates on a map;
describe the characteristics of the SPC system, including the map projections on which it is based;
convert geographic coordinates to SPC coordinates;
interpret distortion diagrams to identify geometric properties of the sphere that are preserved by a particular projection; and
classify projected graticules by projection family.

"Try This!" Activities

2. Scale

You hear the word "scale" often when you work around people who produce or use geographic information. If you listen closely, you'll notice that the term has several different meanings, depending on the context in which it is used. You'll hear talk about the scales of geographic phenomena and about the scales at which phenomena are represented on maps and aerial imagery. You may even hear the word used as a verb, as in "scaling a map" or "downscaling." The goal of this section is to help you learn to tell these different meanings apart, and to be able to use concepts of scale to help make sense of geographic data.

Specifically, in this part of Chapter 2 you will learn to:

calculate map scale using representative fractions;
describe the general relationship between map scale, detail, and accuracy.

3. Scale as Scope

Often "scale" is used as a synonym for "scope" or "extent." For example, the title of an international research project called The Large Scale Biosphere-Atmosphere Experiment in Amazonia (1999) uses the term "large scale" to describe a comprehensive study of environmental systems operating across a large region. This usage is common not only among environmental scientists and activists, but also among economists, politicians, and the press. Those of us who specialize in geographic information usually use the word "scale" differently, however.

4. Map and Photo Scale

When people who work with maps and aerial images use the word "scale," they usually are talking about the sizes of things that appear on a map or air photo, relative to the actual sizes of those things on the ground.

Map scale is the proportion between a distance on a map and a corresponding distance on the ground:

(D_m / D_g).

By convention, the proportion is expressed as a "representative fraction" in which map distance (D_m) is reduced to 1. The proportion, or ratio, is also typically expressed in the form 1 : D_g rather than 1 / D_g.

The representative fraction 1:100,000, for example, means that a section of road that measures 1 unit in length on a map stands for a section of road on the ground that is 100,000 units long.

If we were to change the scale of the map such that the length of the section of road on the map was reduced to, say, 0.1 units in length, we would have created a smaller-scale map whose representative fraction is 0.1:100,000, or 1:1,000,000. When we talk about large- and small-scale maps and geographic data, then, we are talking about the relative sizes and levels of detail of the features represented in the data. In general, the larger the map scale, the more detail is shown. This tendency is illustrated below in Figure 2.5.1.

Image of three maps and photo scale of the town of Gorham

Figure 2.5.1 Geographic data are generalized according to scale.

Credit: Adapted from Thompson, 1988.
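
The representative fraction arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the two distances are assumed to be expressed in the same units.

```python
def representative_fraction(map_distance, ground_distance):
    """Return the denominator D_g of the representative fraction 1:D_g.

    Both distances must be in the same units (an assumption of this sketch).
    """
    if map_distance <= 0 or ground_distance <= 0:
        raise ValueError("distances must be positive")
    # Reduce the map distance to 1 by dividing both terms by it.
    return ground_distance / map_distance


def format_scale(denominator):
    """Format a denominator in the conventional 1:D_g form."""
    return f"1:{round(denominator):,}"


# A road 1 unit long on the map that is 100,000 units long on the ground
print(format_scale(representative_fraction(1, 100_000)))    # 1:100,000
# The same road drawn only 0.1 units long yields a smaller-scale map
print(format_scale(representative_fraction(0.1, 100_000)))  # 1:1,000,000
```

Notice that the larger denominator corresponds to the smaller map scale.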

One of the defining characteristics of topographic maps is that scale is consistent across each map and within each map series. This isn't true for aerial imagery, however, except for images that have been orthorectified. As discussed in Chapter 6, large scale maps are typically derived from aerial imagery. One of the challenges associated with using air photos as sources of map data is that the scale of an aerial image varies from place to place as a function of the elevation of the terrain shown in the scene. Assuming that the aircraft carrying the camera maintains a constant flying height (which pilots of such aircraft try very hard to do), the distance between the camera and the ground varies along each flight path. This causes air photo scale to be larger where the terrain is higher and smaller where the terrain is lower. An "orthorectified" image is one in which variations in scale caused by variations in terrain elevation (among other effects) have been removed.

You can calculate the average scale of an unrectified air photo by solving the equation S_p = f / (H-h_avg), where f is the focal length of the camera, H is the flying height of the aircraft above mean sea level, and h_avg is the average elevation of the terrain. You can also calculate air photo scale at a particular point by solving the equation S_p = f / (H-h), where f is the focal length of the camera, H is the flying height of the aircraft above mean sea level, and h is the elevation of the terrain at a given point.
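
Here is a minimal sketch of the point-scale equation S_p = f / (H-h). The focal length, flying height, and terrain elevations are hypothetical values chosen only for illustration, and all three quantities are assumed to be in meters.

```python
def photo_scale_denominator(focal_length_m, flying_height_m, terrain_elevation_m):
    """Return D in the photo scale 1:D, from S_p = f / (H - h).

    All inputs are assumed to be in meters. Passing the average terrain
    elevation instead of a point elevation gives the average photo scale.
    """
    height_above_ground = flying_height_m - terrain_elevation_m
    if height_above_ground <= 0:
        raise ValueError("flying height must exceed terrain elevation")
    # S_p = f / (H - h), so the denominator of 1:D is (H - h) / f.
    return height_above_ground / focal_length_m


f = 0.152   # hypothetical 152 mm focal length, in meters
H = 3000.0  # hypothetical flying height above mean sea level, in meters

for h in (200.0, 500.0, 800.0):  # hypothetical terrain elevations, in meters
    d = photo_scale_denominator(f, H, h)
    print(f"terrain at {h:.0f} m -> photo scale about 1:{d:,.0f}")
```

The denominators shrink as terrain elevation rises, which is another way of saying that photo scale is larger over higher ground.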

5. Graphic Map Scales

Another way to express map scale is with a graphic (or "bar") scale. Unlike representative fractions, graphic scales remain true when maps are shrunk or magnified.

Example of a bar scale and a variable scale (has curvature) side by side

Figure 2.6.1 Graphic Scales

If they include a scale at all, most maps include a bar scale like the one shown above left (Figure 2.6.1). Some also express map scale as a representative fraction. Either way, the implication is that scale is uniform across the map. In fact, except for maps that show only very small areas, scale varies across every map. As you probably know, this follows from the fact that positions on the nearly-spherical Earth must be transformed to positions on two-dimensional sheets of paper. Systematic transformations of this kind are called map projections. As we will discuss in greater depth later in this chapter, all map projections are accompanied by deformation of features in some or all areas of the map. This deformation causes map scale to vary across the map. Representative fractions may, therefore, specify map scale along a line at which deformation is minimal (nominal scale). Bar scales denote only the nominal or average map scale. Variable scales, like the one illustrated above right, show how scale varies, in this case by latitude, due to deformation caused by map projection.

6. Map Scale and Accuracy

One of the special characteristics of geographic data is that phenomena shown on maps tend to be represented differently at different scales. Typically, as scale decreases, so too does the number of different features and the detail with which they are represented. Not only printed maps, but also digital geographic data sets that cover extensive areas, tend to be more generalized than datasets that cover limited areas.

Accuracy also tends to vary in proportion with map scale. The United States Geological Survey, for example, guarantees that the mapped positions of 90 percent of well-defined points shown on its topographic map series at scales smaller than 1:20,000 will be within 0.02 inches of their actual positions on the map (see the National Geospatial Program Standards and Specifications). Notice that this "National Map Accuracy Standard" is scale-dependent. The allowable error of well-defined points (such as control points, road intersections, and such) on 1:250,000 scale topographic maps is thus 1 / 250,000 = 0.02 inches / D_g or D_g = 0.02 inches x 250,000 = 5,000 inches or 416.67 feet. Neither small-scale maps nor the digital data derived from them are reliable sources of detailed geographic information.
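
The allowable error calculation above generalizes easily. The sketch below assumes the 0.02-inch map tolerance cited above applies to whatever scale denominator you supply; the function name is my own.

```python
def allowable_ground_error_feet(scale_denominator, map_error_inches=0.02):
    """Ground distance (in feet) that corresponds to a map error tolerance.

    The 0.02-inch default is the tolerance quoted in the text; this sketch
    does not check whether that tolerance applies to the scale supplied.
    """
    ground_error_inches = map_error_inches * scale_denominator
    return ground_error_inches / 12  # 12 inches per foot


# The 1:250,000 example worked in the text
print(round(allowable_ground_error_feet(250_000), 2))  # 416.67
```

Doubling the scale denominator doubles the allowable ground error, which is why small-scale sources make poor substitutes for detailed data.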

Stage three composite disqualification map of Pennsylvania. Majority grey

Figure 2.7.1 Areas (in gray) disqualified as potential sites for a low-level radioactive waste storage facility depicted on a small scale map (original 1:1,500,000) mask small suitable areas large enough to contain the 500-acre facility.

Credit: Chem-Nuclear Systems, Inc., 1994

Sometimes the detail lost on small-scale maps causes serious problems. For example, a contractor hired to use GIS to find a suitable site for a low-level radioactive waste storage facility in Pennsylvania presented a series of 1:1,500,000 scale maps at public hearings around the state in the early 1990s. The scale was chosen so that disqualified areas of the entire state could be printed on a single 11 x 17-inch page. A report accompanying the map included the disclaimer that "it is possible that small areas of sufficient size for the LLRW disposal facility site may exist within regions that appear disqualified on the [map]. The detailed information for these small areas is retained within the GIS even though they are not visually illustrated..." (Chem-Nuclear Systems, Inc. 1993, p. 20). Unfortunately for the contractor, alert citizens recognized the shortcomings of the small-scale map, and newspapers published reports accusing the out-of-state company of providing inaccurate documents. Subsequent maps were produced at a scale large enough to discern 500-acre suitable areas.

7. Scale as a Verb

The term "scale" is sometimes used as a verb. To scale a map is to reproduce it at a different size. For instance, if you photographically reduce a 1:100,000-scale map to 50 percent of its original width and height, the result would be one-quarter the area of the original. Obviously, the map scale of the reduction would be smaller too: 1/2 x 1/100,000 = 1/200,000.
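
As a quick check on the arithmetic above, here is a small sketch (with a function name of my own choosing) that computes the new scale and the change in area when a map is reduced or enlarged by a linear factor.

```python
def rescaled_denominator(denominator, linear_factor):
    """New scale denominator after reproducing a map at linear_factor
    times its original width and height (0.5 means a 50 percent reduction)."""
    if linear_factor <= 0:
        raise ValueError("linear_factor must be positive")
    return denominator / linear_factor


factor = 0.5
new_denominator = rescaled_denominator(100_000, factor)
print(f"1:{int(new_denominator):,}")  # 1:200,000
print(factor ** 2)                    # 0.25, the fraction of the original area
```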

Because of the inaccuracies inherent in all geographic data, particularly in small scale maps, scrupulous geographic information specialists avoid enlarging source maps. To do so is to exaggerate generalizations and errors. The original map used to illustrate areas in Pennsylvania disqualified from consideration for low-level radioactive waste storage shown on an earlier page, for instance, was printed with the statement "Because of map scale and printing considerations, it is not appropriate to enlarge or otherwise enhance the features on this map."

8. Geospatial Measurement Scales

The word "scale" can also be used as a synonym for a ruler--a measurement scale. Because data consist of symbols that represent measurements of phenomena, it's important to understand the reference systems used to take the measurements in the first place. In this section, we'll consider a measurement scale known as the geographic coordinate system that is used to specify positions on the Earth's roughly spherical surface. In other sections, we'll encounter two-dimensional (plane) coordinate systems, as well as the measurement scales used to specify attribute data.

In this section of Chapter 2, you will:

demonstrate your ability to specify geospatial locations using geographic coordinates;
convert geographic coordinates between two different formats.

9. Coordinate Systems

An example showing how the Cartesian coordinate system works. labeled X and Y axis

Figure 2.10.1 A Cartesian coordinate system.

As you probably know, locations on the Earth's surface are measured and represented in terms of coordinates. A coordinate is a set of two or more numbers that specifies the position of a point, line, or other geometric figure in relation to some reference system. The simplest system of this kind is a Cartesian coordinate system (named for the 17th century mathematician and philosopher René Descartes). A Cartesian coordinate system is simply a grid formed by juxtaposing two measurement scales, one horizontal (x) and one vertical (y). The point at which both x and y equal zero is called the origin of the coordinate system. In Figure 2.10.1, above, the origin (0,0) is located at the center of the grid. All other positions are specified relative to the origin. The coordinate of the upper right-hand corner of the grid is (6,3). The coordinate of the lower left-hand corner is (-6,-3). If this is not clear, please ask for clarification!

Cartesian and other two-dimensional (plane) coordinate systems are handy due to their simplicity. Because the Earth's surface is curved rather than flat, however, they are not perfectly suited to specifying geospatial positions. The geographic coordinate system is designed specifically to define positions on the Earth's roughly-spherical surface. Instead of the two linear measurement scales, x and y, the geographic coordinate system juxtaposes two curved measurement scales. The east-west scale, called longitude (conventionally designated by the Greek symbol lambda), ranges from +180° to -180°. Because the Earth is round, +180° (or 180° E) and -180° (or 180° W) are the same grid line. That grid line is roughly the International Date Line, which has diversions that pass around some territories and island groups. Opposite the International Date Line is the prime meridian, the line of longitude defined by international treaty as 0°. The north-south scale, called latitude (designated by the Greek symbol phi), ranges from +90° (or 90° N) at the North pole to -90° (or 90° S) at the South pole. We'll take a closer look at the geographic coordinate system next.

An example showing how the Geodetic coordinate system works

Figure 2.10.2 The geographic (or "geodetic") coordinate system.

10. Geographic Coordinate System

Picture showing how longitude and latitude fall on a globe.

Figure 2.11.1 The geographic coordinate system.

Credit: David DiBiase

Longitude specifies positions east and west as the angle between the prime meridian and a second meridian that intersects the point of interest. Longitude ranges from +180° (or 180° E) to -180° (or 180° W). 180° East and 180° West are the same meridian, which roughly coincides with the International Date Line.

Latitude specifies positions north and south in terms of the angle subtended at the center of the Earth between two imaginary lines, one that intersects the equator and another that intersects the point of interest. Latitude ranges from +90° (or 90° N) at the North pole to -90° (or 90° S) at the South pole. A line of latitude is also known as a parallel.

The length of parallels decreases as latitude increases, reaching zero at 90° North and South. Lines of longitude are not parallel but converge toward the poles. Thus, while a degree of longitude at the equator is equal to a distance of about 111 kilometers, that distance decreases to zero at the poles.
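
To get a feel for how quickly a degree of longitude shrinks, here is a rough sketch that treats the Earth as a perfect sphere. Both the spherical model and the 6,371-kilometer mean radius are simplifying assumptions of mine, so the results are approximations only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def km_per_degree_longitude(latitude_deg):
    """Approximate ground length of one degree of longitude at a latitude.

    On a sphere, a parallel at latitude phi is a circle of radius
    R * cos(phi), so one degree along it spans R * cos(phi) * (pi / 180).
    """
    return math.radians(1.0) * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))


for lat in (0, 30, 60, 90):
    print(f"latitude {lat:2d} degrees: {km_per_degree_longitude(lat):6.1f} km per degree of longitude")
```

Under this model, a degree of longitude at 60° latitude is half as long as it is at the equator.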

11. Geographic Coordinate Formats

Geographic coordinates may be expressed in decimal degrees, or in degrees, minutes, and seconds. Sometimes, you need to convert from one form to another. Steve Kiouttis (personal communication, Spring 2002), manager of the Pennsylvania Urban Search and Rescue Program, described one such situation on the course Bulletin Board: "I happened to be in the state Emergency Operations Center in Harrisburg on Wednesday evening when a call came in from the Air Force Rescue Coordination Center in Dover, DE. They had an emergency locator transmitter (ELT) activation and requested the PA Civil Air Patrol to investigate. The coordinates given to the watch officer were 39 52.5 n and -75 15.5 w. This was plotted incorrectly (treated as if the coordinates were in decimal degrees 39.525n and -75.155 w) and the location appeared to be near Vineland, New Jersey. I realized that it should have been interpreted as 39 degrees 52 minutes and 5 seconds n and -75 degrees and 15 minutes and 5 seconds w) and made the conversion (as we were taught in Chapter 2) and came up with a location on the grounds of Philadelphia International Airport, which is where the locator was found, in a parked airliner."

Here's how it works:

To convert -89.40062 from decimal degrees to degrees, minutes, seconds:

1. Subtract the number of whole degrees (89°) from the total (89.40062°). (The minus sign is used in the decimal degree format only to indicate that the value is a west longitude or a south latitude.)
2. Multiply the remainder by 60 minutes (0.40062 x 60 = 24.0372).
3. Subtract the number of whole minutes (24') from the product.
4. Multiply the remainder by 60 seconds (0.0372 x 60 = 2.232).
5. The result, expressed in the correct number of significant figures, is 89° 24' 2.2" W or S.
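
The steps above can be sketched in Python as follows. The function name and return format are my own, and the sketch ignores one edge case (seconds that round up to 60), so treat it as an illustration rather than a finished utility.

```python
def decimal_degrees_to_dms(dd):
    """Convert decimal degrees to (degrees, minutes, seconds, is_negative).

    The sign is returned separately because, as noted in the text, it only
    indicates a west longitude or a south latitude.
    """
    is_negative = dd < 0
    total = abs(dd)
    degrees = int(total)                      # whole degrees
    minutes_float = (total - degrees) * 60    # remainder, in minutes
    minutes = int(minutes_float)              # whole minutes
    seconds = (minutes_float - minutes) * 60  # remainder, in seconds
    return degrees, minutes, seconds, is_negative


d, m, s, negative = decimal_degrees_to_dms(-89.40062)
label = str(d) + "° " + str(m) + "' " + format(s, ".1f") + '"'
print(label, "(W or S)" if negative else "(E or N)")  # 89° 24' 2.2" (W or S)
```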

To convert 43° 4' 31" from degrees, minutes, seconds to decimal degrees:
DD = Degrees + (Minutes/60) + (Seconds/3600)

1. Divide the number of seconds by 60 (31 ÷ 60 = 0.5167).
2. Add the quotient of step (1) to the whole number of minutes (4 + 0.5167 = 4.5167).
3. Divide the result of step (2) by 60 (4.5167 ÷ 60 = 0.0753).
4. Add the quotient of step (3) to the whole number of degrees (43 + 0.0753 = 43.0753).
5. The result, expressed in the correct number of significant figures, is 43.075°.
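
The reverse conversion is a one-line formula, DD = Degrees + (Minutes/60) + (Seconds/3600). A minimal sketch, again with a function name of my own choosing:

```python
def dms_to_decimal_degrees(degrees, minutes, seconds, is_negative=False):
    """Convert degrees, minutes, seconds to decimal degrees.

    Set is_negative=True for west longitudes and south latitudes.
    """
    dd = degrees + minutes / 60 + seconds / 3600
    return -dd if is_negative else dd


print(round(dms_to_decimal_degrees(43, 4, 31), 3))  # 43.075
```

Passing the hemisphere as a separate flag, rather than as a negative number of degrees, sidesteps the ambiguity that arises when the degrees value is zero.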

12. Horizontal Datums

Geographic data represent the locations and attributes of things on the Earth's surface. Locations are measured and encoded in terms of geographic coordinates (i.e., latitude and longitude) or plane coordinates (e.g., UTM). To measure and specify coordinates accurately, one first must define the geometry of the surface itself. To see what I mean, imagine a soccer ball. If you or your kids play soccer, you can probably conjure up a vision of a round mosaic of 20 hexagonal (six sided) and 12 pentagonal (five sided) panels. (Soccer balls come in many different designs, but the 32-panel ball is used in most professional matches. Visit Soccer Ball World for more than you ever wanted to know about soccer balls.) Now focus on one point at an intersection of three panels. You could use spherical (e.g., geographic) coordinates to specify the position of that point. But if you deflate the ball, the position of the point in space changes, and so must its coordinates. The absolute (though not the relative) position of a point on a surface, then, depends upon the shape of the surface.

Every position is determined in relation to at least one other position. Coordinates, for example, are defined relative to the origin of the coordinate system grid. A land surveyor measures the "corners" of a property boundary relative to a previously-surveyed control point. Surveyors and engineers measure elevations at construction sites and elsewhere. Elevations are expressed in relation to a vertical datum, a reference surface such as mean sea level. As you probably know, there is also such a thing as a horizontal datum, although this is harder to explain and to visualize than the vertical case. Horizontal datums define the geometric relationship between a coordinate system grid and the Earth's surface. Because the Earth's shape is complex, the relationship is too. The goal of this section is to explain the relationship.

Specifically, in this section of Chapter 2 you will learn to:

explain the concept of a horizontal datum;
calculate the change in a coordinate location due to a change from one horizontal datum to another;
estimate the magnitude of "datum shift" associated with the adjustment from NAD 27 to NAD 83.

13. Geoids

Diagram of a Geoid showing the equator and the poles. Not quite a sphere, uneven

Figure 2.14.1 The Earth's shape is defined as a surface that closely approximates global mean sea level, but across which the gravity potential is everywhere equal. The caricature of the geoid shown above is not drawn to scale. Irregularities are greatly exaggerated.

Credit: Adapted from Smith, 1988

The accuracy of coordinates that specify geographic locations depends upon how the coordinate system grid is aligned with the Earth's surface. Unfortunately for those who need accurate geographic data, defining the shape of the Earth's surface is a non-trivial problem. So complex is the problem that an entire profession, called geodesy, has arisen to deal with it.

Geodesists define the Earth's surface as a surface that closely approximates global mean sea level, but across which the gravity potential is everywhere equal. They refer to this shape as the geoid. Geoids are lumpy because gravity varies from place to place in response to local differences in terrain and variations in the density of materials in the Earth's interior. Geoids are also a little squat. Sea level gravity at the poles is greater than sea level gravity at the equator, a consequence of Earth's "oblate" shape as well as the centrifugal force associated with its rotation.

Geodesists at the U.S. National Geodetic Survey describe the geoid as an "equipotential surface" because the potential energy associated with the Earth's gravitational pull is equivalent everywhere on the surface. Like fitting a trend line through a cluster of data points, the geoid is a three-dimensional statistical surface that fits as closely as possible gravity measurements taken at millions of locations around the world. As additional and more accurate gravity measurements become available, geodesists revise the geoid periodically. Some geoid models are solved only for limited areas; GEOID03, for instance, is calculated only for the continental U.S.

Recall that horizontal datums define how coordinate system grids align with the Earth's surface. Long before geodesists calculated geoids, surveyors used much simpler surrogates called ellipsoids to model the shape of the Earth.

14. Ellipsoids

Diagram of a geoid with a reference ellipsoid overlay

Figure 2.15.1 Ellipsoids approximate the geoid.

Credit: Adapted from Smith, 1988

An ellipsoid is a three-dimensional geometric figure that resembles a sphere, but whose equatorial axis (a in Figure 2.15.1, above) is slightly longer than its polar axis (b). The equatorial (semi-major) axis of the World Geodetic System of 1984 (WGS 84) ellipsoid, for instance, is approximately 21 kilometers longer than its polar (semi-minor) axis, a proportion that closely resembles the oblate spheroid that is planet Earth. Ellipsoids are commonly used as surrogates for geoids so as to simplify the mathematics involved in relating a coordinate system grid with a model of the Earth's shape. Ellipsoids are good, but not perfect, approximations of geoids. The map in Figure 2.15.2, below, shows differences in elevation between a geoid model called GEOID96 and the WGS 84 ellipsoid. The surface of GEOID96 rises up to 75 meters above the WGS 84 ellipsoid over New Guinea (where the map is colored red). In the Indian Ocean (where the map is colored purple), the surface of GEOID96 falls about 104 meters below the ellipsoid surface.

Map of differences in elevation between geoid model and ellipsoid model, color coded and explained above

Figure 2.15.2 Deviations between an ellipsoid and a geoid.

Credit: National Geodetic Survey, 1997
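
To see how slightly an ellipsoid departs from a sphere, the sketch below derives the polar (semi-minor) axis of the WGS 84 ellipsoid from its two defining parameters. The parameter values are the published WGS 84 constants rather than anything stated in this text, and the variable names are my own.

```python
# Published WGS 84 defining parameters (not taken from this chapter)
SEMI_MAJOR_AXIS_M = 6378137.0          # a, the equatorial radius in meters
INVERSE_FLATTENING = 298.257223563     # 1/f, dimensionless

flattening = 1.0 / INVERSE_FLATTENING
# By definition f = (a - b) / a, so b = a * (1 - f).
semi_minor_axis_m = SEMI_MAJOR_AXIS_M * (1.0 - flattening)

difference_km = (SEMI_MAJOR_AXIS_M - semi_minor_axis_m) / 1000.0
print(f"semi-minor axis b: {semi_minor_axis_m:,.1f} m")
print(f"a exceeds b by about {difference_km:.1f} km")
```

The difference amounts to roughly a third of one percent of the equatorial radius, small compared with the size of the planet but very large compared with the 75- to 104-meter geoid deviations mapped above.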

Many ellipsoids are in use around the world. (Wikipedia presents a list in its entry on Earth Ellipsoids.) Local ellipsoids minimize differences between the geoid and the ellipsoid for individual countries or continents. The Clarke 1866 ellipsoid, for example, minimizes deviations in North America. The North American Datum of 1927 (NAD 27) associates the geographic coordinate grid with the Clarke 1866 ellipsoid. NAD 27 involved an adjustment of the latitude and longitude coordinates of some 25,000 geodetic control point locations across the U.S. The nationwide adjustment commenced from an initial control point at Meades Ranch, Kansas, and was meant to reconcile discrepancies among the many local and regional control surveys that preceded it.

The North American Datum of 1983 (NAD 83) involved another nationwide adjustment, necessitated in part by the adoption of a new ellipsoid, called GRS 80. Unlike Clarke 1866, GRS 80 is a global ellipsoid centered upon the Earth's center of mass. GRS 80 is essentially equivalent to WGS 84, the global ellipsoid upon which the Global Positioning System is based. NAD 27 and NAD 83 both align coordinate system grids with ellipsoids. They differ simply in that they refer to different ellipsoids. Because Clarke 1866 and GRS 80 differ slightly in shape as well as in the positions of their center points, the adjustment from NAD 27 to NAD 83 involved a shift in the geographic coordinate grid. Because a variety of datums remain in use, geospatial professionals need to understand this shift, as well as how to transform data between horizontal datums.

The preceding statement remains true despite the fact that NAD 83 will soon be discontinued as part of the National Geodetic Survey's ongoing modernization of the U.S. National Spatial Reference System. The switch from a "passive" ellipsoid-based reference system to a GPS-based dynamic system was planned for 2022, but it has since been delayed until 2024 or 2025. Visit the National Geodetic Survey for the latest information.

15. Control Points and Datum Shifts

Photograph of Horizontal control point monument

Figure 2.16.1 In the U.S., high-order horizontal control point locations are marked with permanent metal "monuments" like the one shown above. The physical manifestation of a datum is a network of control point measurements.

Credit: National Geodetic Survey, 2004

Geoids, ellipsoids, and even coordinate systems are all abstractions. The fact that "horizontal datum" refers to a relationship between an ellipsoid and a coordinate system, two abstractions, may explain why the concept is so frequently misunderstood. Datums do have physical manifestations, however.

Shown above (Figure 2.16.1) is one of the approximately two million horizontal and vertical control points that have been established in the U.S. Although control point markers are fixed, the coordinates that specify their locations are liable to change. The U.S. National Geodetic Survey maintains a database of the coordinate specifications of these control points, including historical locations as well as more recent adjustments. One occasion for adjusting control point coordinates is when new horizontal datums are adopted. Since every coordinate system grid is aligned with an ellipsoid that approximates the Earth's shape, coordinate grids necessarily shift when one ellipsoid is replaced by another. When coordinate system grids shift, the coordinates associated with fixed control points need to be adjusted. How we account for the Earth's shape makes a difference in how we specify locations.

Try This!

Here's a chance to calculate how much the coordinates of a control point change in response to an adjustment from North American Datum of 1927 (based on the Clarke 1866 ellipsoid) to the North American Datum of 1983 (based upon the GRS 80 ellipsoid).

Find the geographic coordinates of a populated place
1. Start at the USGS' Geographic Names Information System at the U.S. Board on Geographic Names.
2. Follow the links labeled Domestic Names, then Search to search place names included in the Geographic Names Information System.
3. At the Query Form, enter the name of your home town (or other named geographic feature) in the Feature Name field, as well as your home State. Choose Populated Place (or other, as appropriate) for the Feature Class.
  - If your home is somewhere other than the U.S., enter a place name of interest or fantasy destination (e.g., "Las Vegas" ;-) ).
4. Click Send Query.
5. The result should include latitude and longitude coordinates of a centroid that represents where the name of your town (or other feature) would appear on a map. You'll need those coordinates for the next step.
Find the geographic coordinates of a nearby horizontal control point
1. Visit the U.S. National Geodetic Survey home page.
2. Follow the link labeled Survey Mark Datasheets.
3. At the NGS Datasheet page, follow the link labeled Datasheets.
  - You may wish to begin with the "Info Link" labeled "Tell me more about datasheets."
4. At the NGS Datasheet Retrieval page, follow the link labeled Radial Search. (You're welcome to experiment with another retrieval method if you wish.)
5. At the NGS Datasheet Point Radius form:
  - Enter the latitude and longitude coordinates you looked up in step #1. Pay attention to the input format.
  - Specify a Search Radius.
  - Select Any Horz. and/or Vert. Control from the Data Type Desired scrolling field.
  - Select Any Stability from the Stability Desired scrolling field.
  - Click Submit.
6. The result should be a Station List Results form that looks like the contents of the window pictured below. These are the results of my search on the centroid coordinates for State College PA. Note that I have highlighted the station that is both nearest to the coordinates I entered and a first-order control point (see the "1" under the column labeled "H"?)
  
  Figure 2.16.2 Station List Results
  
  Credit: noaa.gov
7. Select the station nearest to the coordinates you specified that is also the highest-order horizontal control point.
8. Click Get Datasheets. The system should respond with a station datasheet like this example.
9. In the example linked above, the CURRENT SURVEY CONTROL of the station point is listed as NAD 83(1992) 40 48 13.83840(N) 077 51 44.25410(W) ADJUSTED. These are the geographic coordinates of the control point relative to the NAD 83 horizontal datum. In the next step, we'll see how much the control point "moved" as a result of the adjustment of those coordinates from the earlier NAD 27 datum. (The geographic coordinates of the control point are specified to 100,000th of a second precision, or approximately 0.3 mm of longitude. Keep in mind, however, the difference between precision and accuracy; the trailing 0 suggests that the accuracy is an order of magnitude less than the precision.)
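If you'd like to check the arithmetic behind that precision claim, the coordinate string from the datasheet can be converted to decimal degrees with a few lines of Python. This is only a sketch: the function names and the mean Earth radius of 6,371,000 meters are my assumptions for illustration, not part of any NGS tool, and the Earth is treated as a sphere.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, and seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value

def arcseconds_to_meters(arcseconds, latitude_deg=None):
    """Approximate ground length of an arc; pass latitude_deg for arcs of longitude."""
    meters = EARTH_RADIUS_M * math.radians(arcseconds / 3600)
    if latitude_deg is not None:
        # Meridians converge toward the poles, so a second of longitude shrinks.
        meters *= math.cos(math.radians(latitude_deg))
    return meters

lat = dms_to_decimal(40, 48, 13.83840, "N")
lon = dms_to_decimal(77, 51, 44.25410, "W")
precision_lat_mm = arcseconds_to_meters(0.00001) * 1000
precision_lon_mm = arcseconds_to_meters(0.00001, lat) * 1000

print(f"{lat:.8f}, {lon:.8f}")
print(f"{precision_lat_mm:.2f} mm of latitude, {precision_lon_mm:.2f} mm of longitude")
```

Under these assumptions, one 100,000th of a second works out to a few tenths of a millimeter on the ground, which is in line with the figure quoted above.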
Calculate the datum shift associated with a conversion from one horizontal datum to another
1. Return to the U.S. National Geodetic Survey home page.
2. Follow the link labeled geodetic tool kit.
3. At the NGS Geodetic Tool Kit page, follow the link labeled NADCON (you'll be taken to an explanatory page, where you'll need to click NADCON again to proceed to the utility).
4. At the North American Datum Conversion Utility page, read the introductory paragraphs.
5. At the NADCON computations form, under the heading compute a datum shift for a specific location:
  - Select direction of conversion: NAD 83 to NAD27.
  - Enter the NAD 83 latitude and longitude coordinates of your control station. Pay attention to format.
  - Click Compute Datum Shift for a Single Location.
6. The result should be a NADCON Output report like this example. In the State College example, the adjustment from NAD 27 to NAD 83 (associated with the replacement of the old Clarke 1866 ellipsoid by the Earth-centered GRS 80 ellipsoid) caused the geographic coordinate system grid to shift nearly 7 meters South and over 23 meters West. That grid shift is reflected in the adjustment of the coordinates that specify the control point's location. Note that the point didn't move; rather, the grid shifted. How much shift occurred at your location?

16. Coordinate Transformations

GIS specialists often need to transform data from one coordinate system and/or datum to another. For example, digital data produced by tracing paper maps over a digitizing tablet need to be transformed from the tablet's non-georeferenced plane coordinate system into a georeferenced plane or spherical coordinate system that can be georegistered with other digital data "layers." Raw image data produced by scanning the Earth's surface from space tend to be skewed geometrically as a result of satellite orbits and other factors; to be useful these too need to be transformed into georeferenced coordinate systems. Even the point data produced by GPS receivers, which are measured as latitude and longitude coordinates based upon the WGS84 datum, often need to be transformed to other coordinate systems or datums to match project specifications. This section describes three categories of coordinate transformations: (1) plane coordinate transformations; (2) datum transformations; and (3) map projections.

Students who successfully complete this section of Chapter 2 should be able to:
recognize the kind of transformation that is appropriate to georegister two or more data sets.

17. Plane Coordinate Transformations

Some coordinate transformations are simple. For example, the transformation from non-georeferenced plane coordinates to non-georeferenced polar coordinates shown in Figure 2.18.1, below, involves nothing more than the replacement of one kind of coordinates with another.

A point on a Cartesian coordinate system (left). Same point on a Polar coordinate system (right)

Figure 2.18.1 The same position specified within two non-georeferenced plane coordinate systems: Cartesian (left) and polar (right).

Credit: Adapted from Iliffe, 2000
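The replacement of Cartesian coordinates with polar coordinates can be sketched in a few lines of Python. The function names are mine, and I've assumed that the polar angle is measured counterclockwise from the positive x axis; the figure may follow a different convention.

```python
import math

def cartesian_to_polar(x, y):
    """Return (r, theta_degrees); theta is measured counterclockwise
    from the positive x axis (an assumed convention)."""
    r = math.hypot(x, y)
    theta = math.degrees(math.atan2(y, x))
    return r, theta

def polar_to_cartesian(r, theta_degrees):
    """Inverse of cartesian_to_polar."""
    theta = math.radians(theta_degrees)
    return r * math.cos(theta), r * math.sin(theta)

r, theta = cartesian_to_polar(3.0, 4.0)
x, y = polar_to_cartesian(r, theta)
print(r, theta)
print(x, y)
```

Nothing about the position changes in either direction; only the way it is described does.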

Unfortunately, most plane coordinate transformation problems are not so simple. The geometries of non-georeferenced plane coordinate systems and georeferenced plane coordinate systems tend to be quite different, mainly because georeferenced plane coordinate systems are often projected. As you know, the act of projecting a nearly-spherical surface onto a two-dimensional plane necessarily distorts the geometry of the original spherical surface. Specifically, the scale of a projected map (or an unrectified aerial photograph, for that matter) varies from place to place. So long as the geographic area of interest is not too large, however, formulae like the ones described here can be effective in transforming a non-georeferenced plane coordinate system grid to match a georeferenced plane coordinate system grid with reasonable, and measurable, accuracy. We won't go into the math of the transformations here, since the formulae are implemented within GIS software. Instead, this section aims to familiarize you with how some common transformations work and how they may be used.

Similarity Transformation

In the hypothetical illustration below (Figure 2.18.2), the spatial arrangement of six control points digitized from a paper map ("before") is shown to differ from the spatial arrangement of the same points as they appear in a georeferenced aerial photograph that is referenced to a different plane coordinate system grid ("after"). If, as shown, the arrangement of the two sets of points differs only in scale, rotation, and offset, a relatively simple four-parameter similarity transformation may do the trick. Your GIS software should derive the parameters for you by comparing the relative positions of the common points. Note that while only six control points are illustrated, ten to twenty control points are recommended (Chrisman 2002).

Figure 2.18.2 Six control point locations before and after a similarity transformation used to correct systematic differences in scale, rotation, and offset between two plane coordinate systems.
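To give a sense of what the software does behind the scenes, here is a minimal sketch of a least-squares fit of the four similarity parameters. The parameterization (a, b, tx, ty, where a and b combine scale and rotation), the function names, and the control point coordinates are all hypothetical; real GIS packages may formulate the problem differently.

```python
import math

def fit_similarity(source, target):
    """Least-squares fit of X = a*x - b*y + tx and Y = b*x + a*y + ty."""
    n = len(source)
    mx = sum(p[0] for p in source) / n
    my = sum(p[1] for p in source) / n
    mX = sum(p[0] for p in target) / n
    mY = sum(p[1] for p in target) / n
    num_a = num_b = den = 0.0
    for (x, y), (X, Y) in zip(source, target):
        # Work with coordinates centered on each point set's mean.
        xc, yc, Xc, Yc = x - mx, y - my, X - mX, Y - mY
        num_a += xc * Xc + yc * Yc
        num_b += xc * Yc - yc * Xc
        den += xc * xc + yc * yc
    a, b = num_a / den, num_b / den
    tx = mX - a * mx + b * my
    ty = mY - b * mx - a * my
    return a, b, tx, ty

def apply_similarity(params, point):
    """Transform one (x, y) point with the fitted parameters."""
    a, b, tx, ty = params
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty

# Hypothetical control points: the "after" set is the "before" set scaled
# by 2, rotated 30 degrees, and offset by (100, 200).
before = [(0, 0), (10, 0), (10, 5), (0, 5), (5, 2), (3, 4)]
true_params = (2 * math.cos(math.radians(30)), 2 * math.sin(math.radians(30)), 100.0, 200.0)
after = [apply_similarity(true_params, p) for p in before]

a, b, tx, ty = fit_similarity(before, after)
scale = math.hypot(a, b)
rotation = math.degrees(math.atan2(b, a))
print(scale, rotation, tx, ty)
```

Because the made-up points differ only in scale, rotation, and offset, the fit recovers those values. With real control points the fit is never perfect, and the leftover discrepancies become the residuals discussed later in this section.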

Affine Transformation

Sometimes a similarity transformation doesn't do the trick. For example, because paper maps expand and contract more along the paper grain than across the grain in response to changes in humidity, the scale of a paper map is likely to be slightly greater along one axis than the other. In such cases, a six-parameter affine transformation may be used to accommodate differences in scale, rotation, and offset along each of the two dimensions of the source and target coordinate systems. This characteristic is particularly useful for transforming image data scanned from polar-orbiting satellites whose orbits trace S-shaped paths over the rotating Earth.

Figure 2.18.3 Six control point locations before and after an affine transformation used to correct systematic differences in scale, rotation, and offset between two plane coordinate systems. Notice that the arrangement of points before the transformation is skewed as well as offset and rotated.
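An affine fit can be sketched the same way. The example below estimates six parameters (three for each output coordinate) by least squares, using a small hand-written solver so that it needs nothing beyond the Python standard library. The function names and control point values are hypothetical, and production software would normally rely on a tested linear algebra library instead.

```python
def solve_3x3(m, v):
    """Solve the 3x3 system m * p = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented copy of the matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = sum(a[r][c] * p[c] for c in range(r + 1, 3))
        p[r] = (a[r][3] - s) / a[r][r]
    return p

def fit_affine(source, target):
    """Least-squares fit of X = a0 + a1*x + a2*y and Y = b0 + b1*x + b2*y."""
    rows = [(1.0, x, y) for x, y in source]
    # Normal equations (A^T A) p = A^T t, built once and reused for X and Y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * t[0] for r, t in zip(rows, target)) for i in range(3)]
    aty = [sum(r[i] * t[1] for r, t in zip(rows, target)) for i in range(3)]
    return solve_3x3(ata, atx), solve_3x3(ata, aty)

# Hypothetical control points: scale differs slightly between the two axes
# (1.02 versus 0.98), with a little skew and an offset of (100, 200).
before = [(0, 0), (10, 0), (10, 5), (0, 5), (5, 2), (3, 4)]
after = [(100 + 1.02 * x + 0.10 * y, 200 - 0.05 * x + 0.98 * y) for x, y in before]

x_params, y_params = fit_affine(before, after)
print([round(p, 6) for p in x_params])
print([round(p, 6) for p in y_params])
```

The two extra parameters, compared with a similarity transformation, are what allow the scale to differ along each axis.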

Second-Order Polynomial Transformation

When neither similarity nor affine transformations yield acceptable results, you may have to resort to a twelve-parameter second-order polynomial transformation. Its advantage is the potential to correct data sets that are distorted in several ways at once. A disadvantage is that the stability of the results depends very much upon the quantity and arrangement of control points and the degree of dissimilarity of the source and target geometries (Iliffe 2000).


Figure 2.18.4 Six control point locations before and after a second-order polynomial transformation. Notice that the arrangement of points before the transformation is distorted in multiple ways in comparison with the corrected arrangement.
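The twelve parameters are simply six coefficients for each output coordinate, one for each of the terms 1, x, y, x², xy, and y². The sketch below only applies a polynomial whose coefficients are already known; the coefficient values are hypothetical, and estimating them from control points is left to GIS software.

```python
def polynomial2(coeffs_x, coeffs_y, point):
    """Apply a second-order polynomial transformation to one (x, y) point.
    Each coefficient list holds six values for the terms 1, x, y, x*x, x*y, y*y."""
    x, y = point
    terms = (1.0, x, y, x * x, x * y, y * y)
    new_x = sum(c * t for c, t in zip(coeffs_x, terms))
    new_y = sum(c * t for c, t in zip(coeffs_y, terms))
    return new_x, new_y

# Identity coefficients leave a point where it is.
identity_x = [0, 1, 0, 0, 0, 0]
identity_y = [0, 0, 1, 0, 0, 0]

# Hypothetical coefficients: offset x by 5 and stretch it progressively
# (the x*x term), which no affine transformation can do.
warped_x = [5.0, 1.0, 0.0, 0.001, 0.0, 0.0]

unchanged = polynomial2(identity_x, identity_y, (3.0, 4.0))
warped = polynomial2(warped_x, identity_y, (10.0, 2.0))
print(unchanged, warped)
```

Note that six control points supply exactly twelve equations for the twelve unknowns, so a fit based on only six points has no redundancy with which to reveal a bad point. That is one reason the arrangement and number of control points matter so much.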

Even more elaborate plane transformation methods, known collectively as rubber sheeting, optimize the fit of a source data set to the geometry of a target data set as if the source data were mapped onto a stretchable sheet.

Root Mean Square Error

GIS software provides a statistical measure of how well a set of transformed control points matches the positions of the same points in a target data set. Put simply, Root Mean Square (RMS) Error is a kind of average of the distances (also known as residuals) between each pair of control points: the square root of the mean of the squared distances. What constitutes an acceptably low RMS Error depends on the nature of the project and the scale of analysis.
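As a worked illustration, RMS error is the square root of the mean of the squared residual distances. The sketch below uses two made-up pairs of points; the function name and coordinates are hypothetical.

```python
import math

def rms_error(transformed, target):
    """Square root of the mean squared residual distance between point pairs."""
    squared = [(tx - gx) ** 2 + (ty - gy) ** 2
               for (tx, ty), (gx, gy) in zip(transformed, target)]
    return math.sqrt(sum(squared) / len(squared))

# One point misses its target by 5 units (a 3-4-5 triangle); the other is exact.
transformed = [(3.0, 4.0), (10.0, 10.0)]
target = [(0.0, 0.0), (10.0, 10.0)]

rms = rms_error(transformed, target)
mean_distance = (5.0 + 0.0) / 2
print(rms, mean_distance)
```

Because the residuals are squared before averaging, the RMS error here (about 3.5 units) is larger than the simple mean distance (2.5 units). Large misfits at individual control points weigh more heavily than small ones.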

18. Datum Transformations

Point locations are specified in terms of (a) their positions relative to some coordinate system grid and (b) their heights above or below some reference surface. Obviously, the elevation of a stationary point depends upon the size and shape of the reference surface (e.g., mean sea level) upon which the elevation measurement is based. In the same way, a point's position in a coordinate system grid depends on the size and shape of the surface upon which the grid is draped. The relationship between a grid and a model of the Earth's surface is called a horizontal datum. GIS specialists who are called upon to merge data sets produced at different times and in different parts of the world need to be knowledgeable about datum transformations.

NAD 27 to NAD 83

In the U.S., the two most frequently encountered horizontal datums are the North American Datum of 1927 (NAD 27) and the North American Datum of 1983 (NAD 83). The advent of the Global Positioning System necessitated an update of NAD 27 that included (a) adoption of a geocentric ellipsoid, GRS 80, in place of the Clarke 1866 ellipsoid; and (b) correction of many distortions that had accrued in the older datum. Bearing in mind that the realization of a datum is a network of fixed control point locations that have been specified in relation to the same reference surface, the 1983 adjustment of the North American Datum caused the coordinate values of every control point managed by the National Geodetic Survey (NGS) to change. Obviously, the points themselves did not shift on account of the datum transformation (although they did move a centimeter or more a year due to plate tectonics). Rather, the coordinate system grids based upon the datum shifted in relation to the new ellipsoid. And because local distortions were adjusted at the same time, the magnitude of grid shift varies from place to place. The illustrations below compare the magnitude of the grid shifts associated with the NAD 83 adjustment at one location and nationwide.

Bottom left corner of a topographic quadrangle map of State College

Figure 2.19.1 A corner of the 1:24,000 scale topographic quadrangle map for State College PA showing the magnitude of grid shift associated with the NAD 83 adjustment. The map is based on NAD 27, but was reprinted with revisions in 1987, including the statement that coordinate system grid lines shift 24 meters west and 5 meters south if NAD 83 coordinates are used instead of NAD 27.

Figure 2.19.2 Magnitude of grid shift associated with the NAD 83 adjustment for the continental 48 U.S. states. Shifts range from 10 to 100 meters in the lower 48 (least in upper Midwest states) to over 200 meters in Alaska, and over 400 meters in Hawaii.

Credit: Dewhurst 1990

Given the irregularity of the shift, NGS could not suggest a simple transformation algorithm that surveyors and mappers could use to adjust local data based upon the older datum. Instead, NGS created a software program called NADCON (Dewhurst 1990, Mulcare 2004) that calculates adjusted coordinates from user-specified input coordinates by interpolation from a pair of 15-minute correction grids (one for latitude shifts, one for longitude shifts) generated by NGS from hundreds of thousands of previously-adjusted control points.
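The idea of interpolating a shift from a correction grid can be illustrated with a small sketch. I've used bilinear interpolation as a stand-in, without claiming it reproduces NADCON's exact method, and the grid origin, spacing, and shift values below are entirely hypothetical.

```python
def bilinear(grid, lat0, lon0, spacing, lat, lon):
    """Interpolate a value at (lat, lon) from a regular grid.
    grid[i][j] holds the value at latitude lat0 + i*spacing and
    longitude lon0 + j*spacing; the point must lie inside the grid."""
    fi = (lat - lat0) / spacing
    fj = (lon - lon0) / spacing
    i, j = int(fi), int(fj)   # lower-left node of the enclosing cell
    u, v = fi - i, fj - j     # fractional position within the cell (0 to 1)
    return (grid[i][j] * (1 - u) * (1 - v) + grid[i + 1][j] * u * (1 - v)
            + grid[i][j + 1] * (1 - u) * v + grid[i + 1][j + 1] * u * v)

# Hypothetical shift values, in seconds of arc, at the four corners of one cell.
lat_shift_grid = [[0.20, 0.22], [0.24, 0.26]]
lon_shift_grid = [[-1.00, -0.98], [-1.02, -1.00]]
LAT0, LON0, SPACING = 40.75, -78.0, 0.25   # assumed grid origin and spacing (degrees)

at_node = bilinear(lat_shift_grid, LAT0, LON0, SPACING, 40.75, -78.0)
at_center = bilinear(lat_shift_grid, LAT0, LON0, SPACING, 40.875, -77.875)
lon_at_center = bilinear(lon_shift_grid, LAT0, LON0, SPACING, 40.875, -77.875)
print(at_node, at_center, lon_at_center)
```

One grid supplies the latitude shift and the other the longitude shift, which is why a pair of grids is needed.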

GPS Data and WGS 84

The U.S. Department of Defense created the Global Positioning System (GPS) over a period of 16 years at a startup cost of about $10 billion. GPS receivers calculate their positions in terms of latitude, longitude, and height above or below the World Geodetic System of 1984 ellipsoid (WGS 84). Developed specifically for the Global Positioning System, WGS 84 is an Earth-centered ellipsoid which, unlike the many regional, national, and local ellipsoids still in use, minimizes deviations from the geoid worldwide. Depending on where a GIS specialist may be working, or what data he or she may need to work with, the need to transform GPS data from WGS 84 to some other datum is likely to arise. Datum transformation algorithms are implemented within GIS software as well as in the post-processing software provided by GPS vendors for use with their receivers. Some transformation algorithms yield more accurate results than others. The method you choose will depend on what choices are available to you and how much accuracy your application requires.

Unlike the plane transformations described earlier, datum transformations involve ellipsoids and are therefore three-dimensional. The simplest is the three-parameter Molodenski transformation. In addition to knowledge of the size and shape of the source and target ellipsoids (specified in terms of semimajor axis, the distance from the ellipsoid's equator to its center, and flattening ratio, the degree to which the ellipsoid is flattened to approximate the Earth's oblate shape), the offset between the two ellipsoids needs to be specified along X, Y, and Z axes. The window shown below (Figure 2.19.3) illustrates ellipsoidal and offset parameters for several horizontal datums, all expressed in relation to WGS 84.

Figure 2.19.3 Datum list window in the Waypoint+ software utility (Hildebrand 1997). NAD 27, NAD 83, and WGS 84 are highlighted. The ellipsoid associated with each datum is named, and its size and shape specified (Delta A and Delta (1/f)x10e4), along with three offset parameters, in meters, relative to WGS 84 (Delta x, Delta y, Delta z).

Credit: Waypoint+ software utility
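Here is a rough sketch of how three offset parameters and two sets of ellipsoid parameters combine in a datum transformation. Rather than Molodenski's formulas, which operate directly on latitude and longitude, the sketch converts the position to Earth-centered Cartesian (X, Y, Z) coordinates, adds the three offsets, and converts back on the target ellipsoid; both approaches rest on the same parameters. The ellipsoid constants and the mean NAD 27 to WGS 84 offsets for the conterminous U.S. are commonly published values that I am assuming here, not values read from Figure 2.19.3, and the input position is only illustrative.

```python
import math

# Ellipsoid parameters: semimajor axis (meters) and inverse flattening.
CLARKE_1866 = (6378206.4, 294.9786982)
WGS_84 = (6378137.0, 298.257223563)
# Assumed mean NAD 27 -> WGS 84 offsets (meters) for the conterminous U.S.
NAD27_TO_WGS84 = (-8.0, 160.0, 176.0)

def geodetic_to_ecef(lat_deg, lon_deg, h, ellipsoid):
    """Latitude, longitude, ellipsoidal height -> Earth-centered X, Y, Z."""
    a, inv_f = ellipsoid
    f = 1 / inv_f
    e2 = f * (2 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z

def ecef_to_geodetic(x, y, z, ellipsoid):
    """Iterative inverse of geodetic_to_ecef (not intended for use at the poles)."""
    a, inv_f = ellipsoid
    f = 1 / inv_f
    e2 = f * (2 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))  # initial guess
    for _ in range(10):
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h

def shift_datum(lat, lon, h, source, target, offsets):
    """Three-parameter (translation-only) datum transformation."""
    x, y, z = geodetic_to_ecef(lat, lon, h, source)
    dx, dy, dz = offsets
    return ecef_to_geodetic(x + dx, y + dy, z + dz, target)

lat27, lon27 = 40.8038, -77.8623  # illustrative NAD 27 position near State College
lat84, lon84, _ = shift_datum(lat27, lon27, 0.0, CLARKE_1866, WGS_84, NAD27_TO_WGS84)
north_m = (lat84 - lat27) * 111_000
east_m = (lon84 - lon27) * 111_000 * math.cos(math.radians(lat27))
print(f"coordinate change: {north_m:.1f} m north, {east_m:.1f} m east")
```

Because the offsets are averages for a large region, the result of a sketch like this will differ by several meters from the output of a grid-based tool like NADCON.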

For larger study areas, more accurate results may be obtained using a seven-parameter transformation that accounts for rotation as well as scaling and offset.

Finally, surface-fitting transformations like the NADCON grid interpolation described above yield the best results over the largest areas.

For routine mapping applications covering relatively small geographic areas (i.e., at map scales larger than 1:25,000), the plane transformations described earlier may yield adequate results when datum specifications are unknown and when a sufficient number of appropriately distributed control points can be identified.

19. Map Projections

Latitude and longitude coordinates specify positions in a more-or-less spherical grid called the graticule. Plane coordinates like the eastings and northings in the Universal Transverse Mercator (UTM) and State Plane Coordinates (SPC) systems denote positions in flattened grids. This is why georeferenced plane coordinates are referred to as projected, and geographic coordinates are called unprojected. The mathematical equations used to transform latitude and longitude coordinates to plane coordinates are called map projections. Inverse projection formulae transform plane coordinates to geographic. The simplest kind of projection, illustrated in Figure 2.20.1, below, transforms the graticule into a rectangular grid in which all grid lines are straight, intersect at right angles, and are equally spaced. More complex projections yield grids in which the lengths, shapes, and spacing of the grid lines vary.

Graticule on a sphere (left) with a projected flat graticule (right)

Figure 2.20.1 Map projections are mathematical transformations between geographic coordinates and plane coordinates.
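The simplest projection, in which the graticule becomes a grid of equally spaced straight lines, is often called the equirectangular projection. It can be sketched along with its inverse as follows. The spherical Earth radius, function names, and sample coordinates are assumptions for illustration.

```python
import math

R = 6_371_000  # assumed mean Earth radius in meters (spherical model)

def project(lat_deg, lon_deg, central_meridian_deg=0.0):
    """Geographic coordinates -> plane coordinates (meters)."""
    x = R * math.radians(lon_deg - central_meridian_deg)
    y = R * math.radians(lat_deg)
    return x, y

def unproject(x, y, central_meridian_deg=0.0):
    """Inverse projection: plane coordinates -> geographic coordinates."""
    lat = math.degrees(y / R)
    lon = math.degrees(x / R) + central_meridian_deg
    return lat, lon

x, y = project(40.8, -77.86)
lat, lon = unproject(x, y)
print(x, y)
print(lat, lon)
```

Equal steps in latitude or longitude always produce equal steps in y or x, which is why the projected grid lines are evenly spaced.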

If you are a GIS practitioner, you have probably faced the need to superimpose unprojected latitude and longitude data onto projected data, and vice versa. For instance, you might have needed to merge geographic coordinates measured with a GPS receiver with digital data published by the USGS that are encoded as UTM coordinates. Modern GIS software provides sophisticated tools for projecting and unprojecting data. To use such tools most effectively, you need to understand the projection characteristics of the data sets you intend to merge. We'll examine map projections in some detail elsewhere in this chapter. Here, let's simply review the characteristics that are included in the "Spatial Reference Information" section of the metadata documents that (ideally!) accompany the data sets you might wish to incorporate in your GIS. These include:

Projection Name: Most common in the GIS realm is the Transverse Mercator, which serves as the basis of the global UTM plane coordinate system, the U.K. and proposed U.S. National Grids, and many zones in the U.S. State Plane Coordinate system (SPC). Other SPC zones are based upon the Lambert Conic Conformal projection, which like many projections is named for its inventor as well as its projection category (conic) and the geometric properties it preserves (conformal). Much map data, particularly in the form of printed paper maps, are based upon "legacy" projections (like the Polyconic in the U.S.) that are no longer widely used. A much greater variety of projection types tends to be used in small scale thematic mapping than in large scale reference mapping.
Central Meridian: Although no land masses are shown, let's assume that the graticule and projected grid shown above are centered on the intersection of the equator (0° latitude) and prime meridian (0° longitude). Most map projection formulae include a parameter that allows you to center the projected map upon any longitude.
Latitude of Projection Origin: Under certain conditions, most map projection formulae allow you to specify different aspects of the grid. Instead of the equatorial aspect illustrated above, you might specify a polar aspect or oblique aspect by varying the latitude of projection origin such that one of the poles, or any latitude between the pole and the equator, is centered in the projected map. As you might imagine, the appearance of the grid changes a lot when viewed at different aspects.
Scale Factor at Central Meridian: This is the ratio of map scale along the central meridian to the scale at a standard line, where scale distortion is zero. The scale factor at the central meridian is 0.9996 in each of the 60 UTM coordinate system zones, since each contains two standard lines 180 kilometers west and east of the central meridian. Scale distortion increases with distance from standard lines in all projected coordinate systems.
Standard Lines: Some projections, including the Lambert Conic Conformal, include parameters by which you can specify one or two standard lines along which there is no scale distortion caused by the act of transforming the spherical grid into a flat grid. By the same reasoning that two standard lines are placed in each UTM zone to limit distortion throughout the zone to a maximum of one part in 1,000, two standard parallels are placed in each SPC zone that is based on a Lambert projection such that scale distortion is no worse than one part in 10,000 anywhere in the zone.

20. UTM Coordinate System

Shown below in Figure 2.21.1 is the southwest corner of a 1:24,000-scale topographic map published by the United States Geological Survey (USGS). Note that the geographic coordinates (40° 45' N latitude, 77° 52' 30" W longitude) of the corner are specified. Also shown, however, are ticks and labels representing two plane coordinate systems, the Universal Transverse Mercator (UTM) system and the State Plane Coordinates (SPC) system. The tick labeled "4515" represents a UTM grid line (called a "northing") that runs parallel to, and 4,515,000 meters north of, the equator. Ticks labeled "258" and "259" represent grid lines that run perpendicular to the equator and 258,000 meters and 259,000 meters east, respectively, of the origin of the UTM Zone 18 North grid. Unlike longitude lines, UTM "eastings" are straight and do not converge upon the Earth's poles. All of this raises the question, Why are multiple coordinate system grids shown on the map? Why aren't geographic coordinates sufficient?

Southwest corner of a USGS topographic map of Pine Grove Mills from 1962

Figure 2.21.1 Southwest corner of a USGS topographic map showing grid ticks and labels for three different coordinate systems, including the UTM coordinate system.

Credit: USGS. "State College quadrangle, Pennsylvania"

You can think of a plane coordinate system as the juxtaposition of two measurement scales. In other words, if you were to place two rulers at right angles, such that the "0" marks of the rulers aligned, you'd define a plane coordinate system. The rulers are called "axes." The absolute location of any point in the space in the plane coordinate system is defined in terms of distance measurements along the x (east-west) and y (north-south) axes. A position defined by the coordinates (1,1) is located one unit to the right, and one unit up from the origin (0,0). The UTM grid is a widely-used type of geospatial plane coordinate system in which positions are specified as eastings (distances, in meters, east of an origin) and northings (distances north of the origin).

By contrast, the geographic coordinate system grid of latitudes and longitudes consists of two curved measurement scales to fit the nearly-spherical shape of the Earth. As you know, geographic coordinates are specified in degrees, minutes, and seconds of arc. Curved grids are inconvenient to use for plotting positions on flat maps. Furthermore, calculating distances, directions, and areas with spherical coordinates is cumbersome in comparison with plane coordinates. For these reasons, cartographers and military officials in Europe and the U.S. developed the UTM coordinate system. UTM grids are now standard not only on printed topographic maps but also for the geospatial referencing of the digital data that comprise the emerging U.S. "National Map."
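The difference in effort is easy to see in code. The sketch below compares a distance calculation in plane coordinates with one in geographic coordinates. For the latter I've used the haversine formula on a sphere of assumed radius 6,371,000 meters, which is itself a simplification of the ellipsoidal calculations that geodesists use. The function names and sample coordinates are hypothetical.

```python
import math

def plane_distance(e1, n1, e2, n2):
    """Distance between two positions in the same plane coordinate system."""
    return math.hypot(e2 - e1, n2 - n1)

def great_circle_distance(lat1, lon1, lat2, lon2, radius=6_371_000):
    """Haversine distance on a sphere (assumed radius in meters)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# Two UTM grid ticks 1,000 meters apart along the same northing.
d_plane = plane_distance(258_000, 4_515_000, 259_000, 4_515_000)
# One degree of longitude along the equator.
d_sphere = great_circle_distance(0.0, 0.0, 0.0, 1.0)
print(d_plane, d_sphere)
```

The plane calculation is just the Pythagorean theorem, while the spherical one needs several trigonometric functions even before the Earth's ellipsoidal shape is taken into account.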

In this section of Chapter 2, you will learn to:

describe the characteristics of the UTM coordinate system, including its basis in the Transverse Mercator map projection; and
plot UTM coordinates on a map.

21. The UTM Grid and Transverse Mercator Projection

Figure 2.22.1 A Mercator projection of the world, showing the 60 UTM coordinate system zones, each divided into north and south halves at the equator. Also shown are two polar coordinate systems used to specify positions beyond the northern and southern limits of the UTM system.

The Universal Transverse Mercator system is not really universal, but it does cover nearly the entire Earth surface. Only polar areas--latitudes higher than 84° North and 80° South--are excluded. (Polar coordinate systems are used to specify positions beyond these latitudes.) The UTM system divides the remainder of the Earth's surface into 60 zones, each spanning 6° of longitude. These are numbered west to east from 1 to 60, starting at 180° West longitude (roughly coincident with the International Date Line).

The illustration above (Figure 2.22.1) depicts UTM zones as if they were uniformly "wide" from the Equator to their northern and southern limits. In fact, since meridians converge toward the poles on the globe, every UTM zone tapers from 666,000 meters in "width" at the Equator (where 1° of longitude is about 111 kilometers in length) to only about 70,000 meters at 84° North and about 116,000 meters at 80° South.
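The widths quoted above follow from a simple relationship: the length of a degree of longitude shrinks in proportion to the cosine of the latitude. Here is a quick check, using the rounded figure of 111 kilometers per degree at the Equator and a spherical Earth (both approximations); the function name is hypothetical.

```python
import math

METERS_PER_DEGREE_AT_EQUATOR = 111_000  # rounded figure used in the text

def utm_zone_width(latitude_deg, zone_span_deg=6):
    """Approximate east-west extent of a UTM zone at a given latitude, in meters."""
    return (zone_span_deg * METERS_PER_DEGREE_AT_EQUATOR
            * math.cos(math.radians(latitude_deg)))

width_equator = utm_zone_width(0)
width_84n = utm_zone_width(84)
width_80s = utm_zone_width(-80)
print(round(width_equator), round(width_84n), round(width_80s))
```

The results agree with the figures above to within a few hundred meters.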

"Transverse Mercator" refers to the manner in which geographic coordinates are transformed into plane coordinates. Such transformations are called map projections. The illustration below (Figure 2.22.2) shows the 60 UTM zones as they appear when projected using a Transverse Mercator map projection formula that is optimized for the UTM zone highlighted in yellow, Zone 30, which spans 6° West to 0° East longitude (the prime meridian).

As you can imagine, you can't flatten a globe without breaking or tearing it somehow. Similarly, the act of mathematically transforming geographic coordinates to plane coordinates necessarily displaces most (but not all) of the transformed coordinates to some extent. Because of this, map scale varies within projected (plane) UTM coordinate system grids.

The distortion ellipses plotted in red help us visualize the pattern of scale distortion associated with a particular projection. Had no distortion occurred in the process of projecting the map shown in Figure 2.22.2, below, all of the ellipses would be the same size, and circular in shape. As you can see, the ellipses centered within the highlighted UTM zone are all the same size and shape. Away from the highlighted zone, the ellipses steadily increase in size, although their shapes remain uniformly circular. This pattern indicates that scale distortion is minimal within Zone 30, and that map scale increases away from that zone. Furthermore, the ellipses reveal that the character of distortion associated with this projection is that shapes of features as they appear on a globe are preserved while their relative sizes are distorted. Map projections that preserve shape by sacrificing the fidelity of sizes are called conformal projections. The plane coordinate systems used most widely in the U.S., UTM and SPC (the State Plane Coordinates system) are both based upon conformal projections.

Figure 2.22.2 The result of a Transverse Mercator projection of the world centered on UTM Zone 30. Red circles reveal the scale distortion introduced during the transformation from geographic to projected plane coordinates. On the globe, all the circles would be the same size.

The Transverse Mercator projection illustrated above (Figure 2.22.2) minimizes distortion within UTM zone 30. Fifty-nine variations on this projection are used to minimize distortion in the other 59 UTM zones. In every case, distortion is no greater than 1 part in 1,000. This means that a 1,000 meter distance measured anywhere within a UTM zone will be no worse than + or - 1 meter off.

The animation linked to the illustration in Figure 2.22.3, below, shows a series of 60 Transverse Mercator projections that form the 60 zones of the UTM system. Each zone is based upon a unique Transverse Mercator map projection that minimizes distortion within that zone. Zones are numbered 1 to 60 eastward from the international date line. The animation begins with Zone 1.

Figure 2.22.3 One frame of an animation showing a sequence of the 60 Transverse Mercator projections used as the basis of the UTM coordinate system. Highlighted in red is UTM Zone 01, which spans 180° W to 174° W. A unique projection is used for every UTM zone, so that deformation within each zone is minimized.

Try This!

Click the graphic above in Figure 2.22.3 to download and view the animation file (utm.mp4) in a new tab.

Map projections are mathematical formulae used to transform geographic coordinates into plane coordinates. (Inverse projection formulae transform plane coordinates back into latitudes and longitudes.) "Transverse Mercator" is one of a hypothetically infinite number of such projection formulae. A visual analog to the Transverse Mercator projection appears below in Figure 2.22.4. Conceptually, the Transverse Mercator projection transfers positions on the globe to corresponding positions on a cylindrical surface, which is subsequently cut from end to end and flattened. In the illustration, the cylinder is tangent to the globe along one line, called the standard line. As shown in the little world map beside the globe and cylinder, scale distortion is minimal along the standard line and increases with distance from it. The animation linked above (Figure 2.22.3) was produced by rotating the cylinder 59 times at an increment of 6°.

Figure 2.22.4 The map above represents a Transverse Mercator projection of the world with a standard meridian at 0° longitude. (Note that because of the very small size of the map, the graticule is shown at 30° resolution.) The globe wrapped in a cylinder is a conceptual model of how the Transverse Mercator projection formula transfers positions on the globe to positions on a plane (The cylinder can be flattened to a plane surface after it is unwrapped from the globe.) The thicker red line on the cylinder and the map is the standard line along which scale distortion is zero. As the distortion ellipses on the map indicate, distortion increases with distance from the standard line.

In the illustration above in Figure 2.22.4, there is one standard meridian. Some projection formulae, including the Transverse Mercator projection, allow two standard lines. Each of the 60 variations on the Transverse Mercator projection used as the foundations of the 60 UTM zones employs not one, but two, standard lines. These two standard lines are parallel to, and 180,000 meters east and west of, each central meridian. This scheme ensures that the maximum error associated with the projection due to scale distortion will be 1 part in 1,000 (at the outer edge of the zone at the equator). The error due to scale distortion at the central meridian is 1 part in 2,500. Distortion is zero, of course, along the standard lines.
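These three figures can be checked with the scale factor formula for a Transverse Mercator projection of a sphere, k = k0 · cosh(x / (k0 · R)), where x is the grid distance from the central meridian and k0 is the scale factor at the central meridian (0.9996 for UTM). Treat the sketch below as an approximation: the UTM system is actually defined on an ellipsoid, and the Earth radius is an assumed value.

```python
import math

K0 = 0.9996      # UTM scale factor at the central meridian
R = 6_371_000    # assumed mean Earth radius in meters (spherical approximation)

def utm_scale_factor(distance_from_central_meridian_m):
    """Approximate scale factor at a given grid distance from the central meridian."""
    return K0 * math.cosh(distance_from_central_meridian_m / (K0 * R))

k_center = utm_scale_factor(0)          # on the central meridian
k_standard = utm_scale_factor(180_000)  # on a standard line
k_edge = utm_scale_factor(333_000)      # zone edge at the Equator (half of 666,000 m)

for label, k in (("central meridian", k_center),
                 ("standard line", k_standard),
                 ("zone edge", k_edge)):
    print(f"{label}: k = {k:.6f}")
```

Under these assumptions, the scale factor is 0.9996 (1 part in 2,500 too small) at the central meridian, very nearly 1 at the standard lines, and about 1.001 (1 part in 1,000 too large) at the zone edge on the Equator.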

So, what does the term "transverse" mean? This simply refers to the fact that the cylinder shown above in Figure 2.22.4 has been rotated 90° from the equatorial aspect of the standard Mercator projection, in which a single standard line coincides with 0° latitude.

Map showing UTM zones numbers 10 through 19

Figure 2.22.5 The ten UTM zones that span the conterminous U.S.

Credit: U.S. Geological Survey, 2004

One disadvantage of the UTM system is that multiple coordinate systems must be used to account for large entities. The lower 48 United States, for instance, spread across ten UTM zones. The fact that there are many narrow UTM zones can lead to confusion. For example, the city of Philadelphia, Pennsylvania is east of the city of Pittsburgh. If you compare the Eastings of centroids representing the two cities, however, Philadelphia's Easting (about 486,000 meters) is less than Pittsburgh's (about 586,000 meters). Why? Because although the cities are both located in the U.S. state of Pennsylvania, they are situated in two different UTM zones. As it happens, Philadelphia is closer to the origin of its Zone 18 than Pittsburgh is to the origin of its Zone 17. If you were to plot the points representing the two cities on a map, ignoring the fact that the two zones are two distinct coordinate systems, Philadelphia would appear to the west of Pittsburgh. Inexperienced GIS users make this mistake all the time. Fortunately, GIS software is getting sophisticated enough to recognize and merge different coordinate systems automatically.
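A cautious habit is to check the zone before comparing eastings. The sketch below derives a zone number and its central meridian from a longitude. The function names are mine, the city longitudes are approximate values I've assumed for illustration, and the calculation ignores the handful of irregular zones around Norway and Svalbard.

```python
import math

def utm_zone(longitude_deg):
    """Zone number (1 to 60) for a longitude in degrees (west is negative)."""
    return int(math.floor((longitude_deg + 180) / 6)) % 60 + 1

def central_meridian(zone):
    """Longitude of a zone's central meridian, in degrees."""
    return zone * 6 - 183

philadelphia_lon = -75.16   # approximate, assumed for illustration
pittsburgh_lon = -80.00     # approximate, assumed for illustration

zone_phl = utm_zone(philadelphia_lon)
zone_pit = utm_zone(pittsburgh_lon)
print(zone_phl, central_meridian(zone_phl))
print(zone_pit, central_meridian(zone_pit))

if zone_phl != zone_pit:
    print("Different zones: eastings are not directly comparable.")
```

Philadelphia lies just west of its zone's central meridian (75° West), while Pittsburgh lies east of its own (81° West), which is why Philadelphia's easting is the smaller of the two.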

22. UTM Zone Characteristics

The illustration in Figure 2.23.1, below, depicts the area covered by a single UTM coordinate system grid zone. Each UTM zone spans 6° of longitude and extends from 84° North to 80° South. Zones taper from 666,000 meters in "width" at the Equator (where 1° of longitude is about 111 kilometers in length) to only about 70,000 meters at 84° North and about 116,000 meters at 80° South. Polar areas are covered by polar coordinate systems. Each UTM zone is subdivided along the equator into two halves, north and south.

Figure 2.23.1 Extent of one UTM coordinate system grid zone. Note that although latitudes are used to specify the extent precisely in relation to the globe, they are geographic, not UTM, coordinates.

The illustration below in Figure 2.23.2 shows how UTM coordinate grids relate to the area of coverage illustrated above in Figure 2.23.1. The north and south halves are shown side by side for comparison. Each half is assigned its own origin. The north and south zone origins are positioned to the south and west of the zones to which they refer. North zone origins are positioned on the Equator, 500,000 meters west of the central meridian. Origins are positioned so that every coordinate value within every zone is a positive number. This minimizes the chance of errors in distance and area calculations. By definition, both origins are located 500,000 meters west of the central meridian of the zone (in other words, the easting of the central meridian is always 500,000 meters E). These are considered "false" origins since they are located outside the zones to which they refer. UTM eastings range from 167,000 meters to 833,000 meters at the equator. These ranges narrow toward the poles. Northings range from 0 meters to nearly 9,400,000 meters in North zones and from just over 1,000,000 meters to 10,000,000 meters in South zones. Note that positions at latitudes higher than 84° North and 80° South are defined in Polar Stereographic coordinate systems that supplement the UTM system.

Figure 2.23.2 UTM coordinate system zone characteristics. Yellow represents areas in which UTM coordinates are valid for a given zone. Red lines parallel to the central meridian represent the two standard lines employed in each Transverse Mercator projection. Each square grid cell in the illustration spans 500,000 meters on each side.

See the Bibliography (last page of the chapter) for further readings about the UTM grid system.

23. National Grids

The Transverse Mercator projection provides a basis for existing and proposed national grid systems in the United Kingdom and the United States.

In the U.K., topographic maps published by the Ordnance Survey refer to a national grid of 100 km squares, each of which is identified by a two-letter code. Positions within each grid square are specified in terms of eastings and northings between 0 and 100,000 meters. The U.K. national grid is a plane coordinate system that is based upon a Transverse Mercator projection whose central meridian is 2° West longitude, with standard meridians 180 km west and east of the central meridian. The grid is typically related to the Airy 1830 ellipsoid, a relationship known as the National Grid (OSGB36®) datum. The corresponding UTM zones are 29 (central meridian 9° West) and 30 (central meridian 3° West). One of the advantages of the U.K. national grid over the global UTM coordinate system is that it eliminates the boundary between the two UTM zones.
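To make the two-letter codes concrete, here is a sketch of how full National Grid eastings and northings (in meters from the grid's false origin) might be turned into a lettered grid reference. It uses a commonly cited arithmetic shortcut for the 5 by 5 lettering pattern (A to Z, omitting I) and assumes a grid extent of 700 km by 1,300 km. The function name and the sample coordinates, a point in the Southampton area, are illustrative assumptions.

```python
GRID_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"  # 25 letters, I omitted

def os_grid_reference(easting, northing, digits_per_axis=1):
    """Format National Grid meters as a lettered reference such as 'SU31'."""
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("coordinates fall outside the National Grid")
    if not 0 <= digits_per_axis <= 5:
        raise ValueError("digits_per_axis must be between 0 and 5")

    # Column and row of the 100 km square, counted from the false origin
    e100k = int(easting // 100_000)
    n100k = int(northing // 100_000)

    # Positions of the 500 km letter and the 100 km letter in GRID_LETTERS
    first = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    second = (19 - n100k) * 5 % 25 + e100k % 5
    letters = GRID_LETTERS[first] + GRID_LETTERS[second]
    if digits_per_axis == 0:
        return letters

    # Truncate (do not round) the offsets within the 100 km square
    cell_size = 10 ** (5 - digits_per_axis)
    e_part = int((easting % 100_000) // cell_size)
    n_part = int((northing % 100_000) // cell_size)
    return f"{letters}{e_part:0{digits_per_axis}d}{n_part:0{digits_per_axis}d}"

# Assumed sample point: 437,000 m E, 115,000 m N
print(os_grid_reference(437_000, 115_000, 0))  # 100 km square
print(os_grid_reference(437_000, 115_000, 1))  # 10 km square
print(os_grid_reference(437_000, 115_000, 2))  # 1 km square
```

Each additional digit per axis narrows the reference by a factor of ten, from the 100 km square down to a 1 meter square with five digits per axis.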

A similar system has been proposed for the U.S. by the Federal Geographic Data Committee. The proposed "U.S. National Grid" is the same as the Military Grid Reference System (MGRS), a worldwide grid that is very similar to the UTM system. As Phil and Julianna Muehrcke (1998, pp. 229-230) write in the 4th edition of Map Use, "the military [specifically, the U.S. Department of Defense] aimed to minimize confusion when using long numerical [UTM] coordinates" by specifying UTM zones and sub-zones with letters instead of numbers. Like the UTM system, the MGRS consists of 60 zones, each spanning 6° longitude. Each UTM zone is subdivided into 19 MGRS quadrangles of 8° latitude and one quadrangle (from 72° to 84° North) of 12° latitude. The letters C through X are used to designate the grid cell rows from south to north. I and O are omitted to avoid confusion with numbers. Wikipedia offers a good entry on the MGRS.
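The lettering rule described above is simple enough to sketch in a few lines. The example below derives the latitude band letter and combines it with the UTM zone number to form a grid zone designator. It is a simplified illustration: it ignores the irregular zones near Norway, and the function names and city coordinates are assumptions of mine.

```python
import math

BAND_LETTERS = "CDEFGHJKLMNPQRSTUVWX"  # C through X, omitting I and O

def latitude_band(lat_deg):
    """Return the MGRS latitude band letter for a latitude from 80 S to 84 N."""
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("latitude is outside the UTM/MGRS coverage area")
    # Bands are 8 degrees tall starting at 80 S; the last band (X) absorbs
    # the extra 4 degrees between 80 N and 84 N.
    index = min(int((lat_deg + 80.0) // 8), len(BAND_LETTERS) - 1)
    return BAND_LETTERS[index]

def grid_zone_designator(lat_deg, lon_deg):
    """Combine the UTM zone number and the latitude band, e.g. '17T'."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1
    return f"{zone}{latitude_band(lat_deg)}"

# Approximate city coordinates (assumed values, for illustration only)
print(grid_zone_designator(40.44, -80.00))  # Pittsburgh
print(grid_zone_designator(39.95, -75.16))  # Philadelphia
```

Note that the two Pennsylvania cities differ in both parts of the designator: Pittsburgh falls north of the 40° North band boundary and Philadelphia just south of it.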

Try This!

Fun Demo of U.K. National Grid

A kid-friendly information sheet about the U.K. National Grid is published by the U.K. Ordnance Survey. You can find it in the National Grid for Schools link on their website.

A less kid-friendly video can be seen below:

The National Grid

Transcript of The National Grid Video

If you have ever used an Ordnance Survey paper map, been given a grid reference, and located it using the numbers along the edge of the map, then you'll have used the National Grid. If you're not familiar with reading grid references, please look at the education section of our website. However, if you look at one of our paper maps, you will notice that, apart from the grid numbers, there are also letter pairs around the edge. These point to the wider scale of the National Grid. Here we'll use the example of our head office in Southampton to illustrate this.

The National Grid starts out as a series of 25 squares of 500 kilometers by 500 kilometers, each given a letter of the alphabet from A to Z, missing out I. Each of these 500 kilometer squares is then subdivided into a series of 25 squares of 100 kilometers by 100 kilometers, again each with a letter of the alphabet, A to Z excluding I. This then results in a series of 100 kilometer squares with references such as SX, SY, and SZ. In terms of products, this is how 1:250,000 scale color raster tiles are referenced.

These 100 kilometer squares can then be divided again into 10 kilometer by 10 kilometer squares. These are each given a number from 00 to 99. So, using our office as an example, it falls within SU31. This is how products such as 1:25,000 scale color raster and OS VectorMap District tiles are referenced. These 10 kilometer squares can also be merged into sets of four to form 25 squares of 20 kilometers. In our example, our office falls in square SU20. This is how 1:50,000 scale color raster tiles are referenced.

These 10 kilometer tiles can be divided in one of two ways: either into 5 kilometer by 5 kilometer squares or into 1 kilometer by 1 kilometer squares. With the 5 kilometer squares, the 10 kilometer square is split into four and is referenced as northwest, northeast, southeast, or southwest. So, for example, within SU31, the 5 kilometer square where our office is would be SU31NE. This 5 kilometer grid is how products such as OS Street View and 1:10,000 scale raster tiles are referenced. Alternatively, the 10 kilometer squares can be split into 1 kilometer squares. So, with our office, the 1 kilometer grid reference would be SU3715. This 1 kilometer grid is how OS MasterMap imagery tiles are referenced. To find out more, please visit our website.

Credit: Ordnance Survey

24. State Plane Coordinate System

Shown below in Figure 2.25.1 is the southwest corner of a 1:24,000-scale topographic map published by the United States Geological Survey (USGS). Note that the geographic coordinates (40° 45' N latitude, 77° 52' 30" W longitude) of the corner are specified. Also shown, however, are ticks and labels representing two plane coordinate systems, the Universal Transverse Mercator (UTM) system and the State Plane Coordinate (SPC) system. The tick labeled "1 970 000 FEET" represents an SPC grid line that runs perpendicular to the equator and 1,970,000 feet east of the origin of the Pennsylvania North zone. The origin lies far to the west of this map sheet. Other SPC grid lines, called "northings" (not shown in the illustration), run parallel to the equator and perpendicular to SPC eastings at increments of 10,000 feet. Unlike longitude lines, SPC eastings and northings are straight and do not converge upon the Earth's poles.

Southwest corner of a USGS topographic map of Pine Grove Mills

Figure 2.25.1 Southwest corner of a USGS topographic map showing grid ticks and labels for three different coordinate systems, including the SPC coordinate system.

Credit: USGS. "State College Quadrangle, Pennsylvania"

The SPC grid is a widely-used type of geospatial plane coordinate system in which positions are specified as eastings (distances east of an origin) and northings (distances north of an origin). You can tell that the SPC grid referred to in the map illustrated above is the older 1927 version of the SPC grid system because (a) eastings and northings are specified in feet and (b) grids are based upon the North American Datum of 1927 (NAD27). The 124 zones that make up the State Plane Coordinates system of 1983 are based upon NAD 83, and generally use the metric system to specify eastings and northings.

State Plane Coordinates are frequently used to georeference large scale (small area) surveying and mapping projects because plane coordinates are easier to use than latitudes and longitudes for calculating distances and areas. And because SPC zones extend over relatively smaller areas, less error accrues to positions, distances, and areas calculated with State Plane Coordinates than with UTM coordinates.

In this section you will learn to:

describe the characteristics of the SPC system, including the map projections on which it is based; and
convert geographic coordinates to SPC coordinates.

25. The SPC Grid and Map Projections

Plane coordinate systems pretend the world is flat. Obviously, if you flatten the entire globe to a plane surface, the sizes and shapes of the land masses will be distorted, as will distances and directions between most points. If your area of interest is small enough, however, and if you flatten it cleverly, you can get away with a minimum of distortion. The basic design problem that confronted the geodesists who designed the State Plane Coordinate System, then, was to establish coordinate system zones that were small enough to minimize distortion to an acceptable level, but large enough to be useful.

The State Plane Coordinate System of 1983 (SPC) is made up of 124 zones that cover the 50 U.S. states. As shown below in Figure 2.26.1, some states are covered with a single zone while others are divided into multiple zones. Each zone is based upon a unique map projection that minimizes distortion in that zone to 1 part in 10,000 or better. In other words, a distance measurement of 10,000 meters will be at worst one meter off (not including instrument error, human error, etc.). The error rate varies across each zone, from zero along the projection's standard lines to the maximum at points farthest from the standard lines. Errors will accrue at a rate much lower than the maximum at most locations within a given SPC zone. SPC zones achieve better accuracy than UTM zones because they cover smaller areas, and so are less susceptible to projection-related distortion.

Map of U.S. States' Plane Coordinate system which has 124 zones; some states have more than others.

Figure 2.26.1 The U.S. State Plane Coordinate system of 1983 consists of 124 zones (Doyle 2004). Each zone is a distinct plane coordinate system. (Alaska and Hawaii not shown.)
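The "1 part in 10,000" design threshold translates directly into a worst-case error for any measured distance. The sketch below is just that arithmetic; the function name is my own, and the result is an upper bound on projection-related error only, since actual error at most locations in a zone is smaller.

```python
def worst_case_error(distance, max_error_ratio=1 / 10_000):
    """Upper bound on projection-related error for a distance (same units)."""
    return distance * max_error_ratio

for meters in (100, 1_000, 10_000, 50_000):
    print(f"{meters:>6} m measured: at most {worst_case_error(meters):.2f} m off")
```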

Most SPC zones are based on either a Transverse Mercator or Lambert Conformal Conic map projection whose parameters (such as standard lines and central meridians) are optimized for each particular zone. "Tall" zones like those in New York state, Illinois, and Idaho are based upon unique Transverse Mercator projections that minimize distortion by running two standard lines north-south on either side of the central meridian of each zone. "Wide" zones like those in Pennsylvania, Kansas, and California are based on unique Lambert Conformal Conic projections that run two standard parallels west-east through each zone. (One of Alaska's zones is based upon an "oblique" variant of the Mercator projection. That means that instead of standard lines parallel to a central meridian, as in the transverse case, the Oblique Mercator runs two standard lines that are tilted so as to minimize distortion along the Alaskan panhandle.)

The two types of map projections share the property of conformality, which means that angles plotted in the coordinate system are equal to angles measured on the surface of the Earth. As you can imagine, conformality is a useful property for land surveyors, who make their livings measuring angles. (Surveyors measure distances too, but unfortunately there is no map projection that can preserve true distances everywhere within a plane coordinate system.) Let's consider these two types of map projections briefly.

Like most map projections, the Transverse Mercator projection is actually a mathematical transformation. The illustration below in Figure 2.26.2 may help you understand how the math works. Conceptually, the Transverse Mercator projection transfers positions on the globe to corresponding positions on a cylindrical surface, which is subsequently cut from end to end and flattened. In the illustration, the cylinder is tangent to (touches) the globe along one line, the standard line (specifically, the standard meridian). As shown in the little world map beside the globe and cylinder, scale distortion is minimal along the standard line and increases with distance from it.

The distortion ellipses plotted in red help us visualize the pattern of scale distortion associated with a generic Transverse Mercator projection. Had no distortion occurred in the process of projecting the map shown below, all of the ellipses would be the same size, and circular in shape. As you can see, the ellipses plotted along the central meridian are all the same size and circular shape. Away from the central meridian, the ellipses steadily increase in size, although their shapes remain uniformly circular. This pattern reflects the fact that scale distortion increases with distance from the standard line. Furthermore, the ellipses reveal that the character of distortion associated with this projection is that shapes of features as they appear on a globe are preserved while their relative sizes are distorted. By preserving true angles, conformal projections like the Mercator (including its transverse and oblique variants) also preserve shapes.

Conceptual model of a Transverse Mercator map projection with map explained below

Figure 2.26.2 Conceptual model of a Transverse Mercator map projection (left) and the resulting map (right). The thick red lines represent the line of tangency between the globe and the projection surface (the cylinder) and the corresponding standard meridian on the map. Red circles on the map reveal that distortion introduced as a result of the map projection increases with distance from the standard line. On the globe, all the circles would be the same size.
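The growth of the distortion ellipses away from the standard line can be expressed as a scale factor. The sketch below uses the textbook formula for a Transverse Mercator projection of a sphere. The projections behind UTM and SPC zones use more elaborate ellipsoidal equations, so treat this as an illustration of the pattern rather than of the official formulas. The function and parameter names are my own.

```python
import math

def tm_scale_factor(lat_deg, lon_deg, central_meridian_deg, k0=1.0):
    """Point scale factor of a spherical Transverse Mercator projection.

    k0 is the scale factor assigned to the central meridian; k0 = 1
    corresponds to the tangent cylinder shown in the figure.
    """
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - central_meridian_deg)
    b = math.cos(lat) * math.sin(dlon)
    return k0 / math.sqrt(1.0 - b * b)

# Scale factor along the equator at increasing distance from the central meridian
for offset in (0.0, 1.0, 2.0, 3.0):
    k = tm_scale_factor(0.0, offset, 0.0)
    print(f"{offset:.0f} degrees from the central meridian: scale factor {k:.5f}")
```

Passing a value of k0 slightly below 1 (UTM uses 0.9996) lowers the scale factor everywhere, so that it equals 1 along two lines on either side of the central meridian instead of along the central meridian itself. That is how a Transverse Mercator zone comes to have two standard lines.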

SPC zones that trend west to east (including Pennsylvania's) are based on unique Lambert Conformal Conic projections. Instead of the cylindrical projection surface used by projections like the Mercator, the Lambert Conformal Conic and map projections like it employ conical projection surfaces like the one shown below in Figure 2.26.3. Notice the two lines at which the globe and the cone intersect. Both of these are standard lines; specifically, standard parallels. The latitudes of the standard parallels selected for each SPC zone minimize scale distortion throughout that zone.

Conceptual model of a Lambert Conformal Conic map projection and the resulting map explained below

Figure 2.26.3 Conceptual model of a Lambert Conformal Conic map projection (left) and the resulting map (right). The two thick red lines marking the intersections of the globe and the projection surface (the cone) correspond with two standard parallels on the map. Red circles on the map confirm that map scale is equal along both standard parallels. Distortion increases with distance from the standard parallels everywhere else in the projected map and in the coordinate system on which it is based.
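The effect of the two standard parallels can also be sketched numerically. The example below uses the textbook scale factor formula for a Lambert Conformal Conic projection of a sphere with two standard parallels. The parallels chosen here (40° and 42° North) are hypothetical round numbers, not the parameters of any actual SPC zone, and the function name is my own.

```python
import math

def lcc_scale_factor(lat_deg, std_parallel_1_deg, std_parallel_2_deg):
    """Point scale factor of a spherical Lambert Conformal Conic projection.

    The two standard parallels must differ from each other.
    """
    phi = math.radians(lat_deg)
    phi1 = math.radians(std_parallel_1_deg)
    phi2 = math.radians(std_parallel_2_deg)

    def t(p):
        return math.tan(math.pi / 4 + p / 2)

    # Cone constant determined by the two standard parallels
    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    return (math.cos(phi1) * t(phi1) ** n) / (math.cos(phi) * t(phi) ** n)

# Hypothetical standard parallels at 40 and 42 degrees North
for lat in (39.0, 40.0, 41.0, 42.0, 43.0):
    print(f"latitude {lat}: scale factor {lcc_scale_factor(lat, 40.0, 42.0):.6f}")
```

The scale factor is 1 on both standard parallels, slightly less than 1 between them, and greater than 1 outside them, which is why placing the parallels inside a zone keeps distortion small across the whole zone.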

26. SPC Zone Characteristics

In consultation with various state agencies, the National Geodetic Survey (NGS) originally devised the State Plane Coordinate System in the 1930s with several design objectives in mind. Chief among these were:

plane coordinates for ease of use in calculations of distances and areas;
all positive values to minimize calculation errors; and
a maximum error rate of 1 part in 10,000.

Plane coordinates specify positions in flat grids. Map projections are needed to transform latitude and longitude coordinates to plane coordinates. The designers did two things to minimize the inevitable distortion associated with map projections. First, they divided each state into zones small enough to meet the 1 part in 10,000 error threshold. Second, they used slightly different map projection formulae for each zone. The curved, dashed red lines in Figure 2.27.1, below, represent the two standard parallels that pass through each zone. The latitudes of the standard parallels are among the parameters of the Lambert Conformal Conic projection that can be customized to minimize distortion within the zone.

Positions in any coordinate system are specified relative to an origin. SPC zone origins are defined so as to ensure that every easting and northing in every zone is a positive number. As shown in the illustration below, SPC origins are positioned south of the counties included in each zone. The origins coincide with the central meridian of the map projection upon which each zone is based. The easting and northing values at the origins are not 0, 0. Instead, eastings are defined as positive values sufficiently large to ensure that every easting in the zone is also a positive number. The false origin of the Pennsylvania North zone, for instance, is defined as 600,000 meters East, 0 meters North. Origin eastings vary from zone to zone from 200,000 to 8,000,000 meters East.

Pennsylvania North Zone and Pennsylvania South Zone and the latitude line that divides them

Figure 2.27.1 Schematic view of two State Plane Coordinate System zones, showing the counties that make up each zone (in yellow), the origins of each zone, and the standard parallels of the map projections upon which the zones are based, along which scale distortion is zero.
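The role of the false origin comes down to simple addition. In the sketch below, a position is first expressed relative to the zone's central meridian and origin latitude (so points west of the central meridian have negative x values), and the false easting is then added. The 600,000 meter value for the Pennsylvania North zone comes from the text; the function name and the sample offsets are hypothetical.

```python
PA_NORTH_FALSE_EASTING = 600_000.0  # meters, as cited in the text

def to_grid_coordinates(x_from_central_meridian, y_from_origin_latitude,
                        false_easting, false_northing=0.0):
    """Shift projected offsets so that grid coordinates come out positive."""
    easting = x_from_central_meridian + false_easting
    northing = y_from_origin_latitude + false_northing
    return easting, northing

# Hypothetical point 150 km west of the central meridian, 80 km north of the origin
print(to_grid_coordinates(-150_000.0, 80_000.0, PA_NORTH_FALSE_EASTING))
```

Without the false easting, the easting of this point would be negative, which is exactly the situation the designers wanted to avoid.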

The State Plane Coordinate System will be affected by NGS' National Spatial Reference System modernization that was planned for 2022. In the new system, each state will have several "layered" plane coordinate systems, including a statewide layer for ease of use in GIS analyses, and one or more "default" layers made up of zones that minimize distortion for surveying and engineering applications. You can read up on SPCS 2022 at the National Geodetic Survey's web site.

27. Map Projections

Latitude and longitude coordinates specify point locations within a coordinate system grid that is fitted to a sphere or ellipsoid that approximates the Earth's shape and size. To display extensive geographic areas on a page or computer screen, as well as to calculate distances, areas, and other quantities most efficiently, it is necessary to flatten the Earth.

Graticule on sphere showing the projected graticule when transformed

Figure 2.28.1 Map projections are mathematical equations that transform geographic coordinates (conventionally designated by the Greek symbols lambda for longitude and phi for latitude) into plane coordinates (x and y). If all the necessary parameters are known, inverse projection equations can be used to transform projected coordinates back into unprojected geographic coordinates.

Georeferenced plane coordinate systems like the Universal Transverse Mercator and State Plane Coordinates systems (examined elsewhere in this chapter) are created by first flattening the graticule, then superimposing a rectangular grid over the flattened graticule. The first step, transforming the geographic coordinate system grid from a more-or-less spherical shape to a flat surface, involves systems of equations called map projections.
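To make the idea of forward and inverse projection equations concrete, here is a minimal sketch using the Mercator projection of a sphere, one of the simplest conformal cases. The Earth radius is an assumed mean value and the function names are my own; GIS software uses ellipsoidal versions of these equations.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth

def mercator_forward(lat_deg, lon_deg, central_meridian_deg=0.0):
    """Transform geographic coordinates (degrees) to plane coordinates (meters)."""
    lam = math.radians(lon_deg - central_meridian_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def mercator_inverse(x, y, central_meridian_deg=0.0):
    """Transform plane coordinates (meters) back to geographic coordinates."""
    lon = math.degrees(x / EARTH_RADIUS_M) + central_meridian_deg
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lat, lon

x, y = mercator_forward(40.75, -77.875)
print(x, y)
print(mercator_inverse(x, y))
```

The inverse recovers the original latitude and longitude only because the same parameters (here, the radius and central meridian) are used in both directions, which is the point made in the caption of Figure 2.28.1.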

Many different map projection methods exist. Although only a few are widely used in large-scale mapping, the projection parameters used vary greatly. Geographic information systems professionals are expected to be knowledgeable enough to select a map projection that is suitable for a particular mapping objective. Such professionals are expected to be able to recognize the type, amount, and distribution of geometric distortion associated with different map projections. Perhaps most important, they need to know about the parameters of map projections that must be matched in order to merge geographic data from different sources. The pages that follow introduce the key concepts. The topic is far too involved to master in one section of a single chapter, however. Indeed, Penn State offers an entire online course in Map Projections: Spatial Reference Systems in GIS (GEOG 861). If you are, or plan to become, a GIS professional, you should own at least one good book on map projections. Several recommendations follow in the bibliography at the end of this chapter.

Students who successfully complete this section should be able to:

interpret distortion diagrams to identify geometric properties of the sphere that are preserved by a particular projection;
classify projected graticules by projection family.

28. Geometric Properties Preserved and Distorted

Many types of map projections have been devised to suit particular purposes. No projection allows us to flatten the globe without distorting it, however. Distortion ellipses help us to visualize what type of distortion a map projection has caused, how much distortion has occurred, and where it has occurred. The ellipses show how imaginary circles on the globe are deformed as a result of a particular projection. If no distortion had occurred in the process of projecting the map shown below in Figure 2.29.1, all of the ellipses would be the same size, and circular in shape.

When positions on the graticule are transformed to positions on a projected grid, four types of distortion can occur: distortion of sizes, angles, distances, and directions. Map projections that avoid one or more of these types of distortion are said to preserve certain properties of the globe.

Equivalence

World map showing ellipses that illustrate distortion pattern characteristic of an equal area projection. smooshed at poles

Figure 2.29.1 Equal-Area Distortion

So-called equal-area projections maintain correct proportions in the sizes of areas on the globe and corresponding areas on the projected grid (allowing for differences in scale, of course). Notice that the shapes of the ellipses in the Cylindrical Equal Area projection above (Figure 2.29.1) are distorted, but the areas each one occupies are equivalent. Equal-area projections are preferred for small-scale thematic mapping, especially when map viewers are expected to compare sizes of area features like countries and continents.

Conformality

World map showing ellipses that illustrate distortion pattern characteristic of a conformal projection larger at poles

Figure 2.29.2 Conformal Projection Distortion

The distortion ellipses plotted on the conformal projection shown above in Figure 2.29.2 vary substantially in size, but are all the same circular shape. The consistent shapes indicate that conformal projections (like this Mercator projection of the world) preserve the fidelity of angle measurements from the globe to the plane. In other words, an angle measured by a land surveyor anywhere on the Earth's surface can be plotted at its corresponding location on a conformal projection without distortion. This useful property accounts for the fact that conformal projections are almost always used as the basis for large scale surveying and mapping. Among the most widely used conformal projections are the Transverse Mercator, Lambert Conformal Conic, and Polar Stereographic.

Conformality and equivalence are mutually exclusive properties. Whereas equal-area projections distort shapes while preserving fidelity of sizes, conformal projections distort sizes in the process of preserving shapes.

Equidistance

World map showing ellipses illustrating distortion pattern characteristic of equidistant projection. Ellipses near poles, small at equator

Figure 2.29.3 Equidistant projection distortion

Equidistant map projections allow distances to be measured accurately along straight lines radiating from one or two points only. Notice that ellipses plotted on the Cylindrical Equidistant (Plate Carrée) projection shown above (Figure 2.29.3) vary in both shape and size. The north-south axis of every ellipse is the same length, however. This shows that distances are true-to-scale along every meridian; in other words, the property of equidistance is preserved from the two poles. See chapters 11 and 12 of the online publication Matching the Map Projection to the Need to see how projections can be customized to facilitate distance measurements and to effectively depict ranges and rings of activity.
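The three cylindrical examples discussed so far can be compared numerically. The sketch below computes the two axes of a distortion ellipse (the north-south and east-west scale factors) for each projection at a given latitude. It assumes a spherical Earth, the normal aspect, and a standard parallel at the Equator in every case; the function name is my own.

```python
import math

def cylindrical_scale_factors(lat_deg):
    """Return {projection: (h, k)}: north-south (h) and east-west (k) scale factors."""
    sec = 1.0 / math.cos(math.radians(lat_deg))
    return {
        "Cylindrical Equal Area": (1.0 / sec, sec),  # h * k = 1, so areas are true
        "Mercator": (sec, sec),                      # h = k, so shapes are true
        "Plate Carree": (1.0, sec),                  # h = 1, so meridians are true to scale
    }

for name, (h, k) in cylindrical_scale_factors(60.0).items():
    print(f"{name}: h = {h:.2f}, k = {k:.2f}, area scale = {h * k:.2f}")
```

At 60° latitude, the equal-area ellipse is squashed but keeps an area scale of 1, the Mercator ellipse stays circular but is inflated about fourfold in area, and the Plate Carrée ellipse keeps its north-south axis at true length while doubling its east-west axis.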

Azimuthality

World map showing ellipses that illustrate distortion pattern characteristic of an azimuthal projection

Figure 2.29.4 Azimuthal projection distortion

Azimuthal projections preserve directions (azimuths) from one or two points to all other points on the map. See how the ellipses plotted on the gnomonic projection, shown above in Figure 2.29.4, vary in size and shape, but are all oriented toward the center of the projection? In this example, that's the one point at which directions measured on the globe are not distorted on the projected graticule.

Compromise

Figure 2.29.5 Polyconic projection distortion

Some map projections preserve none of the properties described above, but instead seek a compromise that minimizes distortion of all kinds. The example shown above in Figure 2.29.5 is the Polyconic projection, which was used by the U.S. Geological Survey for many years as the basis of its topographic quadrangle map series until it was superseded by the conformal Transverse Mercator. Another example is the Robinson projection, which is often used for small-scale thematic maps of the world.

29. Classifying Projection Methods

The term "projection" implies that the ball-shaped net of parallels and meridians is transformed by casting its shadow upon some flat, or flattenable, surface. In fact, almost all map projection methods are mathematical equations. The analogy of an optical projection onto a flattenable surface is useful, however, as a means to classify the bewildering variety of projection equations devised over the past two thousand years or more.

A plane, cone, and cylinder are shown to which the graticule can be projected

Figure 2.30.1 Three types of "flattenable" surfaces to which the graticule can be projected: a plane, a cone, and a cylinder.

Imagine a model globe that is translucent, and contains a bright light bulb. Imagine the light literally casting shadows of the graticule, and of the shapes of the continents, onto another surface that touches the globe. As you might imagine, the appearance of the projected grid will change quite a lot depending on the type of surface it is projected onto, and how that surface is aligned with the globe. The three surfaces shown above in Figure 2.30.1--the disk-shaped plane, the cone, and the cylinder--represent categories that account for the majority of projection equations that are encoded in GIS software. All three are shown in their normal aspects. The plane often is centered upon a pole. The cone is typically aligned with the globe such that its line of contact (tangency) coincides with a parallel in the mid-latitudes. And the cylinder is frequently positioned tangent to the equator (unless it is rotated 90°, as it is in the Transverse Mercator projection). The following illustrations in Figure 2.30.2 show some of the projected graticules produced by projection equations in each category.

Cylindric, Conic, Pseudocylindric, and Planar map projections are shown

Figure 2.30.2 Four categories of map projections

Cylindric projection equations yield projected graticules with straight meridians and parallels that intersect at right angles. The example shown above at top left in Figure 2.30.2 is a Cylindrical Equidistant (also called Plate Carrée or geographic) in its normal equatorial aspect.

Pseudocylindric projections are variants on cylindrics in which meridians are curved. The result of a Sinusoidal projection is shown above at top right of Figure 2.30.2.

Conic projections yield straight meridians that converge toward a single point at the poles, and parallels that form concentric arcs. The example shown above, at bottom left in Figure 2.30.2, is the result of an Albers Conic Equal Area, which is frequently used for thematic mapping of mid-latitude regions.

Planar projections also yield meridians that are straight and convergent, but parallels form concentric circles rather than arcs. Planar projections are also called azimuthal because every planar projection preserves the property of azimuthality. The projected graticule shown above at bottom right of Figure 2.30.2 is the result of an Azimuthal Equidistant projection in its normal polar aspect.
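The difference between cylindric and pseudocylindric graticules can be seen by tracing a single meridian through two of the projections named above. This sketch uses the spherical equations for the Plate Carrée and Sinusoidal projections on a unit sphere, with the central meridian at 0°; the function names are my own.

```python
import math

def plate_carree(lat_deg, lon_deg):
    """Cylindrical Equidistant (Plate Carree) on a unit sphere."""
    return math.radians(lon_deg), math.radians(lat_deg)

def sinusoidal(lat_deg, lon_deg):
    """Sinusoidal (pseudocylindric) projection on a unit sphere."""
    lat = math.radians(lat_deg)
    return math.radians(lon_deg) * math.cos(lat), lat

# Trace the 90 degrees East meridian from the Equator toward the pole
for lat in (0, 30, 60, 90):
    x_pc, _ = plate_carree(lat, 90)
    x_sin, _ = sinusoidal(lat, 90)
    print(f"latitude {lat:>2}: Plate Carree x = {x_pc:.3f}, Sinusoidal x = {x_sin:.3f}")
```

In the Plate Carrée, the meridian keeps the same x value at every latitude, so it plots as a straight vertical line. In the Sinusoidal, x shrinks with the cosine of latitude, so the same meridian bends toward the central meridian and reaches it at the pole.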

Appearances can be deceiving. It's important to remember that the look of a projected graticule depends on several projection parameters, including latitude of projection origin, central meridian, standard line(s), and others. Customized map projections may look entirely different from the archetypes described above.

Try This!

The Interactive Album of Map Projections 2.0 is an application developed by the Penn State Online Geospatial Education Programs and is an update of an earlier site that was inspired by the USGS Professional Paper 1453, An Album of Map Projections, by John P. Snyder and Philip M. Voxland.

Flex Projector is a free, open source software program developed in Java that supports many more projections and variable parameters than the Interactive Album. Bernhard Jenny of the Institute of Cartography at ETH Zurich created the program with assistance from Tom Patterson of the US National Park Service. You can download Flex Projector from flexprojector.com.

Those who wish to explore map projections in greater depth than is possible in this text might wish to visit an informative page published by the International Institute for Geo-Information Science and Earth Observation (Netherlands), which is known by the legacy acronym ITC.

30. Summary

In this chapter, we've explored several connotations of the term scale. Scale is synonymous with scope when it is used to describe the extent of a phenomenon. In this sense, "large scale" means "large area." Specialists in geographic information often use the term differently, however. Map scale refers to the relative sizes of features on a map and of corresponding objects on the ground. In this context, "large scale" implies "small area." Large scale also implies greater detail and greater accuracy, an important point to keep in mind when using maps as sources for GIS databases. Map scale is defined mathematically as the proportion of map distance to ground distance. I hope you are now prepared to use scale equations to calculate map scale.
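As a reminder of how the scale equation works, here is a small sketch. Both distances must be expressed in the same units before the proportion is taken; the function names and sample measurements are my own.

```python
def scale_denominator(map_distance, ground_distance):
    """Return N in a map scale of 1:N (both distances in the same units)."""
    return ground_distance / map_distance

def ground_distance(map_distance, denominator):
    """Return the ground distance represented by a map distance at scale 1:N."""
    return map_distance * denominator

# 5 cm on the map (0.05 m) represents 1,200 m on the ground
print(f"map scale is about 1:{scale_denominator(0.05, 1_200):,.0f}")

# One inch on a 1:24,000 map, converted from inches to feet
print(f"{ground_distance(1, 24_000) / 12:,.0f} feet on the ground")
```

Keep in mind that a larger denominator means a smaller map scale: 1:24,000 is a larger scale than 1:250,000.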

Scale can also be thought of as a reference system for measurement. Locations on the globe are specified with reference to the geographic coordinate system of latitudes and longitudes. Plane coordinates are often preferred over geographic coordinates because they ease calculations of distance, area, and other quantities. Georeferenced plane coordinate systems like UTM and SPC are established by first flattening the graticule, then superimposing a plane coordinate grid. The mathematical equations used to transform geographic coordinates into plane coordinates are called map projections. Both plane and geographic coordinate system grids are related to approximations of the Earth's size and shape called ellipsoids. Relations between grids and ellipsoids are called horizontal datums.

Horizontal datum is an elusive concept for many GIS practitioners. It is relatively easy to visualize a horizontal datum in the context of unprojected geographic coordinates. Simply drape the latitude and longitude grid over an ellipsoid and there's your horizontal datum. It is harder to think about datum in the context of a projected coordinate grid like UTM and SPC, however. Think of it this way: First drape the latitude and longitude grid on an ellipsoid. Then project that grid to a 2-D plane surface. Then, superimpose a rectangular grid of eastings and northings over the projection, using control points to georegister the grids. There you have it--a projected coordinate grid based upon a horizontal datum.

Numerous coordinate systems, datums, and map projections are in use around the world. Because we often need to combine georeferenced data from various sources, GIS professionals need to be able to georegister two or more data sets that are based upon different coordinate systems, datums, and/or projections. Transformations, including coordinate transformations, datum transformations, and map projections, are the mathematical procedures used to bring diverse data into alignment. Characteristics of the coordinate systems, datums, and projections considered in this text are outlined in the following tables.

Coordinate systems referenced in this text

(many other national and local systems are in use)

Coordinate System	Units	Extent	Projection Basis
Geographic	Angles (expressed as degrees, minutes, seconds or decimal degrees).	Global	None
UTM	Distances (meters)	Near-global (84° 30' N, 80° 30' S)	Unique Transverse Mercator projection for each of 60 zones
State Plane Coordinates	Distances (meters in SPCS 83, feet in SPCS 27)	U.S.	Unique Transverse Mercator or Lambert Conformal Conic projection for each of 123 zones (plus Oblique Mercator for Alaska panhandle)
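
As a rough illustration of the 60 UTM zones listed in the table, the sketch below finds the zone number for a longitude. It assumes the regular 6-degree-wide zones numbered eastward from 180° W and deliberately ignores the irregular zones near Norway and Svalbard. The function names are hypothetical.

```python
import math


def utm_zone(longitude_deg):
    """Return the regular UTM zone number (1-60) for a longitude in degrees.

    Assumes 6-degree zones numbered eastward from 180 degrees W; the
    irregular zones near Norway and Svalbard are not handled.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 degrees E folds into zone 60


def central_meridian(zone):
    """Longitude of a zone's central meridian, in degrees."""
    return -183.0 + 6.0 * zone


# Approximate longitude of State College, Pennsylvania.
zone = utm_zone(-77.86)
print(zone, central_meridian(zone))
```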

Datums referenced in this text

(many other national and local systems are in use)

Datum	Horizontal or vertical	Optimized for	Reference surface
NAD 27	Horizontal	North America	Clarke 1866 ellipsoid
NAD 83	Horizontal	North America	GRS 80 ellipsoid
WGS 84	Horizontal	World	WGS 84 ellipsoid
NAVD 88	Vertical	North America	Sea level measured at coastal tidal stations

Map projections referenced in this text

(many other national and local systems are in use)

Projection name	Properties preserved	Class	Distortion
Mercator	Conformal	Cylindrical	Area distortion increases with distance from standard parallel (typically equator).
Transverse Mercator	Conformal	Cylindrical	Area distortion increases with distance from standard meridian.
Lambert Conformal Conic	Conformal	Conic	Area distortion increases with distance from one or two standard parallels.
Plate Carrée (sometimes called "Geographic" projection)	Equidistant	Cylindrical	Area and shape distortion increases with distance from standard parallel (typically equator).
Albers Equal-Area Conic	Equivalent	Conic	Shape distortion increases with distance from one or two standard parallels.

Compiled from Snyder, 1997
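
To make the Distortion column of the projections table more concrete, here is a minimal sketch of two of the projections in their simplest spherical forms. It assumes a spherical Earth with a mean radius of 6,371 km and the equator as standard parallel. Production software uses ellipsoidal equations instead, so treat this as an illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth


def plate_carree(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Equidistant cylindrical projection, standard parallel at the equator."""
    x = radius * math.radians(lon_deg)
    y = radius * math.radians(lat_deg)
    return x, y


def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Spherical Mercator projection, standard parallel at the equator."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("Mercator is undefined at the poles")
    lat = math.radians(lat_deg)
    x = radius * math.radians(lon_deg)
    y = radius * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return x, y


def mercator_area_scale(lat_deg):
    """Factor by which spherical Mercator exaggerates area at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 30, 60):
    print(lat, round(mercator_area_scale(lat), 2))
```

The area factor grows with distance from the equator, which is the behavior the table describes for the Mercator projection.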

31. Bibliography

3-D Software (2005). Map projections pages. Retrieved January 8, 2005, from http://www.3dsoftware.com/Cartography/

American Congress on Surveying and Mapping (n. d.). The North American Datum of 1983. A collection of papers describing the planning and implementation of the readjustment of the North American horizontal network. Monograph No. 2.

Burkard, R. K. et al. (1959-2002). Geodesy for the layman. Retrieved October 29, 2003, from the National Imagery and Mapping Agency website http://www.ngs.noaa.gov/PUBS_LIB/Geodesy4Layman/toc.htm

Chem-Nuclear Systems, Inc. (1993). Site screening interim report: Stage two -- regional disqualification. Harrisburg PA.

Chrisman, N. (2002). Exploring geographic information systems (2nd ed.). New York: John Wiley & Sons.

Clarke, K. (1995). Analytical and computer cartography (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Dana, P. H. (1998). Coordinate systems overview. The Geographer's Craft Project. Retrieved July 30, 2020, from https://foote.geography.uconn.edu/gcraft/notes/coordsys/coordsys_f.html

Dana, P. H. (1999). Geodetic datums overview. The Geographer's Craft Project. Retrieved July 30, 2020, from https://foote.geography.uconn.edu/gcraft/notes/datum/datum_f.html

Dewhurst, W. T. (1990). NADCON: The application of minimum-curvature-derived surfaces in the transformation of positional data from the North American datum of 1927 to the North American datum of 1983. NOAA Technical Memorandum NOS NGS 50. Retrieved January 1, 2005, from http://www.ngs.noaa.gov/PUBS_LIB/NGS50.pdf

Doyle, D. (2004, February). NGS geodetic toolkit, Part 7: Computing state plane coordinates. Professional Surveyor Magazine, 24, 34-36.

Dutch, S. (2003). The Universal Transverse Mercator System. Retrieved January 9, 2008, from http://www.uwgb.edu/DutchS/FieldMethods/UTMSystem.htm (offline)

Federal Geographic Data Committee. (December 2001). United States National Grid. Retrieved May 8, 2006, from http://www.fgdc.gov/standards/projects/FGDC-standards-projects/usng/fgdc_std_011_2001_usng.pdf

Hildebrand, B. (1997). Waypoint+. Retrieved January 1, 2005, from http://www.tapr.org/~kh2z/Waypoint/ (since retired).

Iliffe, J.C. (2000). Datums and map projections for remote sensing, GIS and surveying. Caithness, Scotland: Whittles Publishing. Distributed in U.S. by CRC Press.

Larrimore, C. (2002). NGS Geodetic Toolkit. Retrieved October 26, 2004, from https://www.ngs.noaa.gov/TOOLS/

Muehrcke, P. C. & Muehrcke, J. O. (1992). Map use (3rd ed.). Madison WI: JP Publications.

Muehrcke, P. C. & Muehrcke, J. O. (1998). Map use (4th ed.). Madison WI: JP Publications.

Mulcare, D. M. (2004). The National Geodetic Survey NADCON Tool. Professional Surveyor Magazine, February, pp. 28-33.

National Geodetic Survey. (n.d.). North American datum conversion utility. Retrieved April 2004, from http://www.ngs.noaa.gov/TOOLS/Nadcon/Nadcon.html (offline). Current version at https://geodesy.noaa.gov/TOOLS/Nadcon/Nadcon.shtml

National Geodetic Survey. (1997). Image generated from 15'x15' geoid undulations covering the planet Earth. Retrieved 1999, from http://www.ngs.noaa.gov/GEOID/geo-index.html (offline).

National Geodetic Survey. (2004). Coast and geodetic survey historical image collection. Retrieved June 25, 2004, from http://www.photolib.noaa.gov/cgs/index.html (offline). Current version at https://photolib.noaa.gov/Collections/Coast-Geodetic-Survey

National Geographic Society (1999). Round earth, flat maps. Retrieved April 18, 2006, from http://www.nationalgeographic.com/features/2000/exploration/projections/... (offline)

Ordnance Survey (2000). National GPS network information. 7: Transverse mercator map projections. Retrieved August 27, 2004, from http://www.gps.gov.uk/guide7.asp (offline)

Robinson, A. et al. (1995). Elements of cartography (5th ed.). New York: John Wiley & Sons.

Robinson, A. H. & Snyder, J. P. (1997). Matching the map projection to the need. Retrieved January 8, 2005, from the Cartography and Geographic Information Society and the Pennsylvania State University website: https://courseware.e-education.psu.edu/projection/

Slocum, T. A., McMaster, R. B., Kessler, F. C., & Howard, H. H. (2005). Thematic cartography and visualization (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Smith, J.R. (1988). Basic geodesy. Rancho Cordova CA: Landmark Enterprises.

Snyder, J. P. (1987). Map projections: A working manual (U.S. Geological Survey Professional Paper No. 1395). Washington DC: United States Government Printing Office. (Electronic version available at http://pubs.er.usgs.gov/djvu/PP/PP_1395.pdf)

Snyder, J. P. & Voxland, P. M. (1989). An album of map projections (U.S. Geological Survey Professional Paper No. 1453). Washington DC: United States Government Printing Office.

Snyder, J. P. & Voxland, P. M. (1994). An album of map projections. (USGS Professional Paper No. 1453). Washington DC: U.S. Geological Survey. (ordering information published at http://erg.usgs.gov/isb/pubs/factsheets/fs08799.html)

Stem, J. E. (1990). State Plane Coordinate System of 1983 (NOAA Manual NOS NGS 5). Rockville, MD: National Geodetic Information Center.

The Large Scale Biosphere-Atmosphere Experiment in Amazonia (1999, July 1). Retrieved July 12, 1999, from http://daacl.ESD.ORNL.Gov/lba_cptec/ (since retired).

United States Geological Survey (2001). The universal transverse mercator grid. Fact sheet 077-01. Retrieved June 30, 2004, from http://mac.usgs.gov/mac/isb/pubs/factsheets/fs07701.html (offline).

United States Geological Survey (2003). National mapping program standards. Retrieved October 29, 2005, from http://rockyweb.cr.usgs.gov/nmpstds/nmas647.html (since retired).

USGS. "State College Quadrangle" [map]. 7.5 minute series. Washington, D.C.: USGS, 1962.

Van Sickle, J. (2017). Basic GIS coordinates (3rd ed.). Boca Raton FL: CRC Press.

Wikipedia. The free encyclopedia. (2006). World geodetic system. Retrieved May 8, 2006, from http://en.wikipedia.org/wiki/WGS84

Wolf, P. R. & Brinker, R. C. (1994). Elementary surveying (9th ed.). New York NY: HarperCollins.

Chapter 3: Census Data and Thematic Maps

1. Overview

In Chapter 2, we compared the characteristics of geographic and plane coordinate systems that are used to measure and specify positions on the Earth's surface. Coordinate systems, remember, are formed by juxtaposing two or more spatial measurement scales. I mentioned, but did not explain, that attribute data also are specified with reference to measurement scales. In this chapter, we'll take a closer look at how attributes are measured and represented.

Maps are both the raw material and the product of GIS. All maps, but especially so-called reference maps made to support a variety of uses, can be defined as sets of symbols that represent the locations and attributes of entities measured at certain times. Many maps, however, are subsets of available geographic data that have been selected and organized in response to a particular question. Maps created specifically to highlight the distribution of a particular phenomenon or theme are called thematic maps. Thematic maps are among the most common forms of geographic information produced by GIS.

A flat sheet of paper is an imperfect, but useful, analog for geographic space. Notwithstanding the intricacies of map projections, it is a fairly straightforward matter to plot points that stand for locations on the globe. Representing the attributes of locations on maps is sometimes not so straightforward, however. Abstract graphic symbols must be devised that depict, with minimal ambiguity, the quantities and qualities that give locations their meaning. Over the past 100 years or so, cartographers have adopted and tested conventions concerning symbol color, size, and shape for thematic maps. The effective use of graphic symbols is an important component in the transformation of geographic data into useful information.

US map showing percent population change by county from 1990 - 2000; most areas had high increases.

Figure 3.1.1 Population change in the United States, by county, from 1990 to 2000.

Credit: 1990 & 2000 decennial censuses.

Consider the map above (Figure 3.1.1), which shows how the distribution of U.S. population changed, by county, from 1990 to 2000. To gain a sense of how effective this thematic map is in transforming data into information, we need only to compare it to a list of population change rates for the more than 3,000 counties of the U.S. The thematic map reveals spatial patterns that the data themselves conceal.

This chapter explores the characteristics of attribute data used for thematic mapping, especially attribute data produced by the U.S. Census Bureau. It also considers how the characteristics of attribute data influence choices about how to present the data on thematic maps.

Objectives

Students who successfully complete Chapter 3 should be able to:

use metadata and the World Wide Web to assess the content and availability of attribute data produced by the U.S. Census Bureau;
discriminate between different levels of measurement of attribute data;
explain the differences between counts, rates, and densities, and identify the types of map symbols that are most appropriate for representing each; and
use quantile and equal interval classification schemes to divide census attribute data into categories suitable for choroplethic mapping.
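
As a preview of the last objective, here is a minimal sketch of how class breaks might be computed under the two schemes. The data values are hypothetical, the function names are my own, and the quantile rule shown is only one of several conventions in use.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that divide the data range into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly equal numbers of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        # Index of the last observation in class i: ceil(i * n / k) - 1.
        last = (i * n + n_classes - 1) // n_classes - 1
        breaks.append(ordered[last])
    return breaks


def classify(value, breaks):
    """Return the 0-based class index of a value, given upper class limits."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


# Hypothetical percent population change for ten counties.
change = [-4.0, -1.5, 0.5, 2.0, 3.5, 5.0, 8.0, 12.0, 20.0, 36.0]
print(equal_interval_breaks(change, 4))
print(quantile_breaks(change, 4))
```

With skewed data like these, equal intervals place most counties in the lowest class, while quantiles spread the counties evenly across classes at the cost of uneven class widths.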

"Try This!" Activities

Take a minute to complete any of the Try This activities that you encounter throughout the chapter. These are fun, thought-provoking exercises to help you better understand the ideas presented in the chapter.

2. Census Attribute Data

A thematic map is a graphic display that shows the geographic distribution of a particular attribute, or relationships among a few selected attributes. Some of the richest sources of attribute data are national censuses. In the United States, a periodic count of the entire population is required by the U.S. Constitution. Article 1, Section 2, written in 1787, states that "Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers ... The actual Enumeration shall be made [every] ten years, in such manner as [the Congress] shall by law direct." The U.S. Census Bureau is the government agency charged with carrying out the decennial census.

The first section of the Constitution of the United States

Figure 3.3.1 A portion of the Constitution of the United States of America.

The results of the U.S. decennial census determine states' portions of the 435 total seats in the U.S. House of Representatives. The map below shows states that lost and gained seats as a result of the reapportionment that followed the 2000 census. Congressional voting district boundaries must be redrawn within the states that gained and lost seats, a process called redistricting. Constitutional rules and legal precedents require that voting districts contain equal populations (within about 1 percent). In addition, districts must be drawn so as to provide equal opportunities for representation of racial and ethnic groups that have been discriminated against in the past.

gain in border states, loss around great lakes, no change in midwest

Figure 3.3.2 Reapportionment of the U.S. House of Representatives as a result of the 2000 census.

Besides reapportionment and redistricting, U.S. census counts also affect the flow of billions of dollars of federal expenditures, including contracts and federal aid, to states and municipalities. In 1995, for example, some $70 billion of Medicaid funds were distributed according to a formula that compared state and national per capita income. $18 billion worth of highway planning and construction funds were allotted to states according to their shares of urban and rural population. And $6 billion of Aid to Families with Dependent Children was distributed to help children of poor families do better in school. The two thematic maps below (Figure 3.3.3) illustrate the strong relationship between population counts and the distribution of federal tax dollars.

US map showing population and federal expenditures by state. More people = more expenditure

Figure 3.3.3 Population and federal expenditures, by state, 1995.

Credit: Cartography by Thad Lenker. Data from U.S. Census Bureau, Federal Expenditures by State

The Census Bureau's mandate is to provide the population data needed to support governmental operations including reapportionment, redistricting, and allocation of federal expenditures. Its mission, to be "the preeminent collector and provider of timely, relevant, and quality data about the people and economy of the United States," is broader, however. To fulfill this mission, the Census Bureau needs to count more than just numbers of people, and it does.

Try This!

The Redistricting Game

3. Enumerations versus Samples

Sixteen U.S. Marshals and 650 assistants conducted the first U.S. census in 1791. They counted some 3.9 million individuals, although, as then-Secretary of State Thomas Jefferson reported to President George Washington, the official number understated the actual population by at least 2.5 percent (Roberts, 1994). By 1960, when the U.S. population had reached 179 million, it was no longer practical to have a census taker visit every household. The Census Bureau then began to distribute questionnaires by mail. Of the 116 million households to which questionnaires were sent in 2000, 72 percent responded by mail. A mostly temporary staff of over 800,000 was needed to visit the remaining households, and to produce the final count of 281,421,906. Using statistically reliable estimates produced from exhaustive follow-up surveys, the Bureau's permanent staff determined that the final count was accurate to within 1.6 percent of the actual number (although the count was less accurate for young and minority residents than it was for older and white residents). It was the largest and most accurate census to that time. (Interestingly, Congress insists that the original enumeration or "head count" be used as the official population count, even though the estimate calculated from samples by Census Bureau statisticians is demonstrably more accurate.)

The mail-in response rate for the 2010 census was also 72 percent. As with most of the 20th-century censuses, the official 2010 census count, by state, had to be delivered to the Office of the President by December 31 of the census year. Then, within one week of the opening of the next session of the Congress, the President reported to the House of Representatives the apportionment population counts and the number of Representatives to which each state was entitled.

In 1791, census takers asked relatively few questions. They wanted to know the numbers of free persons, slaves, and free males over age 16, as well as the sex and race of each individual. (You can view photos of historical census questionnaires here.) As the U.S. population has grown, and as its economy and government have expanded, the amount and variety of data collected has expanded accordingly. In the 2000 census, all 116 million U.S. households were asked six population questions (names, telephone numbers, sex, age and date of birth, Hispanic origin, and race), and one housing question (whether the residence is owned or rented). In addition, a statistical sample of one in six households received a "long form" that asked 46 more questions, including detailed housing characteristics, expenses, citizenship, military service, health problems, employment status, place of work, commuting, and income. From the sampled data, the Census Bureau produced estimated data on all these variables for the entire population.

In the parlance of the Census Bureau, data associated with questions asked of all households are called 100% data and data estimated from samples are called sample data. Both types of data are available aggregated by various enumeration areas, including census block, block group, tract, place, county, and state (see the illustration below). Through 2000, the Census Bureau distributed the 100% data in a package called the "Summary File 1" (SF1) and the sample data as "Summary File 3" (SF3). In 2005, the Bureau launched a new project called the American Community Survey that surveys a representative sample of households on an ongoing basis. Every month, one household out of every 480 in each county or equivalent area receives a survey similar to the old "long form." Annual or semi-annual estimates produced from American Community Survey samples replaced the SF3 data product in 2010.

To protect respondents' confidentiality, as well as to make the data most useful to legislators, the Census Bureau aggregates the data it collects from household surveys to several different types of geographic areas. SF1 data, for instance, are reported at the block or tract level. There were about 8.5 million census blocks in 2000. By definition, census blocks are bounded on all sides by streets, streams, or political boundaries. Census tracts are larger areas that have between 2,500 and 8,000 residents. When first delineated, tracts were relatively homogeneous with respect to population characteristics, economic status, and living conditions. A typical census tract consists of about five or six sub-areas called block groups. As the name implies, block groups are composed of several census blocks. American Community Survey estimates, like the SF3 data that preceded them, are reported at the block group level or higher.

Diagram of relationships among the various census geographies. A text description follows.

Nation
- Zip Codes
- Zip Code Tabulation Areas
- Urban Areas
- Metropolitan Areas
- American Indian, Alaska Native & Native Hawaiian Areas
Regions
Divisions
States
- School Districts
- Congressional Districts
- Economic Places
- Oregon Urban Growth Areas
- State Legislative Districts
- Alaska Native Regional Corporations
- Places
Counties
- Voting Districts
- Traffic Analysis Zones
- County Subdivisions
  - Subbarrios
Census Tracts
Block Groups
Blocks
- Zip Codes
- Zip Code Tabulation Areas
- Urban Areas
- Metropolitan Areas
- American Indian, Alaska Native & Native Hawaiian Areas
- School Districts
- Congressional Districts
- Economic Places
- Oregon Urban Growth Areas
- State Legislative Districts
- Alaska Native Regional Corporations
- Places
- Voting Districts
- Traffic Analysis Zones
- County Subdivisions
  - Subbarrios

Credit: U.S. Census Bureau.

4. American Community Survey

Beginning in 2010, the American Community Survey (ACS) replaced the "long form" that was used to collect sample data in past decennial censuses. Instead of sampling one in six households every ten years (about 18 million households in 2000), the ACS samples 2-3 million households every year. The goal of the ACS is to enable Census Bureau statisticians to produce more timely estimates of the demographic, economic, social, housing, and financial characteristics of the U.S. population. You can view a sample ACS questionnaire by entering the keywords "American Community Survey questionnaire" into your favorite Internet search engine.
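
A back-of-the-envelope calculation shows how the monthly sampling rate mentioned earlier (one household in 480) relates to the annual sample size quoted above. The sketch assumes, purely for illustration, that the 116 million households of the 2000 census are a reasonable stand-in for the number of households surveyed by the ACS.

```python
HOUSEHOLDS = 116_000_000   # households mailed questionnaires in 2000 (stand-in)
MONTHLY_RATE = 1 / 480     # ACS: one household in 480 sampled each month

acs_annual_share = 12 * MONTHLY_RATE
acs_annual_sample = HOUSEHOLDS * acs_annual_share
acs_five_year_sample = 5 * acs_annual_sample  # samples pooled over five years

print(f"Share of households sampled per year: {acs_annual_share:.1%}")
print(f"Households sampled per year: {acs_annual_sample:,.0f}")
print(f"Households pooled over five years: {acs_five_year_sample:,.0f}")
```

Under these assumptions the annual sample falls within the 2-3 million range given above. Pooling several years of samples is what allows multiyear estimates to be produced for areas with small populations.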

Try This!

Acquiring and Understanding American Community Survey (ACS) Data

The purpose of this practice activity is to guide your exploration of ACS data and methodology. In the end, you should be able to identify the types of geographical areas for which ACS data are available; to explain why 1-year and 3-year estimates are available for some areas and not for others; and to describe how the statistical reliability of ACS estimates varies among 1-year, 3-year, and 5-year estimates.

Return to the U.S. Census Bureau site.
Click the Surveys/Programs tab and follow the link to American Community Survey (ACS). This takes you to the MAIN American Community Survey page.
Begin by clicking the Guidance for Data Users link and looking through the information available there.
Note the link to Handbooks for Data Users.
Under the More Guidance for Data Users Topics heading, pay particular attention to the When to use… section with its descriptions of the various estimates (1-, 3- and 5-year), and to the section on Comparing ACS Data to other census data. If you are so inclined, there is also a link to a listing of Training Presentations under this same heading. (You might benefit from the Understanding Multiyear Estimates... offering.)
Next, hover your mouse cursor over the Data link located in the navigation list on the left side of the ACS page, and note what entries are there:
You can download ACS data to make maps and analyses using your own GIS or statistical software. Find download links and pertinent information in the sections titled Data via FTP and Summary File Data.
There is also a section pertaining to Public Use Microdata Sample (PUMS). PUMS data are edited, however, to protect the confidentiality of individuals and households.

In the remaining steps, you will make a map or two to reinforce the geographies covered by the American Community Survey. You will map data from your home (or adopted) state.
You first need to go to the MAIN American FactFinder site, then follow the Advanced Search / SHOW ME ALL link, click the Topics search box, then expand the Program list and choose American Community Survey. Close the Select Topics overlay window.
Click the Geographies search options box (on the left) to reveal the Select Geographies overlay window.
Under Select a geographic type, click County - 050.
Next, from the Select a state list, choose your state.
Then, from the Select one or more geographic areas... list, choose All Counties within <your state>.
Then, click ADD TO YOUR SELECTIONS. This will add the All Counties… entry to the Your Selections list.
Close the Select Geographies overlay window.
In the Search Results window, note that there are many datasets that have 1-, 3- and 5-year estimates entries.
Decide upon a 1-Year dataset to look at and check the box for it.
Then click View.
On the new Table Viewer page that you land on, be sure that the Create a Map choice is blue – not grayed out. (If it is grayed out, click the BACK TO ADVANCED SEARCH button and make sure only one dataset box is checked, or make a different choice, then click View again.)
Click on Create a Map. The data values in the table will turn blue, and you will be prompted to “Click on a data value in the table to map.” Clicking a single data value from any row will allow you to map the data in that row for all of the counties for which it is available. Click on a blue data value of your choice – remember which row you choose. Click on the SHOW MAP button in the small popup window that appears.
Are all of the counties in the state symbolized as having data? Why not?
Now, click the BACK TO ADVANCED SEARCH button. Un-check the box for the 1-year dataset, and check the box for the 5-year estimate of the same category. Proceed as above to map the data. After the map is refreshed, note how many counties now exhibit data.
Take a look at the 3-year estimates for the same dataset if you wish, though they may not be available for the more recent years.

5. International Data

The International Data Base is published on the web by the Census Bureau's International Programs Center. It combines demographic data compiled from censuses and surveys of some 227 countries and areas of the world, along with estimates produced by Census Bureau demographers. Data variables include population by age and sex; vital rates, infant mortality, and life tables; fertility and child survivorship; migration; marital status; family planning; ethnicity, religion, and language; literacy; and labor force, employment, and income. Census and survey data are available by country for selected years from 1950; projected data are available through 2050. The International Data Base allows you to download attribute data in formats appropriate for thematic mapping.

Try This!

Acquiring World Demographic Data via the World Wide Web

The purpose of this practice activity is to guide you through the process of finding and acquiring demographic data for the countries of the world from the U.S. Census Bureau via the web. Your objective is to retrieve population change rates for a country of your choice over two or more years.

Return to the U.S. Census Bureau site.
Click the Topics tab, expand the Population list and click on International. That will take you to the International Programs page.
Click on the Data tab and then click on International Data Base (IDB).
Choose a data theme you are interested in from the Select Report pick list. The choices have to do with births and mortality, population change including such things as migration, population by age group, etc. (The Population Pyramid Graph choice gives you one or more graphs rather than a data table.)

Data tables are available by Country or by Region.
From the Select one or more Countries or Areas pick list, you can specify that you want data for a single country or for a collection of countries, and from the Select up to 25 Years pick list you can specify that you want data for more than a single year. See the instructions in small text at the bottom of the window on how to select multiple entries from the selection boxes.
From the Select Region(s) selection box, you can choose from pre-selected groupings of countries.
Now, choose a single country under Country Search and select two or more years from the Select up to 25 Years pick list.
Then click SUBMIT.
You will see a summary table or plot of the data for your selected country and years.
Click the Search button to go back and experiment with the choices in the Select Region(s) selection box and the Aggregation Options choice list.

For your information: to download an Excel (.xls) or a comma-delimited text file (.csv) version of the data, find the respective link on the Results page: "Excel" or "CSV."
Download links may not appear when the search has been broad.
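
Once you have a comma-delimited file in hand, population change rates can be computed with a few lines of code. The sketch below is only an illustration: the column names (Country, Year, Population), the country, and the numbers are hypothetical, and the headers in an actual download may differ. It also assumes the file holds rows for a single country.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file; real headers may differ.
SAMPLE_CSV = """Country,Year,Population
Examplia,2000,1000000
Examplia,2010,1150000
Examplia,2020,1265000
"""


def percent_change_by_period(csv_text):
    """Return (start_year, end_year, percent change) for consecutive years."""
    rows = sorted(csv.DictReader(io.StringIO(csv_text)),
                  key=lambda row: int(row["Year"]))
    changes = []
    for earlier, later in zip(rows, rows[1:]):
        p0 = float(earlier["Population"])
        p1 = float(later["Population"])
        changes.append((int(earlier["Year"]), int(later["Year"]),
                        100.0 * (p1 - p0) / p0))
    return changes


for start, end, pct in percent_change_by_period(SAMPLE_CSV):
    print(f"{start}-{end}: {pct:+.1f}%")
```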

6. Counts, Rates, and Densities

The raw data collected during decennial censuses are counts--whole numbers that represent people and housing units. The Census Bureau aggregates counts to geographic areas such as counties, tracts, block groups, and blocks, and reports the aggregate totals. In other cases, summary measures, such as averages and medians, are reported. Counts can be used to ensure that redistricting plans comply with the constitutional requirement that each district contain equal population. Districts are drawn larger in sparsely populated areas, and smaller where population is concentrated. Counts, averages, and medians cannot be used to determine that districts are drawn so that minority groups have an equal probability of representation, however. For this, pairs of counts must be converted into rates or densities. A rate, such as Hispanic population as a percentage of total population, is produced by dividing one count by another. A density, such as persons per square kilometer, is a count divided by the area of the geographic unit to which the count was aggregated. In this chapter, we'll consider how the differences between counts, rates, and densities influence the ways in which the data may be processed in geographic information systems and displayed on thematic maps.
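
The distinction can be made concrete with a short sketch. The tract names, counts, and areas below are hypothetical, chosen so that two units share the same total count but differ sharply in rate and density.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    name: str
    total_pop: int      # count
    hispanic_pop: int   # count
    area_sq_km: float   # area of the enumeration unit

    def hispanic_rate(self):
        """Rate: one count divided by another, expressed as a percentage."""
        return 100.0 * self.hispanic_pop / self.total_pop

    def density(self):
        """Density: a count divided by the unit's area (persons per sq km)."""
        return self.total_pop / self.area_sq_km


# Hypothetical tracts with identical total counts but different areas.
tracts = [
    Tract("Urban tract", 4000, 1000, 2.0),
    Tract("Rural tract", 4000, 200, 400.0),
]
for t in tracts:
    print(t.name, t.total_pop,
          f"{t.hispanic_rate():.1f}%", f"{t.density():.1f} per sq km")
```

The two tracts are indistinguishable by total count alone; only the derived rate and density reveal how different they are.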

7. Attribute Measurement Scales

Chapter 2 focused upon measurement scales for spatial data, including map scale (expressed as a representative fraction), coordinate grids, and map projections (methods for transforming three dimensional to two dimensional measurement scales). You may know that the meter, the length standard established for the international metric system, was originally defined as one-ten-millionth of the distance from the equator to the North Pole. In virtually every country except the United States, the metric system has benefited science and commerce by replacing fractions with decimals, and by introducing an Earth-based standard of measurement.

Standardized scales are needed to measure non-spatial attributes as well as spatial features. Unlike positions and distances, however, attributes of locations on the Earth's surface are often not amenable to absolute measurement. In a 1946 article in Science, a psychologist named S. S. Stevens outlined a system of four levels of measurement meant to enable social scientists to systematically measure and analyze phenomena that cannot simply be counted. (In 1997, geographer Nicholas Chrisman pointed out that a total of nine levels of measurement are needed to account for the variety of geographic data.) The levels are important to specialists in geographic information because they provide guidance about the proper use of different statistical, analytical, and cartographic operations. In the following, we consider examples of Stevens' original four levels of measurement: nominal, ordinal, interval, and ratio.
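
One way to think about the guidance the levels provide is as a cumulative list of operations. The sketch below encodes a common textbook reading of Stevens' scheme; the operation names and the grouping are my own simplification, not a list taken from Stevens' article.

```python
from enum import IntEnum


class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level (simplified).
NEW_OPERATIONS = {
    Level.NOMINAL: {"count", "mode"},
    Level.ORDINAL: {"rank", "median"},
    Level.INTERVAL: {"difference", "mean"},
    Level.RATIO: {"ratio"},
}


def permitted_operations(level):
    """Operations valid at a level: its own plus those of all lower levels."""
    allowed = set()
    for lower in Level:
        if lower <= level:
            allowed |= NEW_OPERATIONS[lower]
    return allowed


print(sorted(permitted_operations(Level.ORDINAL)))
```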

8. Nominal Level

Data produced by assigning observations into unranked categories are said to be nominal level measurements. Nominal observations can be differentiated and grouped into categories, but the categories cannot logically be ranked from high to low (unless they are associated with preferences or other exogenous value systems). For example, one can classify the land cover at a certain location as woods, scrub, orchard, vineyard, or mangrove. One cannot say, however, that a location classified as "woods" is twice as vegetated as another location classified "scrub." The phenomenon "vegetation" is a set of categories, not a range of numerical values, and the categories are not ranked. That is, "woods" is in no way greater than "mangrove," unless the measurement is supplemented by a preference or priority.

Selected vegetation categories shown as the patterns depicted on USGS topographic maps (woods, scrub, orchard, vineyard, mangrove)

Figure 3.9.1 Attribute data measured at the nominal level: Selected vegetation categories depicted on USGS topographic maps.

Credit: Steger, 1986

Although census data originate as counts, much of what is counted is individuals' membership in nominal categories. Race, ethnicity, marital status, mode of transportation to work (car, bus, subway, railroad...), type of heating fuel (gas, fuel oil, coal, electricity...), all are measured as numbers of observations assigned to unranked categories. For example, the map below in Figure 3.9.2, which appears in the Census Bureau's first atlas of the 2000 census, highlights the minority groups with the largest percentage of population in each U.S. state. Colors were chosen to differentiate the groups, but not to imply any quantitative ordering.

Western states have Hispanics as highest percent minority group, while eastern states have Blacks.

Figure 3.9.2 Minority groups with highest percent population for each state

Credit: Brewer & Suchan, 2001

9. Ordinal Level

Like the nominal level of measurement, ordinal scaling assigns observations to discrete categories. Ordinal categories are ranked, however. It was stated on the preceding page that nominal categories such as "woods" and "mangrove" do not take precedence over one another unless an extrinsic set of priorities is imposed upon them. In fact, the act of prioritizing nominal categories transforms nominal level measurements to the ordinal level.

Different dotted lines representing how different boundaries are depicted on USGS topographic maps (i.e., national, state, county, parks, etc.)

Figure 3.10.1 Attribute data measured at the ordinal level: Ranked categories of boundaries depicted on USGS topographic maps.

Credit: Steger, 1986

Examples of ordinal data often seen on reference maps include political boundaries that are classified hierarchically (national, state, county, etc.) and transportation routes (primary highway, secondary highway, light-duty road, unimproved road). Ordinal data measured by the Census Bureau include how well individuals speak English (very well, well, not well, not at all), and level of educational attainment. Social surveys of preferences and perceptions are also usually scaled ordinally.

Individual observations measured at the ordinal level typically should not be added, subtracted, multiplied, or divided. For example, suppose two 640-acre grid cells within your county are being evaluated as potential sites for a hazardous waste dump. Say the two areas are evaluated on three suitability criteria, each ranked on a 0 to 3 ordinal scale, such that 0 = unsuitable, 1 = marginally unsuitable, 2 = marginally suitable, and 3 = suitable. Now say Area A is ranked 0, 3, and 3 on the three criteria, while Area B is ranked 2, 2, and 2. If the Siting Commission were to simply add the three criteria, the two areas would seem equally suitable (0 + 3 + 3 = 6 = 2 + 2 + 2), even though a ranking of 0 on one criterion ought to disqualify Area A.
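The siting example can be worked through in a few lines of Python. This is only a sketch: the function names and the disqualification rule (any rank of 0 rules a site out) are my own assumptions for illustration, not a procedure prescribed by any siting authority.

```python
# Ordinal suitability ranks from the example above:
# 0 = unsuitable, 1 = marginally unsuitable, 2 = marginally suitable, 3 = suitable.
RANKS = {
    "Area A": [0, 3, 3],
    "Area B": [2, 2, 2],
}


def naive_score(ranks):
    """Add the ranks as if they were quantities (the questionable approach)."""
    return sum(ranks)


def is_disqualified(ranks, unsuitable=0):
    """Hypothetical screening rule: any 'unsuitable' rank rules the site out."""
    return unsuitable in ranks


def weakest_rank(ranks):
    """Order-preserving summary: report the worst rank a site received."""
    return min(ranks)


for name, ranks in RANKS.items():
    print(name, naive_score(ranks), is_disqualified(ranks), weakest_rank(ranks))
```

Both areas receive the same naive sum, yet only Area A is flagged by the screening rule. Taking the minimum is one way to summarize ordinal ranks that relies only on their order, not on arithmetic between them.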

10. Interval and Ratio Levels

Interval and ratio are the two highest levels of measurement in Stevens' original system. Unlike nominal- and ordinal-level data, which are qualitative in nature, interval- and ratio-level data are quantitative. Examples of interval level data include temperature and year. Examples of ratio level data include distance and area (e.g., acreage). The scales are similar insofar as units of measurement are arbitrary (Celsius versus Fahrenheit, Gregorian versus Islamic calendar, English versus metric units). The scales differ in that the zero point is arbitrary on interval scales, but not on ratio scales. For instance, zero degrees Fahrenheit and zero degrees Celsius are different temperatures, and neither indicates the absence of temperature. Zero meters and zero feet mean exactly the same thing, however. An implication of this difference is that a quantity of 20 measured at the ratio level is twice the value of 10, a relation that does not hold true for quantities measured at the interval level (20 degrees is not twice as warm as 10 degrees).
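One way to convince yourself of this difference is to change units and see whether the "twice as much" relation survives. The short Python sketch below does that for the temperatures and distances mentioned above; the function and variable names are mine, chosen only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit (the zero point shifts)."""
    return c * 9 / 5 + 32


def meters_to_feet(m):
    """Convert meters to feet (only the unit size changes; zero stays zero)."""
    return m / 0.3048


# Interval level: the ratio depends on which arbitrary zero point is used.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio level: the ratio is the same no matter which unit is used.
long_m, short_m = 20.0, 10.0
ratio_meters = long_m / short_m
ratio_feet = meters_to_feet(long_m) / meters_to_feet(short_m)

print(ratio_celsius, ratio_fahrenheit)  # the two temperature ratios disagree
print(ratio_meters, ratio_feet)         # the two distance ratios agree
```

The same pair of temperatures yields a ratio of 2 in Celsius but 68/50 = 1.36 in Fahrenheit, so neither ratio says anything meaningful about warmth. The distance ratio is 2 in both meters and feet.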

Because interval and ratio level data represent positions along continuous number lines, rather than members of discrete categories, they are also amenable to analysis using inferential statistical techniques. Correlation and regression, for example, are commonly used to evaluate relationships between two or more data variables. Such techniques enable analysts to infer not only the form of a relationship between two quantitative data sets, but also the strength of the relationship.
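As a rough illustration of "form" and "strength," the sketch below computes a least-squares regression line and a Pearson correlation coefficient in plain Python. The elevation and temperature values are invented for the example, and the function names are my own; a statistics package would normally be used in practice.

```python
from math import sqrt


def _sums_of_squares(xs, ys):
    """Return the means and the sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Strength of the linear relationship, between -1 and +1."""
    _, _, sxy, sxx, syy = _sums_of_squares(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Form of the relationship: slope and intercept of the best-fit line."""
    mean_x, mean_y, sxy, sxx, _ = _sums_of_squares(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: elevation in meters (ratio level) and
# mean annual temperature in degrees Celsius (interval level).
elevation_m = [100, 300, 500, 700, 900]
temperature_c = [14.0, 12.9, 11.6, 10.5, 9.1]

r = pearson_r(elevation_m, temperature_c)
slope, intercept = least_squares_line(elevation_m, temperature_c)
print(r, slope, intercept)
```

With these made-up numbers the slope is negative (temperature falls as elevation rises) and the correlation coefficient is close to -1, which indicates a strong inverse linear relationship.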

11. Levels and Operations

One reason that it's important to recognize levels of measurement is that different measurement scales are amenable to different analytical operations (Chrisman 2002). Some of the most common operations include:

Group: Categories of nominal and ordinal data can be grouped into fewer categories. For instance, grouping can be used to reduce the number of land use/land cover classes from, say, four (residential, commercial, industrial, parks) to one (urban).
Isolate: One or more categories of nominal, ordinal, interval, or ratio data can be selected, and others set aside. As a hypothetical example, consider a range of georeferenced soil moisture readings taken over a farm field. A subrange of readings that are amenable to a particular fertilizer or pesticide might be isolated so that application is limited to the appropriate areas of the field.
Cross tab: Two or more sets of nominal or ordinal categories can be associated one to another in pairs, triplets, etc. Chrisman (2002) points to the multicharacter codes used in the National Wetland Inventory as an example of a cross tab. Each position in the NWI code represents a particular attribute. Each unique code, therefore, represents a cross tabulation of the possible combinations of attributes.
Difference: The difference of two interval level observations (such as two calendar years) results in one ratio level observation (such as one age).
Other arithmetic operations: Two or more compatible sets of ratio or interval level data can be added, subtracted, multiplied, or divided. For example, the per capita (average) income of a census tract can be calculated by dividing the sum of the incomes of all individuals in the tract (a ratio level variable) by the number of persons residing in the tract (a second ratio level variable).
Classification: Interval and ratio data are frequently sorted into ordinal level categories for thematic mapping.
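
Several of the operations listed above can be sketched in a few lines of Python. The function names, category labels, class breaks, and sample values here are all hypothetical, chosen only to mirror the examples in the list; they are not drawn from Chrisman (2002) or from any GIS package.

```python
from bisect import bisect_left
from collections import Counter

URBAN_CLASSES = {"residential", "commercial", "industrial", "parks"}


def group(land_use):
    """Group: collapse four land use classes into one 'urban' class."""
    return "urban" if land_use in URBAN_CLASSES else land_use


def isolate(readings, low, high):
    """Isolate: keep only the readings that fall within a subrange."""
    return [r for r in readings if low <= r <= high]


def cross_tab(first, second):
    """Cross tab: count each pairing of two sets of categories."""
    return Counter(zip(first, second))


def difference(later_year, earlier_year):
    """Difference: two interval level years yield one ratio level age."""
    return later_year - earlier_year


def classify(value, breaks, labels):
    """Classification: sort a number into a ranked class.

    breaks are ascending upper class limits; labels has one more entry
    than breaks, for values above the last break.
    """
    return labels[bisect_left(breaks, value)]


print(group("commercial"))
print(isolate([0.12, 0.25, 0.31, 0.40], 0.20, 0.35))
print(cross_tab(["woods", "scrub", "woods"], ["wet", "dry", "wet"]))
print(difference(2024, 1997))
print(classify(15, [10, 20], ["low", "medium", "high"]))
```

Notice that `classify` returns a label rather than a number. Once ratio or interval values have been sorted into ranked classes, the result is ordinal, and the cautions about arithmetic on ordinal data apply again.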

12. Thematic Mapping

Unlike reference maps, thematic maps are usually made with a single purpose in mind. Typically, that purpose has to do with revealing the spatial distribution of one or two attribute data sets.

In this section, we will consider distinctions among three types of ratio level data: counts, rates, and densities. We will also explore several different types of thematic maps, and consider which type of map is conventionally used to represent the different types of data. We will focus on what is perhaps the most prevalent type of thematic map, the choropleth map. Choropleth maps tend to display ratio level data that have been transformed into ordinal level classes. Finally, you will learn two common data classification procedures: quantiles and equal intervals.

13. Graphic Variables

Maps use graphic symbols to represent the locations and attributes of phenomena distributed across the Earth's surface. Variations in symbol size, color lightness, color hue, and shape can be used to represent quantitative and qualitative variations in attribute data. By convention, each of these "graphic variables" is used to represent a particular type of attribute data.

14. Counts, Rates, and Densities

Ratio level data predominate on thematic maps. Ratio data are of several different kinds, including counts, rates, and densities. As stated earlier, counts (such as total population) are whole numbers representing discrete entities, such as people. Rates and densities are produced from pairs of counts. A rate, such as percent population change, is produced by dividing one count by another (for example, the change in population between year 1 and year 2 divided by the population in year 1). A density, such as persons per square kilometer, is a count divided by the area of the geographic unit to which the count was aggregated (e.g., total population divided by number of square kilometers). It is conventional to use different types of thematic maps to depict each type of ratio-level data.
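The arithmetic behind rates and densities is simple enough to sketch in Python. The county figures below are invented, and the function names are mine. Percent change is computed here in the usual way, as the year 2 count divided by the year 1 count, minus one, times 100.

```python
def percent_change(count_year1, count_year2):
    """Rate: percent change from the year 1 count to the year 2 count."""
    return (count_year2 / count_year1 - 1) * 100


def density(count, area_sq_km):
    """Density: a count divided by the area of the unit it was aggregated to."""
    return count / area_sq_km


# Hypothetical county: two population counts and a land area.
population_year1 = 40_000
population_year2 = 50_000
area_sq_km = 250.0

growth_pct = percent_change(population_year1, population_year2)
persons_per_sq_km = density(population_year2, area_sq_km)

print(growth_pct)         # percent population change
print(persons_per_sq_km)  # persons per square kilometer
```

For this made-up county the population grew by 25 percent, and the year 2 density is 200 persons per square kilometer. Both derived values remain ratio level data, but unlike counts they do not grow simply because the geographic unit is larger.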

15. Mapping Counts

The simplest thematic mapping technique for count data is to show one symbol for every individual counted. If the location of every individual is known, this method often works fine. Unfortunately, individual locations are often unknown, or they may be confidential, and in that case the solution is not as simple as it seems. Software like ESRI's ArcMap, for example, is happy to overlook this shortcoming: its "Dot Density" option causes point symbols to be positioned randomly within the geographic areas in which the counts were conducted. The size of the dots and the number of individuals represented by each dot can also be chosen by the map maker. Random dot placement may be acceptable if the scale of the map is small, so that the areas in which the dots are placed are small. Often, however, this is not the case.
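
To make the idea of random dot placement concrete, here is a minimal Python sketch. It is not ArcMap's algorithm, which I have not seen; it simply assumes one plausible approach, in which candidate points are drawn at random from the area's bounding box and kept only if they fall inside the polygon. The function names, the L-shaped "tract," and the count are all hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, count, persons_per_dot, seed=None):
    """Place round(count / persons_per_dot) dots at random inside polygon."""
    rng = random.Random(seed)
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    n_dots = round(count / persons_per_dot)
    dots = []
    while len(dots) < n_dots:
        # Draw from the bounding box, then reject points outside the polygon.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# Hypothetical L-shaped tract with 5,000 residents, one dot per 100 persons.
tract = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = random_dots(tract, count=5000, persons_per_dot=100, seed=1)
print(len(dots))
```

The sketch makes the shortcoming plain: the dots land anywhere within the tract with equal likelihood, whether or not anyone actually lives there. The larger the area, the more misleading the placement can be.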