Maps are both the raw material and the product of geographic information systems (GIS). All maps represent features and characteristics of locations, and that representation depends upon data relevant at a particular time. All maps are also selective; they do not show us everything about the place depicted; they show only the particular features and characteristics that their maker decided to include. Maps are often categorized into reference or thematic maps based upon the producer’s decision about what to include and the expectations about how the map will be used. The prototypical reference map depicts the location of “things” that are usually visible in the world; examples include road maps and topographic maps (depicting terrain). The U.S. Geological Survey (USGS) website below (Figure 3.1) provides examples of the standard topographic map produced today along with other example reference maps and a wide range of other information (see: nationalmap.gov/ustopo).
Thematic maps, in contrast, typically depict “themes.” They generally are more abstract, involving more processing and interpretation of data, and often depict concepts that are not directly visible; examples include maps of income, health, climate, or ecological diversity. There is no clear-cut line between reference and thematic maps, but the categories are useful to recognize because they relate directly to how the maps are intended to be used and to decisions that their cartographers have made in the process of shrinking and abstracting aspects of the world to generate the map.
For example, with a highway map (another example of a typical reference map), we expect the cartographer to take great care in accurately depicting road location, since the map’s main purpose is to act as a reference to the road network. In contrast, on a thematic map of U.S. unemployment rates focused on those rates, the base information such as state boundaries can be quite abstract without impeding our ability to understand the map. In Mapping it Out: Expository cartography for the humanities and social sciences, Mark Monmonier proposed the U.S. visibility map (Figure 3.2), adjusting the areas and shape of each state in order to help map users see all states, especially smaller states such as Rhode Island.
A flat sheet of paper is an imperfect but useful analog for geographic space. Notwithstanding the intricacies of spherical coordinate systems and map projections (see chapter 2.3 and 2.4), it is a fairly straightforward matter to plot points that stand for locations on the globe. Representing the attributes of locations on maps is sometimes not so straightforward, however. Abstract graphic symbols must depict, with minimum ambiguity, the quantities and qualities of the locations they represent. Over more than a century, with particular attention to thematic maps, cartographers have adopted and tested map symbolization principles through which geographic data are transformed into useful information. These principles focus on how color, size, shape, and other components of map symbols are used to represent characteristics of the geographic data depicted. As one example, the map below (Figure 3.3) uses variations in color to represent geographic differences in population change over a decade in the U.S.
The map above makes it easy to see where the U.S. population changed, by county, from 1990 to 2000 as well as where there was little change. To gain a sense of the power of thematic maps in transforming data into information, we need only to compare the map above to a list of population change rates for the more than 3,000 counties of the U.S [1]. that it is based upon. The thematic map reveals geographic patterns that would be virtually impossible to recognize from the table.
Maps of people, like the one above, are just one example of the nearly infinite variety of thematic maps that can be generated from today’s geographic data. This chapter will introduce the “cartographic process” through which maps are generated and then examine thematic maps specifically through exploration of diverse examples and introduce the most common (and a few uncommon) thematic mapping methods and how to interpret them.
Students who successfully complete Chapter 3 should be able to:
Chapter lead author: Jennifer M. Smith.
See citation to the full text and its precursor below; portions drawn directly from that text.
Today, maps can be produced easily through a wide range of online tools by anyone with access to the Internet. Maps used in most activities (from urban planning, through geological exploration or environmental management, to trip planning and navigation), however, are still typically produced by professionals with expertise in mapping or in the phenomena being depicted on the maps. The academic and professional field that focuses on mapping is called “cartography.” Cartography has been defined by the International Cartographic Association as “the discipline dealing with the conception, production, dissemination and study of maps.” One useful conceptualization of cartography is as a process that links map makers, map users, the environment mapped, and the map itself. One characterization of this process is depicted in Figure 3.4 below.
The cartographic process is a cycle that begins with a real or imagined environment. As map makers collect data from the environment (through technology and/or remote sensing), they use their perception to detect patterns and subsequently prepare the data for map creation (i.e., they think about the data and its patterns as well as how to best visualize them on a map). Next, the map maker uses the data and attempts to signify it visually on a map (encoding), applying generalization, symbolization, and production methods that will (hopefully) lead to a depiction that can be interpreted by the map user in the way the map maker intended (its purpose). Next, the map user reads, analyzes, and interprets the map by decoding the symbols and recognizing patterns. Finally, users make decisions and take action based upon what they find in the map. Through their provision of a viewpoint on the world, maps influence our spatial behavior and spatial preferences and shape how we view the environment.
In the cartographic process as outlined above, the fundamental component in generating a map to depict the environment is itself a process – the process of map abstraction. This is the topic we discuss next.
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about the Introduction.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
It has become possible to map the world on the head of a pin, or even a smaller space, as shown here: Art of Science: World on the Head of a Pin [2], but, most details get left out. Even to achieve a screen-sized map of the world on your computer, map abstraction is fundamental to representing entities in a legible manner. The process of map abstraction includes at least five major (interdependent) steps: (a) selection, (b) classification, (c) simplification, (d) exaggeration, and (e) symbolization (Muehrcke and Muehrcke, 1992).
Depending on a map’s purpose, cartographers (map makers) select what information to include and what information to leave out. As Phillip Muehrcke (an Emeritus Professor of Geography from the University of Wisconsin) details, the cartographer must answer four questions: Where? When? What? Why? As an example (Figure 3.5), a cartographer can create a map of San Diego (where) showing current (when) traffic patterns (what) so that an ambulance can take the fastest route to an emergency (why).
The map in Figure 3.5 shows how a cartographer selected specific highways to include along with a few other features; these other features include a very generalized representation of the terrain, a few major rivers and lakes, and an indication of the area included in each of several communities (in pastel colors). The objective is to help drivers pick efficient routes by depicting the highways and whether traffic is moving quickly (green) or stalled (red). Other information is kept to a minimum and visually pushed to the background; that extra information is included to provide context for the primary focus (the highways and traffic on them).
Classification is the grouping of things into categories, or classes. By grouping attributes into a few discernible classes, new visual patterns in the data can emerge and the map becomes more legible. In the example above, the highways are classified into those without traffic detectors (gray) and those with traffic detectors (in color) and furthermore, within the latter, into slow (red), intermediate (yellow), and fast (green) travel conditions. There are many kinds of data classification used on maps; we will focus specifically on classification of numerical map data in more detail later on in the chapter. As a preview of some of the things map readers must consider about classification, the example below shows one dataset for the rate of prostate cancer by county in Pennsylvania mapped using a different number of classes. As you can see, different patterns emerge depending upon how many classes the cartographer chooses to visualize. One must be critical when looking at maps because changing the map classification can change what appears to be true. In How to Lie With Maps, Mark Monmonier discusses how mapmakers intentionally and unintentionally lie through techniques such as map classification, among others.
Cartographers also need to simplify the features on a map beyond the tasks of feature type selection and feature classification in order to make a map more intelligible. This includes choosing to delete, smooth, typify, and aggregate entities within feature types. In the process of deleting entities, imagine creating a map of cities for the United States. As illustrated in Figure 3.7, attempting to include every city in the U.S. would render the map illegible. Map makers must delete, for instance, cities below a certain population (as done in the map on the right) in order to better serve the purpose of the map. In this case, if the purpose was to show the most populous cities, a fixed population threshold produces a very appropriate result. If, however, the purpose was to show the most important cities in the region, then an arbitrary population threshold does not work since, for example, Salt Lake City is just as important to Utah as Phoenix is to Arizona.
Smoothing is the act of eliminating unnecessary elements in the geometry of features, such as the superfluous details of a nation’s shoreline that can only be seen at a larger, zoomed-in regional scale. Typification depicts just the most typical components of the mapped feature. The visibility map above is a good example of typification in which the actual geographic shape of state boundaries is replaced with what might be considered a caricature that retains only key aspects of each state’s shape. Going beyond the simplification processes that act on one feature at a time, aggregation combines multiple features into one. Imagine a river composed of numerous meandering streams at a large scale (i.e., zoomed in), but when moving to a smaller scale (i.e., zooming out), the streams are merged into one larger river as it becomes impossible to maintain the detail. If you visit Google Maps [4] and zoom in to Harrisburg, Pennsylvania, you will find the Susquehanna River flowing through the middle of the capital. As you zoom out to a smaller scale, you will view the various smaller streams of the Susquehanna begin to collapse into a single blue line as the details of the river aggregate.
The purpose of this practice activity is to show you a visual example of simplification and smoothing of geographic features in the online MapShaper application.
I encourage you to experiment with the various methods and settings to see how simplification eliminates unnecessary elements as you move through different map scales.
Deliberate exaggeration of map features is often performed in order to allow certain features to be seen. For instance, on a standard paper highway map of Pennsylvania (the fold-up kind you might have in the glove box of your car, thus about 3 feet across when unfolded), interstate highways are printed at roughly 0.035 inches in width. That sounds pretty small, right? But, if the width of the printed road relative to the map width was the same as the width of the actual highway relative to the width of Pennsylvania, it would mean that the the Interstate was nearly 2000 feet wide! This is a typical case of exaggeration to create an abstraction that is useful for travel.
In the final process of creating a map, the cartographer symbolizes the selected features on a map. These features can be symbolized in visually realistic ways, such as a river depicted by a winding blue line. But many depictions are much more abstract, such as a circle or star representing a city. Map symbols are constructed from more primitive “graphic variables, the elements that make up symbols. Below, we provide a brief overview of these core graphic variables; then we focus on how color in particular is used (or should be used).
Given the large variety of maps that exist, it might be surprising to learn that the visual appearance of all maps starts from a very small set of display primitives from which all those variations can be constructed. We call these primitives graphic variables because each represents a “graphic” (visible) feature of a map symbol that can be “varied.” While different cartographers have identified a slightly different set of primitives, most agree that there are somewhere between 7 and 12 of them from which all maps symbolization can be constructed. The most commonly cited primitives that can be varied for map symbols are: location, size, shape, orientation, texture, and three components of color – color hue (red, green, blue, etc.), color lightness (how light or dark the color is), color saturation (how pure the color hue is). By convention, each of these "graphic variables" is used to represent particular categories of data variation.
As you can see above, three of the graphic variables are components of color. Color is particularly important for map symbolization today since so many maps are seen online where color is always available and nearly always used. While most maps you will see use color to depict data (as well as in aesthetic ways), many maps do not use color in the most logical ways in relation to the data being depicted. Well designed maps use variations in the three color variables in ways that reflect the kinds of variations in the underlying data they represent. Below, we provide a few simple guidelines that will allow you to recognize maps that use color in logical as well as illogical ways. Recognizing the latter is particularly important so that you are not misled by maps you encounter.
To help cartographers (and others) select good colors for maps, Dr. Cynthia Brewer and Dr. Mark Harrower developed Color Brewer (ColorBrewer2.org [6]), a web app designed to help users pick colors based on data type, number of data classes, and mode of map presentation (i.e., printing, photocopying). The color schemes have been tested with users who have color deficiency (about 8% of the population; difficulty distinguishing red from green is the most common). The web app allows users to interact with a map template by changing colors, background, borders, and terrain. There are three main color scheme forms a user can choose from: sequential, diverging, and categorical. Each is appropriate for specific kinds of data as detailed below.
Sequential color schemes should be employed when data is arranged from a low to a high data value (e.g., data for mean annual income by county in Pennsylvania). This sequential scheme aligns colors from light (depicting low data values) to dark (depicting high data values) in a step-wise sequence. Sequential schemes can rely on only color lightness as shown below (Figure 3.9) at left or may add some color hue variation to enhance differences in categories will retaining the clear visual ordering as shown at right. As an example, Figure 3.10 uses a 4-class purple sequential scheme to depict Avian Influenza, with a focus on Eurasia.
Diverging color schemes highlight an important midrange or critical value of ordered data as well as the maximum and minimum data values. Two contrasting dark hues converge in color lightness at the critical value. This is the scheme used for the population change map in Figure 3.3 above in which the critical dividing point is zero change.
Unlike the ordered data mentioned in the previous color schemes, qualitative color schemes are used to present categorical data, or data belonging to different categories. Different hues visually separate each of the different classes, or categories. The map in Figure 3.13 employs a qualitative color scheme of three different colors (red, blue, green) to represent different categories (coke, pop, and soda respectively).
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about Map Abstraction.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
As introduced above, unlike reference maps, thematic maps are usually made with a single purpose in mind. Often, that purpose has to do with revealing the spatial distribution of one or two attribute data sets (e.g., to help readers understand changing U.S. demographics as with the population change map). Alternatively, thematic maps can have a decision-making purpose (e.g., to help users make travel decisions as with the real-time traffic map).
In the rest of this chapter, we will explore different types of thematic maps and consider which type of map is conventionally used for different types of data and different use goals. A primary distinction here is between maps that depict categorical (qualitative) data and those that depict numerical (quantitative) data.
As mentioned in the section on color schemes, categorical data are data that can be assigned to distinct non-numerical categories. For example, the category of a beach could not be described as two times the value of a wetland; it is different in kind rather than amount. In mapping categorical data, cartographers often focus on displaying the different categories or classes through shape or color hue. The CrimeViz map application (CrimeViz [9]) developed in the GeoVISTA Center at Penn State visualizes violent crimes reported from the District of Columbia Data Catalog (DC Data Catalog [10]). Every crime location is displayed as a circular point, where each crime category is differentiated through hue (arson: orange, homicide: purple, sexual abuse: blue). This interactive map application allows map users to explore and find new patterns across space and time.
Aside from altering color to represent different categories on a map, changing the shape of a point symbol can help map users differentiate different groups. The Ushahidi (signifying “testimony” in Swahili) website [11] developed an online crowd sourcing map application [12]. Following the election in 2008, many Kenyans believed the new president manipulated votes in his favor, which led to violence throughout the country. Users of the Ushahidi website were prompted to report acts of violence in Kenya. Their map, automatically generated from the reports, displays different types of incidents by varying the shape of the point feature (fire: all categories, push pin: specific type of violence, dove: peace efforts, people: displaced people). In addition, each subcategory of violence (represented by push pins) is contrasted by differing hues (blue: riots, orange: deaths, and so on). The tools to create this mapping application have been distributed for free around the world and are now used for a wide array of crisis mapping applications. One recent example is their application to generate maps of sexual violence in Syria (Women Under Siege: Syria Crowdmap [13]); and for those who read Japanese, the tools were applied to the Japan Earthquake and subsequent nuclear disaster [14].
Categorical aspects of linear features can also be visualized on a map. In the figure below, different gas pipelines owned by various companies are depicted in different color hues. The dashed pink line in the top left of the figure represents a proposed gas line from Alaska that could send up to 4.5 billion cubic feet of natural gas a day to the conterminous United States. In this map, the cartographer uses the process of map abstraction for the purpose of displaying the current and proposed gas pipeline network. First, only necessary features (pipelines, territories and major cities) are selected for display in order to produce a clean and legible map. Next, the linear pipeline network is classified into several groups based upon distinct companies. The map is simplified by visualizing only major cities important to the gas pipeline network. The width of the pipeline is constant across the entire system, exaggerating the actual width (if the width of lines represented real-world diameter of the pipes proportionally, the real pipes would be 16 miles across). Finally, the classified/categorical data (the different pipeline companies) is symbolized by different color hues to represent the qualitative difference among the categories.
The maps above focus on depiction of specific discrete entities, things that have a label we use when discussing them. Categorical maps can also represent characteristics of extended areas or territories. In this case, rather than categorizing discrete entities, we categorize the characteristics of the place, and those places may or may not have precise boundaries. A prototypical example is a land use map in which all areas of the map fall into one of a set of distinct land use categories. The most common method to depict this kind of data is to fill the area with a color or a texture. Below is an example in which land use is depicted very abstractly. All places are assigned to one of only three categories: agriculture, forest, or developed.
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about Mapping Categorical Data.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
When data are numerical, the mapping focus is typically on representing at least relative rank order among the entities depicted, with some maps trying to represent magnitudes in a direct way. A wide array of map types has been developed over the years to represent numerical data. Here, we will introduce some of the most common map types you are likely to encounter. There is a growing number of online tools that you can use to generate these common map types yourself.
We begin by introducing one of the most common thematic map types for numerical data, the choropleth map. This is followed by a brief discussion of the U.S. Census as an important source of numerical data that is depicted on choropleth thematic maps as well as on other thematic map types. We then introduce three important additional map types you are likely to encounter frequently: proportional symbol maps, dot maps, and cartograms.
Google collects certain search terms that users input because they are key indicators of flu among users. Visit Google Flu Trends [18] and explore current flu trends around the world that have been numerically classified from minimal to intense activity and mapped. Pick a country that has flu activity. Do you see any geographic patterns within the country? How does this year compare to the past?
Choropleth maps are among the most prevalent types of thematic maps. Choropleth maps represent quantitative data that is aggregated to areas (often called “enumeration units”). The units can be countries of the world, states of a country, school districts, or any other regional division that divides the whole territory into distinct areas. The term choropleth is derived from the Greek; khōra 'region' + plēthos 'multitude' (thus, be careful not to mix up “choro”, which has no ‘l’, with the “chloro” of chlorophyll or chlorine). Choropleth maps depict quantities aggregated to their regions by filling the entire region with a shade or color. Typically, the quantities are grouped into “classes” (representing a range in data value) and a different fill is used to depict each class (see section 3.2.6 for more on data classification). The goal of choropleth maps is to depict the geographic distribution of the data magnitudes; ideally the choice of fill will communicate the range from low data magnitudes to high magnitudes through an obvious change from light to dark as in Figure 3.18 below. Choropleth maps should use either a sequential color scheme (as below) or a diverging color scheme depending upon whether there is a meaningful break point in the data from which values diverge or the data simply range from low to high (see section 2.1.5.2 above).
To generate eye-catching maps with easily distinguishable data classes, choropleth maps often combine color hue differences with a change in color lightness (as with the yellow, through orange, to dark red scheme depicted in Figure 3.18 above). But many maps get produced without following that cartographic rule, leading to some very colorful but misleading maps as shown in the pair below.
Choropleth maps are most appropriate for representing derived quantities, as represented in Figure 3.18 above. Derived quantities relate a data value to some reference value. Examples include density, average, rate, and percent. A density is a count divided by the area of the geographic unit to which the count was aggregated (e.g., the total population divided by the number of square kilometers to produce population/square mile, as in Figure 3.18). An average is a measure of central tendency, specifically the mean value calculated as a total amount divided by the number of entities producing the amount (e.g., the average income for a county calculated by totaling the income of all people in the country and dividing by the number of people). A rate is a quantity that tells us how frequently something occurs, a value compared to a standard value (e.g., Bradford County, PA had a rate of 45.1/100,000 deaths due to colorectal cancer among women over the period of 1994-2002). A percent is the proportion of a total (and can range from 0-100%). While choropleth maps are best for these derived quantities, you will also encounter choropleth maps used for counts (e.g., the number of crimes committed, votes cast in an election, etc.). When you do, it is important to read the map with caution because big regions are likely to have high totals just because they are big.
Some of the richest sources of attribute data for thematic mapping, particularly for choropleth maps, are national censuses. In the United States, a periodic count of the entire population is required by the U.S. Constitution. Article 1, Section 2, ratified in 1787, states (in the last paragraph of the section shown below) that “Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers ... The actual Enumeration shall be made [every] ten years, in such manner as [the Congress] shall by law direct." The U.S. Census Bureau is the government agency charged with carrying out the decennial census.
The results of the U.S. decennial census determine states' portions of the 435 total seats in the U.S. House of Representatives. The thematic map below (Figure 3.22) shows states that lost and gained seats as a result of the reapportionment that followed the 2000 census. This map, focused on the U.S. by state, is a variant on a choropleth map. Rather than using color fill to depict quantity, color depicts only change and its direction, red for a loss in number of Congressional seats, gray for no change, and blue for a gain in number of Congressional seats. Numbers are then used as symbols to indicate amount of change (small -1 or +1 for a change of 1 seat and larger -2 or +2 for a change of two seats). This scaling of numbers is an example of the more general application of “size” as a graphic variable to produce “proportional symbols” – the topic we cover in detail in the section on proportional symbol mapping below.
Congressional voting district boundaries must be redrawn within the states that gained and lost seats, a process called redistricting. Constitutional rules and legal precedents require that voting districts contain equal populations (within about 1 percent). In addition, districts must be drawn so as to provide equal opportunities for representation of racial and ethnic groups that have been discriminated against in the past. Further, each state is allowed to create its own parameters for meeting the equal opportunities constraint. In Pennsylvania (and other states), geographic compactness has been used as one of several factors. Article II, Section 16 of the Pennsylvania Constitution says:
§ 16. Legislative districts.
The Commonwealth shall be divided into 50 senatorial and 203 representative districts, which shall be composed of compact and contiguous territory as nearly equal in population as practicable. Each senatorial district shall elect one Senator, and each representative district one Representative. Unless absolutely necessary no county, city, incorporated town, borough, township or ward shall be divided in forming either a senatorial or representative district. (Apr. 23, 1968, P.L.App.3, Prop. No.1). Source: http://www.legis.state.pa.us/WU01/LI/LI/CT/HTM/00/00.002..HTM [21]
Whether districts determined each decade actually meet these guidelines is typically a contentious issue and often results in legal challenges. Below, the Congressional District map for PA that defines the boundaries of districts for the 112th Congress illustrates how irregular districts can be. District 12 has a particularly interesting shape.
Beyond the role of the census of population in determining the number of representatives per state (thus in providing the data input to reapportionment and redistricting), the Census Bureau's mandate is to provide the population data needed to support governmental operations, more broadly including decisions on allocation of federal expenditures. Its broader mission includes being "the preeminent collector and provider of timely, relevant, and quality data about the people and economy of the United States". To fulfill this mission, the Census Bureau needs to count more than just numbers of people, and it does. We will discuss this in more detail later (in section 3.3, Thinking about aggregated data: Enumeration versus samples).
Besides reapportionment and redistricting, U.S. Census counts also affect the flow of billions of dollars of federal expenditures, including contracts and federal aid, to states and municipalities. In 2011, for example, some $486 billion of Medicaid funds were distributed according to a formula that compared state and national per capita income. $93 billion worth of highway planning and construction funds were allotted to states according to their shares of urban and rural population. And $120 billion of Unemployment Compensation was distributed from the Federal level. The thematic maps below (using historical data from 1995) illustrate the strong relationship between population counts and the distribution of federal tax dollars using proportional symbols (symbols in which the graphic variable of size is used to depict data magnitude).
There are two types of point features that are typically depicted with proportional symbols: features for which the data represents a geographic position directly (e.g., gallons of oil from individual oil wells), and features that are geographic areas to which data are aggregated and the data magnitudes are assigned to a representative point within the area (e.g., the geographic centroid of a state as in the examples above). In either case, the area of the symbol is scaled to represent the data magnitude, sometimes with a bit of exaggeration to adjust for a general tendency of human vision to underestimate differences in area. A variant on this direct data-to-symbol scaling groups values into categories first, then scales the symbol to represent the mean for the category, assigning a symbol to each place to represent the category range that the mean for the place falls within (see Figure 3.25 below).
One important characteristic of proportional symbols is that they can easily be designed to represent more than one data value per location. Among the most common example is a “pie chart map” in which a circle is scaled proportionally to some total, and the size of wedges within the circle is scaled to depict a proportion of a total for two or more sub-categories. The map below uses circle size to depict population totals in each state, and the pie slices then depict the proportion of that total who identify as Hispanic compared to those who are non-Hispanic.
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about Choropleth Mapping, Census Data, and Proportional Symbol Mapping.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
For data that represent an area, proportional symbols are a fairly extreme abstraction. They provide a very simple overview of data magnitudes geographically but hide any geographic variation that might occur inside the enumeration units to which the data are aggregated. An alternative is the dot map. Dot maps depict magnitude by frequency rather than the size of symbol and add the depiction of geographic distribution by use of the graphic variable of location. Specifically, dot maps assign one to many dots per enumeration area to represent a specific count in each area. The difference between a dot map and a simple map of point features is that each dot represents more than one entity and the locations are representative of the distribution rather than being exact locations. Specifically, dots that represent some count are placed within enumeration units to represent generally where the feature or attribute occurs.
In the example below, the dot map depicts the size of the Hispanic population by the number of dots per state. Each dot represents 100,000 people in this case, and the general geographic distribution of the Hispanic population within the state is signified by the position of the dots. Not surprisingly, dot maps can vary substantially in how well the distribution of dots on the map represents the actual distribution of the phenomena in the world. Cartographers typically use secondary sources of information to help them decide on the appropriate locations for the dots (e.g., land use maps, satellite images, or statistics collected for smaller geographic units like counties). But, the position of dots usually is based on an educated estimate of distribution rather than on any direct measurement of where the people (in this case) or automobiles or bushels of wheat (or the many other kinds of things we can count) actually are.
A cartogram can be considered a special case of proportional symbol mapping. But, in this case, the “symbol” that is scaled in proportion to a data magnitude is the geographic area for which data are aggregated. Cartograms are unusual enough that they attract viewer attention, making them a popular mapping method with the media, particularly during election years. Their primary weakness (in addition to distorting geography so that no standard measurements such as distance among places are accurate), is that they cannot be interpreted correctly unless the map reader knows the actual geographic shapes of the map units so that sizes can be related to the places they represent.
The map below shows the results of the 2008 Presidential election, with a red state signifying a majority of votes for John McCain, the republican candidate, and blue states a majority for Barack Obama, the democratic candidate. This cartogram scales the areas of each shape to represent its respective total population, visually showing how the majority of the United States voted.
The following maps illustrate the power that some cartograms can have in helping users visually comprehend a phenomenon. While the map on the left depicts the majority vote results by county (with a vast majority of counties for the Republican candidate), the cartogram on the right shows the areas again depicted by population (this time with the country rather than state level data), revealing the larger number of Democratic support. The map on the left gives a distorted view (even though it does not look distorted) because a majority of counties won by the Republican candidate were low in population and many were large in area.
For more election cartogram examples, visit University of Michigan 2008 election site [23].
Visit the National Geographic Earthpulse map [24]. On the left hand side, you will find numerous check boxes for different thematic maps. Choose two thematic maps and identify at least two cartographic techniques (any that have been discussed in the chapter) the cartographer used when creating this map. For instance, in the map above (Figure 3.30), the cartographer used a qualitative color scheme (blue and red) on a choropleth map to show different categories (democratic or republication majority vote) for each U.S. state.
As discussed above (and in Chapter 1), all maps are abstractions. This means that they depict only selected information, but also that the information selected must be generalized due to the limits of display resolution, comparable limits of human visual acuity, and especially the limits imposed by the costs of collecting and processing detailed data. What we have not previously considered is that generalization is not only necessary, it is sometimes beneficial; it can make complex information understandable.
Consider a simple example. The graph below (Figure 3.31) shows the percent of people who prefer the term “pop” (not soda or coke) for each state. Categories along the x axis of the graph represent each of the 50 unique percentage values (two of the states had exactly the same rate). Categories along the y axis are the numbers of states associated with each rate. As you can see, it's difficult to discern a pattern in these data; it appears that there is no pattern.
The following graph (Figure 3.32) shows exactly the same data set, only grouped into 10 classes with equal 10% ranges). It's much easier to discern patterns and outliers in the classified data than in the unclassified data. Notice that people in a large number of states (23) do not really prefer the term “pop” as they are distributed around 0 to 10 percent of users who favor that term. There are no states at the other extreme (91-100%), but a few states whose vast majority (81-90% of their population) prefer the term pop. Ignoring the many 0-10% states where pop is rarely used, the most common states are ones in which about 2/3 favor the term; looking back to Figure 3.13, these are primarily northern states, including Pennsylvania. All of these variations in the information are obscured in the unclassified data.
As shown above, data classification is a generalization process that can make data easier to interpret. Classification into a small number of ranges, however, gives up some details in exchange for the clearer picture, and there are multiple choices of methods to classify data for mapping. If a classification scheme is chosen and applied skillfully, it can help reveal patterns and anomalies that otherwise might be obscured (as shown above). By the same token, a poorly-chosen classification scheme may hide meaningful patterns. The appearance of a thematic map, and sometimes conclusions drawn from it, may vary substantially depending on the data classification scheme used. Thus, it is important to understand the choices that might be made, whether you are creating a map or interpreting one created by someone else.
Many different systematic classification schemes have been developed. Some produce mathematically "optimal" classes for unique data sets, maximizing the difference between classes and minimizing differences within classes. Since optimizing schemes produce unique solutions, however, they are not the best choice when several maps need to be compared. For this, data classification schemes that treat every data set alike are preferred.
Two commonly used classification schemes are quantiles and equal intervals. The following two graphs illustrate the differences.
The graph above groups the Pennsylvania county population change data into five classes, each of which contains the same number of counties (in this case, approximately 20 percent of the total in each). The quantiles scheme accomplishes this by varying the width, or range, of each class. Quantile is a general label for any grouping of rank ordered data into an equal number of entities; quantiles with specific numbers of groups go by their own unique labels ("quartiles" and "quintiles," for example, are instances of quantile classifications that group data into four and five classes respectively). The figure below, then, is an example of quintiles.
In the second graph, the data range of each class is equivalent (8.5 percentage points). Consequently, the number of counties in each equal interval class varies.
As you can see, the effect of the two different classification schemes on the appearance of the two choropleth maps above is dramatic. The quantiles scheme is often preferred because it prevents the clumping of observations into a few categories shown in the equal intervals map. Conversely, the equal interval map reveals two outlier counties that are obscured in the quantiles map. Due to the potentially extreme differences in visual appearance, it is often useful to compare the maps produced by several different map classifications. Patterns that persist through changes in classification schemes are likely to be more conclusive evidence than patterns that shift. Patterns that show up with only one scheme may be important, but require special scrutiny (and an understanding of how the scheme works) to evaluate.
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about Dot Mapping, Cartograms, and Data Classification.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
Quantitative data of the kinds depicted by the maps detailed in the previous section come from a diverse array of sources. In the U.S., one of the most important sources is the U.S. Bureau of the Census (discussed briefly above). Here we focus in on one important distinction in data collected by the Census and by other organizations, a distinction between complete enumeration (counting every entity) and sampling.
Sixteen U.S. Marshals and 650 assistants conducted the first U.S. census in 1791. They counted some 3.9 million individuals, although as then-Secretary of State Thomas Jefferson reported to President George Washington, the official number understated the actual population by at least 2.5 percent (Roberts, 1994). By 1960, when the U.S. population had reached 179 million, it was no longer practical to have a census taker visit every household. The Census Bureau then began to distribute questionnaires by mail. Of the 116 million households to which questionnaires were sent in 2000, 72 percent responded by mail. A mostly-temporary staff of over 800,000 was needed to visit the remaining households, and to produce the final count of 281,421,906. Using statistically reliable estimates produced from exhaustive follow-up surveys, the Bureau's permanent staff determined that the final count was accurate to within 1.6 percent of the actual number (although the count was less accurate for young and minority residences than it was for older and white residents). It was the largest and most accurate census to that time. (Interestingly, Congress insists that the original enumeration or "head count" be used as the official population count, even though the estimate calculated from samples by Census Bureau statisticians is demonstrably more accurate.) As of this writing, some aspects of reporting from the decennial census of 2010 are still underway. Like 2000, the mail-in response rate was 72 percent. The official 2010 census count, by state, was delivered to the U.S. Congress on December 21, 2010 (10 days prior to the mandated deadline). The total count for the U.S. was 308,745,538, a 9.7% increase over 2000.
In the first census, in 1791, census takers asked relatively few questions. They wanted to know the numbers of free persons, slaves, and free males over age 16, as well as the sex and race of each individual. (You can view replicas of historical census survey forms at Ancestry.com [26]) As the U.S. population has grown, and as its economy and government have expanded, the amount and variety of data collected has expanded accordingly. In the 2000 census, all 116 million U.S. households were asked six population questions (names, telephone numbers, sex, age and date of birth, Hispanic origin, and race), and one housing question (whether the residence is owned or rented). In addition, a statistical sample of one in six households received a "long form" that asked 46 more questions, including detailed housing characteristics, expenses, citizenship, military service, health problems, employment status, place of work, commuting, and income. From the sampled data the Census Bureau produced estimated data on all these variables for the entire population.
In the parlance of the Census Bureau, data associated with questions asked of all households are called 100% data and data estimated from samples are called sample data. Both types of data are aggregated by various enumeration areas, including census block, block group, tract, place, county, and state (see the illustration below). Through 2000, the Census Bureau distributes the 100% data in a package called the "Summary File 1" (SF1) and the sample data as "Summary File 3" (SF3). In 2005, the Bureau launched a new project called American Community Survey that surveys a representative sample of households on an ongoing basis. Every month, one household out of every 480 in each county or equivalent area receives a survey similar to the old "long form." Annual or semi-annual estimates produced from American Community Survey samples replaced the SF3 data product in 2010.
To protect respondents' confidentiality, as well as to make the data most useful to legislators, the Census Bureau aggregates the data it collects from household surveys to several different types of geographic areas. SF1 data, for instance, are reported at the block or tract level. There were about 8.5 million census blocks in 2000. By definition, census blocks are bounded on all sides by streets, streams, or political boundaries. Census tracts are larger areas that have between 2,500 and 8,000 residents. When first delineated, tracts were relatively homogeneous with respect to population characteristics, economic status, and living conditions. A typical census tract consists of about five or six sub-areas called block groups. As the name implies, block groups are composed of several census blocks. American Community Survey estimates, like the SF3 data that preceded them, are reported at the block group level or higher. Figure 3.38 details the many geographic unit types that are used to organize data and how they relate. The unit types down the center of the diagram nest, with each higher type composed of some number of the lower type as outlined above for blocks, block groups, and census tracts.
The purpose of this practice activity is to guide you through the process of finding and acquiring 2000 census data from the U.S. Census Bureau data via the Web. Your objective is to look up the total population of each county in your home state (or an adopted state of the U.S.).
I encourage you to experiment some with the American FactFinder site. Start slow, and just click the BACK TO SEARCH button, un-check the TOTAL POPULATION dataset and choose a different dataset to investigate. Registered students will need to answer a couple of quiz questions based on using this site.
Pay attention to what is in the Your Selections window. You can easily remove entries by clicking the red circle with the white X.
On the SEARCH page, with nothing in the Your Selections box, you might try typing “QT” or “GCT” in the Narrow your search: slot. QT stands for Quick Tables, which are preformatted tables that show several related themes for one or more geographic areas. GCT stands for Geographic Comparison Tables, which are the most convenient way to compare data collected for all the counties, places, or congressional districts in a state, or all the census tracts in a county.
Below you will find several thematic maps produced by graduate students or faculty in the Department of Geography at Penn State to provide an idea of the variety that exists. Thematic maps cover a virtually unlimited range of topics and goals since they can depict any “theme” that varies from place to place. Thus, the examples below and the ones to follow in the rest of the chapter provide just a hint of what is possible.
In the map below, size, or height of each column, is the key graphic variable used to represent the total number of international passenger arrivals at each airport in Canada and the United States. This is a very direct representation similar to thinking about piling up a stack of pennies, with one for every airline passenger.
Find a thematic map online and identify both the theme and purpose of the map.
Registered Penn State students should return now to the Chapter 3 folder in Canvas to take a self-assessment quiz about Aggregated Data.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.
This chapter has introduced a set of key concepts that underlie how maps represent data, specifically thematic maps. Mapping as a method to represent data has been related to cartography as a professional field focused on developing the science that supports effective mapping and the practice and technology of generating maps. Emphasis has been on the many different map types, how they are created, and what types of data are best suited for each map type. The core building blocks used to create map symbolization have also been introduced.
To summarize this chapter, maps are abstractions of the real world created through systematic selection, classification, simplification, exaggeration, and symbolization. They are products of the cartographic process, a cyclic process involving data as inputs, the final map as an output, as well as the map maker and map user as the creator and consumer of the map. Thematic maps help reveal geographic patterns that are hard or impossible to find in lists of numbers that data are typically presented in. Graphic variables and color schemes form the building blocks for thematic map construction. Furthermore, different types of thematic maps are used to represent different types of data, both categorical and numerical. Count data, for instance, are conventionally portrayed with symbols that are distinct from the statistical areas they represent, because counts are independent of the sizes of those areas. Rates and densities, on the other hand, are often portrayed as choropleth maps, in which the statistical areas themselves serve as symbols whose color lightness is varied with the magnitude of the attribute data they represent. Attribute data shown on choropleth maps are usually classified. Classification schemes that facilitate comparison of map series, such as the quantiles and equal intervals schemes demonstrated in this lesson, are the most common.
Surf the Internet and find a good map, a bad map, and an ugly map. What attributes are the important ones for assigning each map to the category you put it in?
Aggregation: The process of combining multiple features into one.
Average: A measure of central tendency, specifically the mean value calculated as a total amount divided by the number of entities producing the amount.
Cartographic Process: A cyclic process linking data from the environment as inputs, the final map as an output, as well as the map maker and map user as the creator and consumer of the map.
Cartography: The academic and professional field focused on mapping.
Choropleth Map: A map that depicts quantities aggregated to their regions (often called “enumeration units”) by filling the entire region with a shade or color.
Count: Whole numbers that represent the individual data such as people or housing units.
Delete: Systematically removing data to better serve the purpose of the map such as map legibility.
Density: A count divided by the area of the geographic unit to which the count was aggregated.
Dot Map: Maps that depict magnitude by frequency rather than size of symbol and add the depiction of geographic distribution by use of the graphic variable of location. Specifically, dot maps assign one to many dots per enumeration area to represent a specific count in each area.
Enumeration Areas: Areas or regions in which quantitative data is aggregated to (e.g., census tracts, counties, states, etc.).
Equal Interval: A data classification scheme that divides the data into equal sections (intervals).
Graphic Variables: Primitives in which map symbols are constructed. The core graphic variables include location, size, shape, orientation, texture, and three components of color – color hue (red, green, blue, etc), color lightness (how light or dark the color is), color saturation (how pure the color hue is).
Map Abstraction: The process of representing the real world in simplified form in order to generate a more legible map. It includes at least five major (interdependent) steps: (a) selection, (b) classification, (c) simplification, (d) exaggeration, and (e) symbolization.
Percent: The proportion of a total ranging from 0-100%.
Proportional Symbols: Symbols in which the graphic variable of size is used to depict data magnitude. There are two types of point features typically depicted: features where data represents a geographic position directly and features that are geographic areas to which data are aggregated and the data magnitudes are assigned to a representative point within the area.
Quantile: A general label for any grouping of rank ordered data into an equal number of entities; quantiles with specific numbers of groups go by their own unique labels ("quartiles" and "quintiles," for example, are instances of quantile classifications that group data into four and five classes respectively).
Rate: A quantity that tells us how frequently something occurs, where a value is compared to a standard value.
Reference Map: A map with a main purpose to act as a reference. The prototypical reference map depicts the location of “things” that are usually visible in the world.
Smoothing: The act of eliminating unnecessary elements in the geometry of features, such as the superfluous details of a nation’s shoreline that can only be seen at a larger, zoomed in regional scale.
Thematic Map: A map typically depicting “themes,” generally more abstract, involving more processing and interpretation of data and often representing concepts that are not directly visible; examples include maps of income, health, climate, or ecological diversity.
Typification: A depiction of the most typical components of the mapped feature.
Muehrcke, P. and Muehrcke, J.O. 1992: Map Use: Reading, Analysis, and Interpretation. 3rd edition. Madison, WI: JP Publications.
Slocum, T., McMaster, R., Kessler, F. and Howard, H.H. 2009: Thematic Cartography and Visualization. Upper Saddle River, NJ: Prentice Hall.
Brewer, C. & Suchan, T., (2001). Mapping census 2000: The geography of U. S. diversity. U. S. Census Bureau, Census Special Reports, Series CENSR/01-1. Washington, D. C.: U.S. Government Printing Office.
Chrisman, N. (2002). Exploring geographic information systems. (2nd ed.). New York: John Wiley & Sons, Inc.
Monmonier, M. (1995). Drawing the line: Tales of maps and cartocontroversy. New York: Henry Holt and Company.
Roberts, S. (1994). Who we are: A portrait of America based on the latest U.S. census. New York: Times Books.
U.S. Census Bureau (n. d.). Retrieved July 19, 1999, from http://www.census.gov [30]
U.S. Census Bureau (1996). Federal expenditures by state for fiscal year 1995. Retrieved May 9, 2006, from www.census.gov/prod/2/gov/fes95rv.pdf [22]
U.S. Census Bureau (2005). American FactFinder Retrieved July, 19, 1999, from http://factfinder.census.gov [31]
U.S. Census Bureau (n. d.). American FactFinder Retrieved August 2, 2012, from http://factfinder2.census.gov/faces/nav/jsf/pages/using_factfinder5.xhtml [28]
U.S. Census Bureau (2008). A Compass for understanding and using American Community Survey data: What general users need to know. U.S. Government Printing Office, Washington DC, 2008.
Links
[1] https://www.e-education.psu.edu/geog260/sites/www.e-education.psu.edu.geog260/files/file/us_pop_change.txt
[2] http://artofscience.wordpress.com/2009/12/29/the-world-on-the-head-of-a-pin/
[3] http://www.dot.ca.gov/dist11/d11tmc/sdmap/showmap.php
[4] http://www.maps.google.com
[5] http://www.mapshaper.org
[6] http://colorbrewer2.org/
[7] http://colorbrewer2.org
[8] http://www.popvssoda.com
[9] http://www.geovista.psu.edu/CrimeViz/
[10] http://data.octo.dc.gov/
[11] http://www.ushahidi.com
[12] http://legacy.ushahidi.com
[13] https://womenundersiegesyria.crowdmap.com/
[14] http://www.sinsai.info/
[15] http://legacy.ushahidi.com/index.asp
[16] http://www.arcticgas.gov/Moving-Alaska-gas-from-Canada-to-the-Lower-48
[17] http://www.wetlands.psu.edu/
[18] http://www.google.org/flutrends/
[19] http://www.archives.gov/exhibits/charters/charters_downloads.html
[20] http://www.naturalearthdata.com
[21] http://www.legis.state.pa.us/WU01/LI/LI/CT/HTM/00/00.002..HTM
[22] http://www.census.gov/prod/2/gov/fes95rv.pdf
[23] http://www-personal.umich.edu/%7Emejn/election/2008/
[24] http://earthpulse.nationalgeographic.com/earthpulse/earthpulse-map
[25] http://www.popvssoda.com/
[26] http://www.ancestry.com/
[27] http://factfinder.census.gov/jsp/saff/SAFFInfo.jsp?_pageId=gn7_maps
[28] http://factfinder2.census.gov/faces/nav/jsf/pages/using_factfinder5.xhtml
[29] http://www.census.gov
[30] http://www.census.gov/
[31] http://factfinder.census.gov/