3.2 Thematic Maps

As introduced above, unlike reference maps, thematic maps are usually made with a single purpose in mind. Often, that purpose has to do with revealing the spatial distribution of one or two attribute data sets (e.g., to help readers understand changing U.S. demographics as with the population change map). Alternatively, thematic maps can have a decision-making purpose (e.g., to help users make travel decisions as with the real-time traffic map).

In the rest of this chapter, we will explore different types of thematic maps and consider which type of map is conventionally used for different types of data and different use goals. A primary distinction here is between maps that depict categorical (qualitative) data and those that depict numerical (quantitative) data.

3.2.1 Mapping Categorical Data

As mentioned in the section on color schemes, categorical data are data that can be assigned to distinct non-numerical categories. For example, the category of a beach could not be described as two times the value of a wetland; it is different in kind rather than amount. In mapping categorical data, cartographers often focus on displaying the different categories or classes through shape or color hue. The CrimeViz map application (CrimeViz) developed in the GeoVISTA Center at Penn State visualizes violent crimes reported from the District of Columbia Data Catalog (DC Data Catalog). Every crime location is displayed as a circular point, where each crime category is differentiated through hue (arson: orange, homicide: purple, sexual abuse: blue). This interactive map application allows map users to explore and find new patterns across space and time.

Screenshot of features of CrimeViz: Map Panel, Data Layers Panel & Temporal Panel. Crime locations

Figure 3.14: Screenshot of the features of CrimeViz.

Credit: CrimeViz.

Aside from altering color to represent different categories on a map, changing the shape of a point symbol can help map users differentiate different groups. The Ushahidi (signifying “testimony” in Swahili) website developed an online crowd sourcing map application. Following the election in 2008, many Kenyans believed the new president manipulated votes in his favor, which led to violence throughout the country. Users of the Ushahidi website were prompted to report acts of violence in Kenya. Their map, automatically generated from the reports, displays different types of incidents by varying the shape of the point feature (fire: all categories, push pin: specific type of violence, dove: peace efforts, people: displaced people). In addition, each subcategory of violence (represented by push pins) is contrasted by differing hues (blue: riots, orange: deaths, and so on). The tools to create this mapping application have been distributed for free around the world and are now used for a wide array of crisis mapping applications. One recent example is their application to generate maps of sexual violence in Syria (Women Under Siege: Syria Crowdmap); and for those who read Japanese, the tools were applied to the Japan Earthquake and subsequent nuclear disaster.

Screenshot of the features of Ushahidi: the shape of each point symbol characterizes the data. More text above.

Figure 3.15: Screenshot of the features of Ushahidi.

Credit: Ushahidi.

Categorical aspects of linear features can also be visualized on a map. In the figure below, different gas pipelines owned by various companies are depicted in different color hues. The dashed pink line in the top left of the figure represents a proposed gas line from Alaska that could send up to 4.5 billion cubic feet of natural gas a day to the conterminous United States. In this map, the cartographer uses the process of map abstraction for the purpose of displaying the current and proposed gas pipeline network. First, only necessary features (pipelines, territories and major cities) are selected for display in order to produce a clean and legible map. Next, the linear pipeline network is classified into several groups based upon distinct companies. The map is simplified by visualizing only major cities important to the gas pipeline network. The width of the pipeline is constant across the entire system, exaggerating the actual width (if the width of lines represented real-world diameter of the pipes proportionally, the real pipes would be 16 miles across). Finally, the classified/categorical data (the different pipeline companies) is symbolized by different color hues to represent the qualitative difference among the categories.

3 Canadian lines to 10 US lines. More details in text above.

Figure 3.16: Map of the Gas Pipeline Network From Canada to the United States.

Credit: Arcticgas.gov

The maps above focus on depiction of specific discrete entities, things that have a label we use when discussing them. Categorical maps can also represent characteristics of extended areas or territories. In this case, rather than categorizing discrete entities, we categorize the characteristics of the place, and those places may or may not have precise boundaries. A prototypical example is a land use map in which all areas of the map fall into one of a set of distinct land use categories. The most common method to depict this kind of data is to fill the area with a color or a texture. Below is an example in which land use is depicted very abstractly. All places are assigned to one of only three categories: agriculture, forest, or developed.

Map of land use in the Spring Creek Watershed in central Pennsylvania. Developed land (pink) surrounded by forest (green).

Figure 3.17: Map of land use in the Spring Creek Watershed in central Pennsylvania. In this map, “Developed” is a broad category that includes commercial, residential, and all other land uses that are not explicitly agriculture or forest.

Credit: This map was produced in Riparia, a Center in the Dept. of Geography at Penn State focused on wetlands and watershed management: (map provided by Dr. Robert Brooks).

Practice Quiz

Registered Penn State students should now take the self-assessment quiz about Mapping Categorical Data.

You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.

3.2.2 Mapping Numerical Data

When data are numerical, the mapping focus is typically on representing at least relative rank order among the entities depicted, with some maps trying to represent magnitudes in a direct way. A wide array of map types has been developed over the years to represent numerical data. Here, we will introduce some of the most common map types you are likely to encounter. There is a growing number of online tools that you can use to generate these common map types yourself.

We begin by introducing one of the most common thematic map types for numerical data, the choropleth map. This is followed by a brief discussion of the U.S. Census as an important source of numerical data that is depicted on choropleth thematic maps as well as on other thematic map types. We then introduce three important additional map types you are likely to encounter frequently: proportional symbol maps, dot maps, and cartograms.

Try This: Thematic Mapping of Flu Trends

Google collects certain search terms that users input because they are key indicators of flu among users. Visit Google Flu Trends and explore current flu trends around the world that have been numerically classified from minimal to intense activity and mapped. Pick a country that has flu activity. Do you see any geographic patterns within the country? How does this year compare to the past?

3.2.2.1 Choropleth Mapping

Choropleth maps are among the most prevalent types of thematic maps. Choropleth maps represent quantitative data that are aggregated to areas (often called “enumeration units”). The units can be countries of the world, states of a country, school districts, or any other regional division that divides the whole territory into distinct areas. The term choropleth is derived from the Greek khōra 'region' + plēthos 'multitude' (thus, be careful not to mix up “choro”, which has no ‘l’, with the “chloro” of chlorophyll or chlorine). Choropleth maps depict quantities aggregated to their regions by filling the entire region with a shade or color. Typically, the quantities are grouped into “classes” (each representing a range of data values) and a different fill is used to depict each class (see section 3.2.2.6 for more on data classification). The goal of choropleth maps is to depict the geographic distribution of the data magnitudes; ideally, the choice of fill will communicate the range from low data magnitudes to high magnitudes through an obvious change from light to dark, as in Figure 3.18 below. Choropleth maps should use either a sequential color scheme (as below), when the data simply range from low to high, or a diverging color scheme, when there is a meaningful break point in the data from which values diverge (see section 2.1.5.2 above).

Hispanic population density in the U.S. by state. High density along coasts. More in text above.

Figure 3.18: Hispanic population density in the U.S. by state, using a single hue sequential color scheme that depicts the range of data values from low to high with light to dark color values.

Credit: Cartography by Geoff Hatchard.

To generate eye-catching maps with easily distinguishable data classes, choropleth maps often combine color hue differences with a change in color lightness (as with the yellow, through orange, to dark red scheme depicted in Figure 3.18 above). But many maps get produced without following that cartographic rule, leading to some very colorful but misleading maps as shown in the pair below.

Figure 3.19: Misleading population maps due to color choice. On the left, the data values diverge from no change to large increases and decreases, but the fact that most of the US has increases is a lot harder to determine from this map than from Figure 3.3, and regional clusters are harder to recognize as well. On the right, the data are ordered, but the color scheme applied is not visually sequential, so geographic patterns are very hard to identify.

Credit: Jennifer M. Smith, Department of Geography, The Pennsylvania State University; Data from U.S. Census Bureau.

Choropleth maps are most appropriate for representing derived quantities, as represented in Figure 3.18 above. Derived quantities relate a data value to some reference value. Examples include density, average, rate, and percent. A density is a count divided by the area of the geographic unit to which the count was aggregated (e.g., the total population divided by the number of square miles to produce population per square mile, as in Figure 3.18). An average is a measure of central tendency, specifically the mean value calculated as a total amount divided by the number of entities producing the amount (e.g., the average income for a county calculated by totaling the income of all people in the county and dividing by the number of people). A rate is a quantity that tells us how frequently something occurs, a value compared to a standard value (e.g., Bradford County, PA had a rate of 45.1 deaths per 100,000 due to colorectal cancer among women over the period of 1994-2002). A percent is a proportion of a total expressed per hundred (and can range from 0 to 100%). While choropleth maps are best for these derived quantities, you will also encounter choropleth maps used for counts (e.g., the number of crimes committed, votes cast in an election, etc.). When you do, it is important to read the map with caution because big regions are likely to have high totals just because they are big.

Total population count by state & population density by state. General: greater population = smaller population per sq mile

Figure 3.20: Total population count by state (left) and population density by state (right).

Credit: Jennifer M. Smith, Department of Geography, The Pennsylvania State University; Data from U.S. Census Bureau.
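The derived quantities described above are simple ratios, and a short sketch can make the difference between mapping counts and mapping densities concrete. The function names and the two made-up enumeration units below are assumptions chosen for illustration; they do not come from the Census data used in the figures.

```python
def density(count, area):
    """Count divided by the area of the enumeration unit."""
    return count / area

def average(total, n_entities):
    """Mean value: a total amount divided by the number of entities."""
    return total / n_entities

def rate_per(events, population, per=100_000):
    """Events per `per` people (for example, deaths per 100,000)."""
    return events * per / population

def percent(part, whole):
    """Proportion of a total, expressed per hundred."""
    return part / whole * 100

# Hypothetical units: (name, population, area in square miles)
units = [
    ("Big Rural", 2_000_000, 100_000),
    ("Small Urban", 1_000_000, 1_000),
]

# Ranking by raw count and ranking by density can disagree.
top_by_count = max(units, key=lambda u: u[1])[0]
top_by_density = max(units, key=lambda u: density(u[1], u[2]))[0]

print(top_by_count)    # Big Rural
print(top_by_density)  # Small Urban
```

In this made-up case the larger unit has the higher count only because it is large, which is the caution raised above about choropleth maps of counts.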

3.2.2.2 Census Data

Some of the richest sources of attribute data for thematic mapping, particularly for choropleth maps, are national censuses. In the United States, a periodic count of the entire population is required by the U.S. Constitution. Article 1, Section 2, ratified in 1787, states (in the last paragraph of the section shown below) that “Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers ... The actual Enumeration shall be made [every] ten years, in such manner as [the Congress] shall by law direct.” The U.S. Census Bureau is the government agency charged with carrying out the decennial census.

Upper portion of the Constitution of the United States of America.

Figure 3.21: A portion of the Constitution of the United States of America (preamble and first three paragraphs of Article 1).

Credit: Obtained from: National Archives.

The results of the U.S. decennial census determine states' portions of the 435 total seats in the U.S. House of Representatives. The thematic map below (Figure 3.22) shows states that lost and gained seats as a result of the reapportionment that followed the 2000 census. This map, focused on the U.S. by state, is a variant on a choropleth map. Rather than using color fill to depict quantity, color depicts only change and its direction, red for a loss in number of Congressional seats, gray for no change, and blue for a gain in number of Congressional seats. Numbers are then used as symbols to indicate amount of change (small -1 or +1 for a change of 1 seat and larger -2 or +2 for a change of two seats). This scaling of numbers is an example of the more general application of “size” as a graphic variable to produce “proportional symbols” – the topic we cover in detail in the section on proportional symbol mapping below.

Reapportionment of the U.S. House of Representatives in 2000. General: NE=loss, SE=gain, SW=gain, NW=no change.

Figure 3.22: Reapportionment of the U.S. House of Representatives as a result of the 2000 census.

Credit: Jennifer M. Smith, Department of Geography, The Pennsylvania State University; After figure in Chapter 3, DiBiase, 2012. (Data from U.S. Census Bureau, generalized in MapShaper and Alaska and Hawaii boundaries from Natural Earth).

Congressional voting district boundaries must be redrawn within the states that gained and lost seats, a process called redistricting. Constitutional rules and legal precedents require that voting districts contain equal populations (within about 1 percent). In addition, districts must be drawn so as to provide equal opportunities for representation of racial and ethnic groups that have been discriminated against in the past. Further, each state is allowed to create its own parameters for meeting the equal opportunities constraint. In Pennsylvania (and other states), geographic compactness has been used as one of several factors. Article II, Section 16 of the Pennsylvania Constitution says:

§ 16. Legislative districts.

The Commonwealth shall be divided into 50 senatorial and 203 representative districts, which shall be composed of compact and contiguous territory as nearly equal in population as practicable. Each senatorial district shall elect one Senator, and each representative district one Representative. Unless absolutely necessary no county, city, incorporated town, borough, township or ward shall be divided in forming either a senatorial or representative district. (Apr. 23, 1968, P.L.App.3, Prop. No.1). Source: Constitution of Pennsylvania

Whether districts determined each decade actually meet these guidelines is typically a contentious issue and often results in legal challenges. Below, the Congressional District map for PA that defines the boundaries of districts for the 112th Congress illustrates how irregular districts can be. District 12 has a particularly interesting shape.

Congressional districts of Pennsylvania map.

Figure 3.23: Congressional districts of Pennsylvania.

Credit: The National Atlas.

Beyond the role of the census of population in determining the number of representatives per state (thus in providing the data input to reapportionment and redistricting), the Census Bureau's mandate is to provide the population data needed to support governmental operations, more broadly including decisions on allocation of federal expenditures. Its broader mission includes being "the preeminent collector and provider of timely, relevant, and quality data about the people and economy of the United States". To fulfill this mission, the Census Bureau needs to count more than just numbers of people, and it does. We will discuss this in more detail later (in section 3.3, Thinking about aggregated data: Enumeration versus samples).

3.2.2.3 Proportional Symbol Mapping

Besides reapportionment and redistricting, U.S. Census counts also affect the flow of billions of dollars of federal expenditures, including contracts and federal aid, to states and municipalities. In 2011, for example, some $486 billion of Medicaid funds were distributed according to a formula that compared state and national per capita income. $93 billion worth of highway planning and construction funds were allotted to states according to their shares of urban and rural population. And $120 billion of Unemployment Compensation was distributed from the Federal level. The thematic maps below (using historical data from 1995) illustrate the strong relationship between population counts and the distribution of federal tax dollars using proportional symbols (symbols in which the graphic variable of size is used to depict data magnitude).

Population and federal expenditures by state, 1995. Almost identical in millions of people to billions of $s

Figure 3.24: Population and federal expenditures, by state, 1995.

Credit: Cartography by Thad Lenker. Data from U.S. Census Bureau, Federal Expenditures by State.

There are two types of point features that are typically depicted with proportional symbols: features for which the data represents a geographic position directly (e.g., gallons of oil from individual oil wells), and features that are geographic areas to which data are aggregated and the data magnitudes are assigned to a representative point within the area (e.g., the geographic centroid of a state as in the examples above). In either case, the area of the symbol is scaled to represent the data magnitude, sometimes with a bit of exaggeration to adjust for a general tendency of human vision to underestimate differences in area. A variant on this direct data-to-symbol scaling groups values into categories first, then scales the symbol to represent the mean for the category, assigning a symbol to each place to represent the category range that the mean for the place falls within (see Figure 3.25 below).

Unemployment Percentages in 2000 in the United States. More in text above.

Figure 3.25: Unemployment Percentages in 2000 in the United States, with each circle representing a category with the percentage range specified in the legend at the right.

Credit: Cartography by Jennifer M. Smith.
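The scaling of symbol area to data magnitude described above can be sketched in a few lines. Because the area of a circle grows with the square of its radius, making area proportional to a value means making the radius proportional to the square root of the value. The function name, the parameter names, and the sample numbers are assumptions for illustration. The exponent 0.5716 is a commonly cited perceptual adjustment (often attributed to Flannery) and is included here only as one possible way to apply the "bit of exaggeration" mentioned in the text.

```python
def symbol_radius(value, max_value, max_radius, exponent=0.5):
    """Radius of a circle whose size encodes `value`.

    exponent=0.5 makes the circle's AREA proportional to the value.
    An exponent a little above 0.5 shrinks smaller symbols relative to
    the largest one, exaggerating differences to offset the tendency
    to underestimate differences in area.
    """
    return max_radius * (value / max_value) ** exponent

# Hypothetical data: the largest value (100) gets a 20-unit radius.
true_area_radius = symbol_radius(25, 100, 20)            # area-true scaling
adjusted_radius = symbol_radius(25, 100, 20, 0.5716)     # assumed adjustment

print(true_area_radius)  # 10.0
```

With area-true scaling, a value one quarter of the maximum gets half the radius and therefore one quarter of the area. The adjusted version gives the same value a slightly smaller circle.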

One important characteristic of proportional symbols is that they can easily be designed to represent more than one data value per location. One of the most common examples is a “pie chart map” in which a circle is scaled proportionally to some total, and the size of wedges within the circle is scaled to depict a proportion of a total for two or more sub-categories. The map below uses circle size to depict population totals in each state, and the pie slices then depict the proportion of that total who identify as Hispanic compared to those who are non-Hispanic.

Rate percents of Hispanic population as percent of total population of each state. More in text above.

Figure 3.26: A "pie chart" map that depicts Hispanic population as a percent of total population.

Credit: Cartography by Geoff Hatchard.
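The wedges in a pie chart symbol follow directly from each sub-category's share of the total. The sketch below assumes a hypothetical state with made-up population figures; the function name is also an assumption.

```python
def wedge_angles(parts):
    """Return the angle in degrees of each pie wedge.

    `parts` maps a sub-category name to its count; each wedge takes
    the same share of the 360-degree circle as its share of the total.
    """
    total = sum(parts.values())
    return {name: 360 * value / total for name, value in parts.items()}

# Hypothetical state: 2 million Hispanic and 8 million non-Hispanic residents.
angles = wedge_angles({"Hispanic": 2_000_000, "non-Hispanic": 8_000_000})
print(angles)  # {'Hispanic': 72.0, 'non-Hispanic': 288.0}
```

The size of the circle itself would still be scaled to the state's total population, so each symbol carries two pieces of information.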

Practice Quiz

Registered Penn State students should now take the self-assessment quiz about Choropleth Mapping, Census Data, and Proportional Symbol Mapping.

You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.

3.2.2.4 Dot Mapping

For data that represent an area, proportional symbols are a fairly extreme abstraction. They provide a very simple overview of data magnitudes geographically but hide any geographic variation that might occur inside the enumeration units to which the data are aggregated. An alternative is the dot map. Dot maps depict magnitude by frequency rather than the size of symbol and add the depiction of geographic distribution by use of the graphic variable of location. Specifically, dot maps assign one to many dots per enumeration area to represent a specific count in each area. The difference between a dot map and a simple map of point features is that each dot represents more than one entity and the locations are representative of the distribution rather than being exact locations. Specifically, dots that represent some count are placed within enumeration units to represent generally where the feature or attribute occurs.

In the example below, the dot map depicts the size of the Hispanic population by the number of dots per state. Each dot represents 100,000 people in this case, and the general geographic distribution of the Hispanic population within the state is signified by the position of the dots. Not surprisingly, dot maps can vary substantially in how well the distribution of dots on the map represents the actual distribution of the phenomena in the world. Cartographers typically use secondary sources of information to help them decide on the appropriate locations for the dots (e.g., land use maps, satellite images, or statistics collected for smaller geographic units like counties). But, the position of dots usually is based on an educated estimate of distribution rather than on any direct measurement of where the people (in this case) or automobiles or bushels of wheat (or the many other kinds of things we can count) actually are.

A dot density map that depicts count data of hispanic population in the US. Highest in CA, TX, FL, and Northeast

Figure 3.28: A "dot density" map that depicts count data.

Credit: Cartography by Geoff Hatchard.
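The two steps in making a dot map, deciding how many dots a unit receives and deciding where to put them, can be sketched as follows. This is a deliberately naive version: it scatters dots uniformly inside a rectangular bounding box, whereas the text notes that cartographers usually draw on secondary sources to place dots more realistically. The function names, the bounding box, and the population figure are all assumptions for illustration.

```python
import random

def dot_count(count, per_dot=100_000):
    """Number of dots for a unit when each dot stands for `per_dot` entities."""
    return round(count / per_dot)

def place_dots(n, bbox, seed=0):
    """Scatter n dots uniformly inside a bounding box (xmin, ymin, xmax, ymax).

    Uniform placement is an assumption made to keep the sketch short;
    it ignores where the phenomenon actually occurs within the unit.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

# Hypothetical unit with 1,430,000 people and a 10 x 5 bounding box.
n_dots = dot_count(1_430_000)
dots = place_dots(n_dots, (0.0, 0.0, 10.0, 5.0))
print(n_dots)  # 14
```

Because each dot stands for many entities, the remainder (30,000 people in this case) is lost to rounding, which is one reason dot maps show general distribution rather than exact totals.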

3.2.2.5 Cartograms

A cartogram can be considered a special case of proportional symbol mapping. But, in this case, the “symbol” that is scaled in proportion to a data magnitude is the geographic area for which data are aggregated. Cartograms are unusual enough that they attract viewer attention, making them a popular mapping method with the media, particularly during election years. Their primary weakness (in addition to distorting geography so that no standard measurements such as distance among places are accurate), is that they cannot be interpreted correctly unless the map reader knows the actual geographic shapes of the map units so that sizes can be related to the places they represent.

The map below shows the results of the 2008 Presidential election, with red states signifying a majority of votes for John McCain, the Republican candidate, and blue states a majority for Barack Obama, the Democratic candidate. This cartogram scales the area of each state to represent its total population, visually showing how the majority of the United States voted.

Cartogram of election results. South=majority red (except FL), West, some of center North, North East=majority blue,

Figure 3.29: Cartogram of election results with red signifying a Republican majority state and blue a Democratic majority state.

Credit: Mark Newman at the University of Michigan.
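The idea of scaling each area in proportion to a data magnitude can be illustrated with a small calculation. The sketch below computes, for each unit, the target area it should occupy and the linear factor by which its outline would need to be stretched or shrunk. It assumes each unit is rescaled independently and that total map area is preserved; keeping neighboring units joined, as in the cartograms shown in this section, requires more involved methods that are not shown here. The unit names and numbers are made up.

```python
import math

def cartogram_scale_factors(units):
    """Linear scale factor per unit so that area becomes proportional to population.

    `units` maps a name to (population, area). Total map area is kept
    constant, so each unit's target area is its share of the total
    population times the total area. Lengths scale with the square root
    of the area ratio.
    """
    total_pop = sum(pop for pop, _ in units.values())
    total_area = sum(area for _, area in units.values())
    factors = {}
    for name, (pop, area) in units.items():
        target_area = total_area * pop / total_pop
        factors[name] = math.sqrt(target_area / area)
    return factors

# Hypothetical units: A is small but populous, B is large but sparse.
factors = cartogram_scale_factors({"A": (900, 100), "B": (100, 900)})
print(factors["A"])  # 3.0
```

Unit A grows to three times its original width and height, while unit B shrinks to about one third, which is why a reader needs to know the original shapes to interpret the result.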

The following maps illustrate the power that some cartograms can have in helping users visually comprehend a phenomenon. While the map on the left depicts the majority vote results by county (with a vast majority of counties for the Republican candidate), the cartogram on the right shows the areas again scaled by population (this time with county rather than state level data), revealing the larger amount of Democratic support. The map on the left gives a distorted view (even though it does not look distorted) because a majority of counties won by the Republican candidate were low in population and many were large in area.

Election results by county. More in text above and caption.

Figure 3.30: Election results by county with red signifying a Republican majority and blue a Democratic majority (left) and cartogram skewing the counties by their respective populations (right).

Credit: Mark Newman at the University of Michigan.

For more election cartogram examples, visit University of Michigan 2008 election site.

Try This: Practice Identifying Mapping Techniques

Visit the National Geographic Earthpulse map. On the left-hand side, you will find numerous check boxes for different thematic maps. Choose two thematic maps and identify at least two cartographic techniques (any that have been discussed in the chapter) the cartographer used when creating each map. For instance, in the map above (Figure 3.30), the cartographer used a qualitative color scheme (blue and red) on a choropleth map to show different categories (Democratic or Republican majority vote) for each U.S. state.

3.2.2.6 Numerical Data Classification

As discussed above (and in Chapter 1), all maps are abstractions. This means that they depict only selected information, but also that the information selected must be generalized due to the limits of display resolution, comparable limits of human visual acuity, and especially the limits imposed by the costs of collecting and processing detailed data. What we have not previously considered is that generalization is not only necessary, it is sometimes beneficial; it can make complex information understandable.

Consider a simple example. The graph below (Figure 3.31) shows the percent of people who prefer the term “pop” (not soda or coke) for each state. Categories along the x axis of the graph represent each of the 50 unique percentage values (two of the states had exactly the same rate). Categories along the y axis are the numbers of states associated with each rate. As you can see, it's difficult to discern a pattern in these data; it appears that there is no pattern.

Use of the term “pop” by state. The only "spike" is at 1.47% of state population, shared by 2 states; every other value occurs in 1 state.

Figure 3.31: Unique percentage values for people who use the term “pop” by state.

Credit: Jennifer M. Smith, Department of Geography, The Pennsylvania State University; Data from The Pop vs. Soda Page.

The following graph (Figure 3.32) shows exactly the same data set, only grouped into 10 classes with equal 10% ranges. It's much easier to discern patterns and outliers in the classified data than in the unclassified data. Notice that people in a large number of states (23) do not really prefer the term “pop” as they are distributed around 0 to 10 percent of users who favor that term. There are no states at the other extreme (91-100%), but a few states whose vast majority (81-90% of their population) prefer the term pop. Ignoring the many 0-10% states where pop is rarely used, the most common states are ones in which about 2/3 favor the term; looking back to Figure 3.13, these are primarily northern states, including Pennsylvania. All of these variations in the information are obscured in the unclassified data.

Classed percentages of people who use the term “pop” by state. More in surrounding text.

Figure 3.32: Classed percentages of people who use the term “pop” by state.

Credit: Jennifer M. Smith, Department of Geography, The Pennsylvania State University; Data from The Pop vs. Soda Page.
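The grouping into ten equal 10% ranges can be sketched as a small function that assigns each percentage to a class and then counts how many values fall in each class. The sample values below are made up and are not the Pop vs. Soda data. How values that fall exactly on a class boundary are handled is also an assumption: here a value of exactly 10 goes into the second class.

```python
from collections import Counter

def equal_interval_class(value, low=0.0, high=100.0, n_classes=10):
    """Index (0-based) of the equal-width class that contains `value`."""
    width = (high - low) / n_classes
    index = int((value - low) // width)
    # Keep the maximum value in the top class instead of a class of its own.
    return min(index, n_classes - 1)

# Hypothetical percentages for seven states.
values = [1.47, 3.2, 8.9, 45.0, 66.1, 67.5, 88.0]
counts = Counter(equal_interval_class(v) for v in values)

for index in sorted(counts):
    print(f"{index * 10}-{index * 10 + 10}%: {counts[index]} states")
```

Seven distinct values collapse into four occupied classes, which is the trade of detail for a clearer picture that the two graphs above illustrate.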

As shown above, data classification is a generalization process that can make data easier to interpret. Classification into a small number of ranges, however, gives up some details in exchange for the clearer picture, and there are multiple choices of methods to classify data for mapping. If a classification scheme is chosen and applied skillfully, it can help reveal patterns and anomalies that otherwise might be obscured (as shown above). By the same token, a poorly-chosen classification scheme may hide meaningful patterns. The appearance of a thematic map, and sometimes conclusions drawn from it, may vary substantially depending on the data classification scheme used. Thus, it is important to understand the choices that might be made, whether you are creating a map or interpreting one created by someone else.

Many different systematic classification schemes have been developed. Some produce mathematically "optimal" classes for unique data sets, maximizing the difference between classes and minimizing differences within classes. Since optimizing schemes produce unique solutions, however, they are not the best choice when several maps need to be compared. For this, data classification schemes that treat every data set alike are preferred.

Part of the ArcMap classification dialog box, listing these schemes: manual, equal interval, defined interval, quantile, natural breaks (Jenks), standard deviation.

Figure 3.33: Portion of the ArcMap classification dialog box highlighting the schemes supported in ArcMap 8.2.
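To see how much the choice of scheme can change a map, the sketch below applies two of the schemes named in the dialog box, equal interval and quantile, to the same small, skewed data set. The data values, function names, and the specific rule used to pick quantile breaks are assumptions for illustration; GIS packages may compute their breaks somewhat differently.

```python
from bisect import bisect_right

def equal_interval_breaks(values, n_classes):
    """Breaks that split the data range into classes of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * i for i in range(1, n_classes)]

def quantile_breaks(values, n_classes):
    """Breaks that put roughly the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_classes] for i in range(1, n_classes)]

def class_counts(values, breaks):
    """Count values per class; a value equal to a break goes in the higher class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_right(breaks, v)] += 1
    return counts

# Hypothetical skewed data: eight small values and one extreme value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(class_counts(values, equal_interval_breaks(values, 3)))  # [8, 0, 1]
print(class_counts(values, quantile_breaks(values, 3)))        # [3, 3, 3]
```

With equal intervals, the single extreme value leaves almost every unit in the lowest class, so the map would look nearly uniform. With quantiles, the same data fill all three classes evenly, but the legend ranges become very unequal.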