GEOG 486
Cartography and Visualization

Making Choropleth Maps

PrintPrint

Making Choropleth Maps

US Census Bureau choropleth map, see text below
Figure 4.6.1 A choropleth map from the Census (posted earlier as Figure 4.1.2)

Let’s return again to a map that should be becoming familiar, posted now as Figure 4.6.1. Median income is visually encoded in each state as belonging to one of four classes: (1) less than $45,000; (2) $45,000 to $49,999; (3) $50,000 to $59,999, and (4) $60,0000 and more. How were these classes chosen?

Student Reflection

One side-step before we discuss data classification: think back to our discussion of types of color schemes—can you think of another type of color scheme that would be effective in Figure 4.6.1? Do you think it would be better?

When the map in Figure 4.6.1 was being designed, the aforementioned classes had to be decided upon – and there are many different ways in which class breaks in median income could have been drawn. So, how do you choose? Rather than simply choosing the default classification scheme that your GIS software suggests, you should think critically about how your data classes are defined. The first decision you should make, however, is not how, but whether to class your data.

unclassed and classed choropleth maps of Median Household Income ($), see text below
Figure 4.6.2 An unclassed (top) and classed (bottom) choropleth map.
Credit: Maps by Cary Anderson, Penn State University; Data Source: United States Census Bureau, American Community Survey.

Figure 4.6.2 shows an example of two maps—one unclassed (sometimes called a "class-less" map), and one classed. Unclassed maps encode color (usually with lightness) based on the specific value within each enumeration unit, rather than based on a pre-defined class within which the data value falls. These maps are useful as—if designed properly—they may more accurately reflect nuances in the distribution of the data. However, they should not be considered an easy solution to the problem of data classification. They have their own disadvantages, for example, they make it challenging for the reader to match the value encoded in an enumeration unit to its location on the legend. 

Before modern GIS software, unclassed maps were quite difficult to create, but new technology has made their design quite simple. Unclassed maps show a more “direct” visualization of the data, while classifying maps gives you more control over the final map. It will be up to you as the map designer to decide whether to class your map; however, many map readers—and cartographers—still prefer classed maps.

As you will likely be classifying your maps, it is important to understand how this process can influence your final map design. Most of the commonly-used classification methods are available in ArcGIS, and the software interface gives a good simple explanation of each of these methods (Figure 4.6.3). We will not discuss the mathematical details of each of these classification methods here—it is recommended that you explore the recommended readings or do your own research on the web to learn more.

Screenshot of classification options available in ArcGIS Pro, see link below for text description
Figure 4.6.3 Classification options available in ArcGIS Pro.
Credit: Screenshot from ArcGIS Pro.
Click here for a text alternative to this screenshot of classification options available in ArcGIS Pro.

Natural Breaks (Jenks): Numerical values of ranked data are examined to account for non-uniform distributions, giving an unequal class width with varying frequency of observations per class.

Quantile: Distributes the observations equally across the class interval, giving unequal class width but the same frequency of observations per class.

Equal Interval: The data range of each class is held constant, giving an equal class width with varying frequency of observations per class.

Defined Interval: Specify an interval size to define equal class widths with varying frequency of observations per class.

Manual Interval: Create class breaks manually or modify one of the present classification methods appropriate for your data.

Geometric Interval: Mathematically defined class widths based on a geometric series, giving an approximately equal class width and consistent frequency of observations per class.

Standard Deviation: For normally distributed data, class widths are defined using standard deviations from the mean of the data array, giving an equal class width and varying frequency of observations per class.

Though Figure 4.6.3 gives helpful descriptions of each classification method, it offers little advice as to when to use them. A good way to approach this question is to view your data along the number line. You can use histograms (for large data sets) or dot plots (for small data sets) to visualize how your data is distributed, and to select class breaks accordingly. The following suggestions are given by Penn State cartographer Dr. Cynthia Brewer.

1. For data with near-normal distributions, consider classifying your data based on the mean and standard deviation.

2. For skewed distributions, consider systematically increasing classes, such as arithmetic and geometric classing methods.

3. If your data are evenly distributed, equal interval and quantile classing methods work well. These methods are also best for ranked data.

4. Natural breaks, created using Jenks classing method or in selecting breaks by eye, work best for data which shows obvious groupings through the range. The natural breaks method highlights the natural sets of values in the data.

We will look at data using dot plots during this lab associated with this lesson. When you make maps, unless you are working with a very large data set, this will often be the most effective way to visually investigate your dataset in order to choose a classification method or visually/manually place your own breaks. ArcGIS, however, creates histograms of your data that you can also use to understand how the breaks you have chosen relate to the spread of your data.

Student Reflection

Compare the breaks, histograms, and maps in Figure 4.6.4 below. Which classification method would you have chosen? Why?

comparison of two maps (top map created using quantiles classification method, bottom map using natural breaks [Jenks]), see surrounding text
Figure 4.6.4 Two maps created using the quantiles classification method (top), and natural breaks (Jenks) method (bottom).
Credit: Maps by Cary Anderson, Penn State University, Data Source: US Census.

Note that the spread of your data is only one of multiple elements you should consider when choosing how to classify your data. As with other map design choices, your map's intended audience, medium, and purpose are also of vital importance here.

In addition to choosing a classification method for your maps, you also must decide how many classes to create. It may be tempting to create a large number of classes, as more classes means less simplification of your data, and thus more information conveyed to the map viewer. Unfortunately, the human eye can only differentiate between so many colors. The limit is about a dozen colors for a qualitative map, ten for a diverging scheme, and only eight for a sequential scheme. If anything, these are optimistic estimates—your map reader is likely to be able to differentiate between even less.

Student Reflection

View the maps in Figure 4.6.5 below. Looking at the map on the left, can you identify within which class county x belongs? How confident are you that this is the correct answer? What about in the map on the right?

screenshot comparing two maps - a map with 10 classes and a map with 6 classes, see caption      
Figure 4.6.5 A map with many classes (left) makes it more difficult to interpret than a simpler map (right).
Credit: Maps by Cary Anderson, Penn State University, Data Source: US Census.

Finally, when classifying your map data, you will have to contend with outliers in your dataset. Consider a county-level map, where one county has double the rate (for example, of people with graduate-level degrees) of any other county in your data. Some classification methods, such as natural breaks or equal intervals, will most likely group this outlier into a class of its own. Other methods, such as quartiles, will simply place it into a group with all the next-highest counties.

There is no rule for which method is best, except that context matters. Is the rate high because that county contains the most prestigious university in the state? In that case, you probably want it to be highlighted on your map. If instead, it is the highest because only five people live there—and two are college professors—you probably don’t. In general, the more data you have, the less likely an outlier is to be noise: this is called the law of large numbers. Whenever possible, however, you should investigate the possible causes of an outlier—there is no substitute for contextual clues.

There are additional ways to classify your data, including by combining methods—for example, using equal intervals for most of the range, and then switching to natural breaks. Methods also exist that consider not just the distribution of data along the number line, but its distribution through geographic space as well. These are beyond the scope and intent of this lesson, but be aware that you may encounter them in the future.

Recommended Reading

  • Chapter 4: Data Classification. Slocum, Terry A., Robert B. McMaster, Fritz C. Kessler, and Hugh H. Howard. 2009. Thematic Cartography and Geovisualization. Edited by Keith C. Clarke. 3rd ed. Upper Saddle River, NJ: Pearson Prentice Hall.
  • Chapter 11: Data Maps: A Thicket of Thorny Choices. Monmonier, Mark. 2018. How to Lie with Maps. 3rd ed. The University of Chicago Press. (this week's required reading - it relates especially well to this topic). 
  • Tversky, Amos, and Daniel Kahneman. 1971. “Belief in the Law of Small Numbers.” Psychological Bulletin 76 (2): 105–110.