By the end of this lesson you should be able to:
The lessons this week and next week examine issues related to presenting data for thematic display. In Lesson 4 we focus on data classification and color schemes, and in Lesson 5 we focus on different kinds of map representations. How do different data classifications affect map pattern recognition? How do different color schemes affect pattern recognition? What is the appropriate classification for a given dataset? In Lessons 4 and 5, you will explore classification and symbolization tools to create several map series using longitudinal crime data.
Lesson 4 is one week in length. (See the Calendar in ANGEL for specific due dates.) To finish this lesson, you must complete the activities listed below.
|1||Read the concepts introduced for this lesson.||See the Concept Gallery page to begin.|
|2||Work through the Lesson 4 Project.||See the Part I: Getting Started page to begin.|
|3||Submit the Lesson 4 Project.||Detailed instructions on the Lesson Project Tasks page.|
|4||Begin work on the proposal for your Capstone Project.||Read information specifically about the proposal and work on the proposal. Note: You will not be turning the proposal in until Lesson 6.|
|5||Complete the Lesson 4 Quiz in ANGEL.||Take Quiz: ANGEL>Lessons tab>Lesson 4 folder>Lesson 4 Quiz|
If you have any questions now or at any point during this week, please feel free to post them to the Lesson 4 Discussion Forum. While you are there, feel free to post your own responses if you, too, are able to help a classmate.
Most lessons will include a Concept Gallery section that is based on content that you must know before completing the various projects and assignments. View this lesson's concepts via the links below or via the lesson navigation on the side.
Together with the visual variables (refer back to the Symbolization concept gallery item from Lesson 2), one of the most important choices you will make in designing a thematic map is what type of representation you would like to use for your data. In this course, we will focus on and discuss four common types of maps: choropleth (here in Lesson 4), graduated/proportional symbol (in Lesson 5), dot density (in Lesson 5), and isoline maps (in Lesson 6).
When cartographers create thematic maps, one general goal they often have is to try to help the map reader understand the character of the spatial distribution of the attribute(s) displayed in the map. One useful way of talking about spatial distributions is to use the concept of a cartographic data model, first developed by George Jenks, a professor at the University of Kansas (Jenks, 1967). As Jenks defined it, a cartographic data model is an abstract method for representing the most important characteristics of a particular spatial distribution; this representation could be either mathematical or graphical. Other cartographers further developed this notion by creating a typology of how map types can be related to data models (MacEachren and DiBiase, 1991). They identified two important axes along which the spatial distribution of a variable can vary: from discrete to continuous and from abruptly changing to smoothly changing (see Figure 4.cg.1, below). They also matched the visual characteristics of these data models to the visual characteristics of different map types (see Figure 4.cg.2, further below; you may recognize this matching as an exercise in creating map-signs that best match their real world referents - remember our discussion of semiotics in Lesson 1, Part II: Visual Communication).
In the figure below, one of the axes shows a range from discrete to continuous. Discrete phenomena are those that have space between observations (e.g., mobile phone towers), while continuous phenomena exist throughout space (e.g., temperature, which exists everywhere even if we do not choose to measure it at every possible location). The other axis relates to the degree of spatial dependence of a phenomenon. A phenomenon with a low degree of spatial dependence may change abruptly over a short distance (e.g., income tax rates between states), while phenomena with a high degree of spatial dependence change more smoothly. Elevation is generally a good example of a smoothly changing phenomenon, with the rare exceptions of canyons and cliffs, where there is a large, abrupt elevation change. One important point is that the character of a phenomenon may be scale-dependent (both spatially and temporally). For example, the distribution of cars may generally be considered discrete: we do not usually find cars in locations that are not paved, and there is some amount of space between cars. However, at certain times of the day (e.g., rush hour), the distribution of cars (at least in particular locations, such as freeways) may become continuous.
These different conceptualizations of geographic phenomena lend themselves to certain map types (or representations) better than others. For example, elevation is smooth and continuous and is therefore often represented with isolines, as shown in the lower right corner of Fig 4.cg.2 (and discussed more in Lesson 6). A tax rate, or almost any kind of rate (e.g., mortality), is a value for an area (e.g., a county) and is therefore abrupt but still continuous. A choropleth map would then best represent rate data, as shown in the lower left corner of Fig 4.cg.2. What data model in Fig 4.cg.1 represents counts of people per area? And what kind of map type in Fig 4.cg.2 would work with that kind of geographic data? Depending on the scale of aggregation, you may consider such data to be discrete and abrupt (upper left corner of Fig 4.cg.1) and use a proportional circle for each unit of area. With small areas of aggregation (compared to your extent), you may instead think of your data as discrete but smooth (e.g., population per census tract when looking at a whole state), and then you may consider a dot density map (more on this in Lesson 5).
Choropleth maps are probably the most commonly created type of map in GIS cartography. Their popularity is due to two main reasons: (1) choropleth mapping capabilities are implemented in almost every GIS software package; and (2) much of the data that geographers and GIScientists work with is collected and aggregated into enumeration units, which form the basis for choropleth maps (see Figure 4.cg.3, below). Recall that choropleth maps give the map reader the impression that the phenomenon of interest is continuous (i.e., present throughout the areal unit) and abruptly changing (i.e., that the phenomenon is present at the same intensity throughout each areal unit, but changes abruptly at the area's borders).
An enumeration unit is an area defined for a particular purpose (often other than collecting data) and within which data are collected and aggregated. Some common examples of enumeration units include school districts (created to help manage the assignment of students to particular schools within a city or metropolitan area), counties (created as a form of local government) or census tracts (created to help manage the complicated task of counting the population). Typically, the boundaries of enumeration units do not correspond to breaks in the statistical surface of the data that are collected and aggregated to each unit (e.g., the population density does not suddenly change when we cross the border from Los Angeles county to Orange county in southern California). However, there are some cases where enumeration units do provide a good reflection of the structure in the statistical surface (e.g., in the case of income tax rates, which do change abruptly from state to state).
Choropleth maps typically use either differences in color value (sometimes in combination with hue) or differences in spacing (e.g., the intensity of a hatched pattern) to represent differences in the phenomenon being mapped (see Figure 4.cg.4, below). Generally, we use a darker or more closely spaced pattern to represent larger quantities of the phenomenon and a lighter or more sparsely spaced pattern to represent smaller quantities. One empirical study has shown that in most cases (especially with light map backgrounds), map readers do assume that "dark means more and light means less" (McGranaghan 1993).
Although choropleth maps are quite easy to create, there are several issues that you should be aware of and consider when you are thinking about using a choropleth symbolization for representing the phenomenon you would like to map:
One important issue is that the size of enumeration units can be quite variable. This matters because larger units will dominate the visual appearance of the map and can exaggerate the importance of particular enumeration units. If we chose to map raw counts of deaths from motor vehicle accidents with a choropleth map, we might see counties such as San Bernardino County (the largest county by area in California) dominate the map, as the county has a relatively large population along with a relatively large area. However, if what we are really interested in is where people are more likely to die as the result of a motor vehicle accident, we should be looking at rates. Anywhere there is a larger population, we would expect to find a larger number of deaths due to motor vehicle accidents, just as we would expect a larger number of almost any count pertaining to people. For this reason, in choropleth maps we typically want to avoid mapping raw counts and instead transform the data so that we are mapping densities, rates, or ratios that allow us to make more realistic comparisons between unevenly sized units. This is not to say that there is never a good reason for mapping raw counts, just that other symbolization methods may be more appropriate (e.g., graduated or proportional symbols).
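The difference between mapping counts and mapping rates can be sketched in a few lines of Python. This is only a minimal illustration and not part of the lesson project: the county names, death counts, populations, and areas are invented, and the function names `rate_per_100k` and `density_per_km2` are hypothetical.

```python
# Hypothetical county records: (name, deaths, population, area in square km).
# Every number here is invented purely for illustration.
counties = [
    ("County A", 120, 2_000_000, 52_000.0),
    ("County B", 45, 300_000, 2_000.0),
    ("County C", 10, 40_000, 8_000.0),
]


def rate_per_100k(count, population):
    """Events per 100,000 residents; None when the unit has no population."""
    if population <= 0:
        return None
    return count * 100_000 / population


def density_per_km2(count, area_km2):
    """Events per square kilometre of the enumeration unit."""
    return count / area_km2


rates = {name: rate_per_100k(deaths, pop) for name, deaths, pop, _ in counties}
densities = {name: density_per_km2(deaths, area) for name, deaths, _, area in counties}

# The unit with the most deaths is not the unit with the highest rate.
highest_count = max(counties, key=lambda c: c[1])[0]
highest_rate = max(rates, key=rates.get)
print(highest_count, highest_rate)
print(rates)
```

In this made-up data, the largest and most populous county has the highest count but the lowest rate, which is the pattern a raw-count choropleth map would hide.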
Modifiable Areal Unit Problem
Because of the arbitrary nature of the boundaries of most enumeration units, we can find ourselves facing the modifiable areal unit problem (MAUP) (Openshaw 1984). Simply put, MAUP arises when different aggregations of individual counts (i.e., drawing the boundaries of enumeration units in different ways) produce different spatial patterns (see Figure 4.cg.6, below). Although there is no 'solution' to MAUP, if we can find data at different scales and that are aggregated to different units, we can create multiple maps that tell a more complete story about the distribution of the phenomenon we are working with. We might also choose to use other types of symbolization, e.g., dot maps (see Lesson 5) or dasymetric maps (see concept below in this concept gallery), that can tell us more about the spatial distribution of our phenomenon of interest. Incidentally, using multiple representations (whether using the same symbolization method or different methods) can also help us better understand where we have mismatches between the breaks in the statistical surface and breaks in the geographical surface.
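A toy example may help make MAUP concrete. The sketch below assumes a single row of eight small cells with invented counts and aggregates them using two hypothetical zoning schemes; the function name `aggregate` and both zonings are illustrative assumptions, not data from this lesson.

```python
# Invented counts for eight small cells laid out along a line.
cells = [1, 9, 9, 1, 1, 9, 9, 1]


def aggregate(values, zones):
    """Sum the cell values inside each zone.

    zones is a list of (start, stop) index pairs, with stop exclusive.
    """
    return [sum(values[start:stop]) for start, stop in zones]


# Two hypothetical ways of drawing enumeration unit boundaries over the same cells.
zoning_a = [(0, 2), (2, 4), (4, 6), (6, 8)]
zoning_b = [(0, 1), (1, 3), (3, 5), (5, 7), (7, 8)]

totals_a = aggregate(cells, zoning_a)
totals_b = aggregate(cells, zoning_b)

print(totals_a)  # every zone has the same total, so the map looks uniform
print(totals_b)  # the same cells now show strong highs and lows
```

Both zonings account for exactly the same underlying counts, yet one suggests a uniform distribution and the other suggests sharp clusters.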
Data Classification and Map Appearance
Finally, there is the issue of the effect of different data classification decisions (e.g., classification method or the number of classes employed) on the appearance of the map pattern. We discuss this issue in detail in the next section of this concept gallery, but we should first briefly discuss the potential for unclassed choropleth maps. Although there has been some discussion among cartographers about using color values that are proportional to the data values represented in the map (i.e., creating unclassed maps, as Tobler (1973) suggested), the consensus today among cartographers is that it is difficult for map readers to extract quantitative information from static unclassed color value maps. Most cartographers therefore still prefer to classify their data, especially in maps where map readers may need to extract an individual value or compare regions. A final point to note is that in classed choropleth maps, it is important to ensure that symbols are visually differentiable from each other (i.e., that the value differences between symbols are large enough to avoid confusion). This should be evaluated within the context of the map, as simultaneous contrast (the effect of surrounding symbols on the appearance of an interior symbol) can change a symbol's differentiability.
If you are interested in investigating this subject further, I recommend the following:
You can check out an unclassed choropleth map using the MAPresso applet created by Adrian Herzog. Note that it might take some time for the applet to completely download, so be patient.
At its heart, classification is an exercise in categorization. We assign locations to categories in order to reduce the complexity of the real world, thereby creating an abstraction that helps us better understand particular characteristics of the world without the distraction of all of the other possible characteristics that we could examine. A distinguishing feature of locations that belong to the same category is that they have a set of shared characteristics. The way in which we assign locations to a particular category can depend on qualitative or quantitative characteristics of that location (e.g., what type of phenomenon is present at that location, or how much of the phenomenon is found at the location). In the remainder of this lesson, we will focus on quantitative classification schemes (i.e., on grouping locations together because they have similar amounts of some phenomenon).
The most important choices you will have to make when classifying your data are which classification method to use and the number of classes to create. Generally, the fewer classes you use, the more important your choice of classification method is, as the map pattern will typically be more variable when you have fewer classes (see Figure 4.cg.7, below).
However, when you are deciding on how many classes to use, it is also important to evaluate whether your map readers will be able to physically see differences in the symbol set you will use. For example, if you are creating a choropleth map and are only using color value as a visual variable (instead of a combination of color value and hue, which will allow readers to differentiate between a larger number of symbols), most map readers will only be able to distinguish six or seven different value levels, so your map should not exceed six or seven classes (see Figure 4.cg.8, below).
We can group classification methods into three main types, depending on the characteristics of the data that each method uses to create the classification scheme: those that are based on some exogenous (i.e., outside) criteria, those that only consider statistical characteristics of the data, and those that consider both statistical and geographical characteristics of the data. Most easily accessible classification methods within GIS software today only consider the statistical characteristics of the data, although it is also possible to create your own classification scheme based on exogenous criteria.
Classification schemes based on exogenous criteria are schemes that use important data values that are not derived from a statistical property of the data set as classification break points (i.e., boundaries between one class and another). Some common examples of exogenous criteria can include definitions (e.g., the amount of income defined as the poverty level), points at which the direction of change is altered (e.g., zero population growth), or values at a previous point in time (e.g., 1996 level of greenhouse gas emissions for each country). All of these exogenous criteria provide benchmarks against which the value for each location in the map can be compared.
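As a rough sketch of how exogenous break points work, the Python below assigns locations to classes using a benchmark that is supplied by the cartographer rather than derived from the data. The zero-growth break, the place names, and the growth figures are all assumptions made up for illustration, as is the choice to put a value equal to the break in the upper class.

```python
from bisect import bisect_right


def classify(value, breaks):
    """Return the 0-based class index for value, given ascending break points.

    Assumption: a value exactly equal to a break point falls in the higher class.
    """
    return bisect_right(breaks, value)


# Exogenous benchmark: zero population growth (percent change), not a statistic of the data.
breaks = [0.0]
labels = ["population decline", "no change or growth"]

# Invented percent change values for three hypothetical places.
growth = {"Place X": -2.5, "Place Y": 0.0, "Place Z": 3.1}
classes = {place: labels[classify(value, breaks)] for place, value in growth.items()}
print(classes)
```

Because the break comes from outside the dataset, adding or removing observations would not move the class boundary.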
Most methods that cartographers use for creating classification schemes consider the statistical properties of the data set. Some common examples include equal interval, quantile, natural breaks, optimal, mean-standard deviation, and classifications based on mathematical progressions. The equal interval classification method divides the range of the data into classes with equal-sized ranges. This is done by figuring out the range of the data and dividing that range by the number of classes desired (e.g., a data set with values ranging from 0 to 80 and divided into four classes would have the following classes: 0-20; 20-40; 40-60 and 60-80). The quantile method divides the data set into equal numbers of observations per class (e.g., in a dataset with 20 observations and 4 classes, each class would contain 25% of the observations, or 5 observations). Natural breaks classifications are typically determined by looking at a graph of the data values (ordered from highest to lowest) and placing breaks in places where the slope substantially changes (see Figure 4.cg.9, below). The optimal classification scheme automates the natural breaks process by using an iterative procedure that divides values into classes that minimize within-group variability and maximize between-group variability, in an attempt to create the most homogeneous classes that are possible with the dataset. The mean and standard deviation scheme uses the mean of the dataset as the middle break point, and uses the values of standard deviations (or some part thereof, such as 0.5 of a standard deviation) added to or subtracted from the mean for determining the other class breaks. Finally, mathematical progressions (e.g., arithmetic and geometric sequences) can be used to create classes that are increasingly larger or increasingly smaller in size.
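Three of these methods are simple enough to sketch directly. The Python below is an illustrative approximation and not the algorithm used by any particular GIS package: the function names are hypothetical, the quantile version assumes there are at least as many observations as classes and no heavy ties, the standard deviation is the population form, and a value equal to a break is assigned to the lower class.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def equal_interval_breaks(values, k):
    """Interior breaks for k classes of equal width across the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Breaks at the largest value of each class so each class holds about n / k values.

    Assumes len(values) >= k; tied values can make the class sizes unequal.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k)]


def mean_sd_breaks(values, step=1.0):
    """Breaks at the mean and at `step` standard deviations on either side of it."""
    m, sd = mean(values), pstdev(values)
    return [m - step * sd, m, m + step * sd]


def class_counts(values, breaks):
    """Number of observations per class; a value equal to a break goes in the lower class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


data = list(range(1, 21))  # 20 made-up observations
print(equal_interval_breaks([0, 80], 4))            # the 0 to 80, four-class example
print(class_counts(data, quantile_breaks(data, 4)))  # five observations per class
print(mean_sd_breaks(data))
```

The natural breaks and optimal methods are left out here because they require an iterative search rather than a closed-form rule.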
Each of the schemes that consider statistical properties is more or less useful for mapping data with particular types of statistical distributions. For example, the equal interval scheme seems to work best for data with a rectangular distribution (i.e., approximately equal numbers of observations over the data range), while it is not very effective for highly skewed data, as there may be many empty classes, forcing most observations into one or two classes and leaving a very uninteresting map. Others, such as the mean-standard deviation scheme, work best for normally distributed data but do not work very well for other types of distributions. Generally, the factors to consider when choosing a classification method include the purpose for which the map will be used, the audience who will be using the map, and the distribution of the data (see Figure 4.cg.10, below).
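The effect of skew on an equal interval scheme can be seen with a handful of invented numbers. The sketch below assumes ten made-up observations with one extreme value, five classes, and a dataset size that divides evenly by the number of classes; it compares equal interval class counts with a simple quantile grouping.

```python
from bisect import bisect_left

# Invented, strongly right-skewed data: nine small values and one extreme value.
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
k = 5

# Equal interval: split the range (1 to 100) into k equally wide classes.
lo, hi = min(skewed), max(skewed)
width = (hi - lo) / k
equal_breaks = [lo + width * i for i in range(1, k)]

equal_counts = [0] * k
for v in skewed:
    # A value equal to a break is placed in the lower class (an assumption).
    equal_counts[bisect_left(equal_breaks, v)] += 1

# Quantile: slice the sorted data into k groups of equal size.
# This simple slicing assumes len(skewed) is divisible by k.
ordered = sorted(skewed)
size = len(ordered) // k
quantile_classes = [ordered[i * size:(i + 1) * size] for i in range(k)]

print(equal_counts)       # most observations land in the first class
print(quantile_classes)   # equal counts, but the last class spans a very wide range
```

With equal intervals, three of the five classes are empty; with quantiles, every class is populated, but the top class lumps a moderate value together with the extreme one.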
Recently, several cartographers have argued that classification methods that focus on the statistical characteristics of the data are ignoring an important characteristic of the data: its geographical distribution (Cromley 1996; Murray and Shyy 2000; Armstrong et al. 2003). Without considering the geographical distribution of the data, map readers may have a harder time building regions from the map (Armstrong et al. 2003). Each of these groups has developed a new method that takes contiguity factors (i.e., whether the geographic proximity of observations should be important in drawing class boundaries) into account as well as statistical properties of the data. Cromley (1996) created a minimum-boundary classification that creates classes where the largest differences between adjacent polygons are represented with different classes, while smaller differences across boundaries are contained within classes. Murray and Shyy (2000) used spatial data mining methods to identify spatial clusters of similar observations, and Armstrong et al. (2003) present a method of using multicriteria decision analysis to aid the cartographer in deciding which class breaks to choose (from the universe of possible classification schemes), depending on which criteria he or she considers most important (e.g., spatial structure, class variation minimization, etc.).
The term 'dasymetric mapping' was first used by Russian geographers who described dasymetric maps as density measuring maps (Wright 1936). Dasymetric maps are similar to choropleth maps in that both types of maps represent data as stepped statistical surfaces. In other words, the data that are within a polygon are assumed to be distributed equally throughout that polygon’s area, and changes in the surface occur abruptly, and only at polygon boundaries.
The main difference between choropleth maps and dasymetric maps is the type of areal unit that is used for collecting data and representing the phenomenon of interest. In choropleth maps, data are typically represented using enumeration units (e.g., census tracts, health service areas, etc.) whose shapes may not be related to the distribution of the geographic phenomenon we are interested in mapping. For this reason, the visual impression that the map gives (i.e., that the phenomenon is evenly distributed throughout the enumeration unit) is usually incorrect. In dasymetric maps, however, the areal units that divide the space are based on the actual character of the data surface, often in combination with enumeration units (see Figure 4.cg.11, below).
By now, you might be wondering how we can create dasymetric maps if data are usually collected using unrelated enumeration units rather than areal units that reflect the nature of the data surface. To get around this problem, we can use ancillary data to create a new set of areal units that better represent the data surface. For example, land use is an ancillary data variable that is often used for creating dasymetric maps of population density. Generally, we can use two types of ancillary data variables: limiting variables and related variables. Limiting variables are attributes that can help us eliminate areas where data values cannot occur. For example, a data layer that depicts where water bodies are located may be useful for mapping population density, as it is highly unlikely that there will be any people living in the middle of lakes or rivers. Related variables have some sort of association or predictable relationship with the data variable we are trying to map. In our population density application, an example of a related ancillary attribute might be land cover: we know that fewer people tend to live in areas that have a cropland land cover than in areas with a developed (i.e., built-up) land cover, so we can require the cropland areas to have a lower density.
Note: We will discuss ancillary data in more detail in the Lesson 5 Concept Gallery item called Dot Maps.
When we are creating this new set of areal units, we are basically performing what is called an areal interpolation. In other words, we are transferring quantities of our phenomenon from one set of areal units to another. One thing that we need to be careful about is that we should preserve what Tobler (1979) called the pycnophylactic property. An easy way of describing this is that if you have 100 people in a county, and you subdivide the county into a larger number of units (e.g., new units based on land cover) and redistribute the population among the new units, the sum of the population in the new units should still add up to 100 people. As Langford and Unwin (1994, p.24) succinctly phrased it: "People are not destroyed or manufactured during the redistribution process."
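The redistribution described above can be sketched as a weighted split of a county total among land cover zones. This is a minimal illustration under stated assumptions, not a published dasymetric method: the zone areas and relative density weights are invented, the function name `redistribute` is hypothetical, and a weight of zero is used to stand in for a limiting variable such as water.

```python
def redistribute(total, zones):
    """Split a unit's total among sub-zones in proportion to area * density weight.

    zones is a list of (name, area, weight) tuples. A weight of 0 acts as a
    limiting variable (no population allowed); other weights express assumed
    relative densities for related variables such as land cover.
    """
    shares = [area * weight for _, area, weight in zones]
    denominator = sum(shares)
    if denominator == 0:
        raise ValueError("no zone can receive any of the total")
    return {name: total * share / denominator
            for (name, _, _), share in zip(zones, shares)}


# Hypothetical county of 100 people, subdivided by land cover.
# Areas are in square km; weights are invented relative densities.
zones = [
    ("water", 10.0, 0.0),      # limiting variable: nobody lives here
    ("cropland", 60.0, 1.0),   # related variable: low density
    ("developed", 30.0, 8.0),  # related variable: high density
]

allocated = redistribute(100, zones)
print(allocated)
print(sum(allocated.values()))  # the county total is preserved
```

Dividing by the sum of the weighted areas is what guarantees the pycnophylactic property: the pieces always add back up to the original total.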
Although off-the-shelf GIS software does not have built-in functionality for creating dasymetric maps, in recent years there has been renewed interest in creating automated methods for creating this type of map in both raster and vector format (e.g., Fisher and Langford (1996); Eicher (1999); Mennis (2003)).
If you are interested in investigating this subject further, I recommend the following:
Note that this resource discusses pycnophylactic reallocation within the context of making isoline maps rather than dasymetric maps. The principle is the same; what differs is the way the surface changes (i.e., smoothly or abruptly).
As you may recall from the Symbolization and Color Spaces Lesson 2 concept gallery items, there are three components of color that cartographers have to work with: hue, value and chroma. In this part of the lesson, we will discuss the different ways that you can use these three components to create different types of color schemes.
The main thing to remember when designing a color scheme is that you want the logic of your colors to relate to the logic in your data (i.e., if you are representing differences in the kind of things on your map, use the component of color that works best for showing nominal differences (hue)). We will discuss four main types of color schemes: sequential, diverging, qualitative and binary.
A sequential scheme is typically used to represent differences in the amount of the phenomenon you are mapping. This difference may be quantitative (e.g., inches of rainfall, hours of sunlight, etc.) or ordinal (e.g., least polluted to most polluted; least desirable vacation spot to most desirable vacation spot). Typically, we use color value combined with color chroma differences when we are creating sequential schemes (see Figure 4.cg.13, below). Experiments with map readers have shown that most map readers associate darker symbols with a larger quantity, and lighter symbols with a smaller quantity (McGranaghan 1989), so this is a convention that cartographers generally use when designing a sequential scheme. Generally, map readers will not be able to tell the difference between more than six or seven levels of color value, especially in the complicated context of the map itself. It is possible to extend your sequence by using more than one hue in combination with value (e.g., from yellow through green to blue). This combination will allow you to create a larger number of symbols (that are still differentiable from each other) than you could with color value alone. A final consideration when creating your sequential schemes is that cartographers typically try to use value differences that are perceptually equal throughout the symbol set (i.e., we do not want the difference in lightness between any two neighboring symbols in the scheme to seem larger than the difference between other neighboring pairs).
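A single-hue sequential ramp can be sketched with Python's standard `colorsys` module. This is only a rough illustration: the hue, saturation, and lightness limits are arbitrary assumptions, the function names are hypothetical, and evenly spaced HLS lightness is a crude stand-in for perceptually equal steps, which would require a perceptual color space.

```python
import colorsys


def sequential_ramp(hue, k, lightest=0.9, darkest=0.3, saturation=0.6):
    """Return k (r, g, b) colors of one hue, ordered from light to dark.

    Lightness is spaced evenly in HLS, which only approximates equal
    perceptual steps. hue, lightest, darkest and saturation are in 0..1.
    """
    if k < 2:
        raise ValueError("a sequential scheme needs at least two classes")
    colors = []
    for i in range(k):
        lightness = lightest + (darkest - lightest) * i / (k - 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors


def to_hex(color):
    """Format an (r, g, b) tuple of 0..255 integers as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*color)


# Five classes in a blue hue (0.6 is an arbitrary choice): light means less, dark means more.
ramp = sequential_ramp(hue=0.6, k=5)
print([to_hex(c) for c in ramp])
```

The first color in the list would be assigned to the lowest class and the last to the highest, following the "dark means more" convention.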
A diverging scheme can be constructed by fusing two sequential schemes together, using a common color (typically white or another light color such as yellow or light gray) as the midpoint. Hence the name diverging, as this scheme is composed of two sequential schemes that diverge from a common color. Diverging schemes are most useful for making comparisons with some critical value in the data. You can choose from a number of values as the critical value: zero (e.g., in a map of population change, zero represents no change, with the two sides of the diverging sequence representing positive and negative population growth); the mean or median (e.g., in a map of mortality from vehicle accidents, to highlight areas that are at higher or lower risk; see Figure 4.cg.14, below); or some targeted level (e.g., in a map of greenhouse gas emission reductions, to emphasize how far some countries have reduced their emissions beyond the target specified in a treaty, which countries have not met that target, and how far they still have to go to meet it). One research group has also found that diverging schemes were better able to help map readers identify true clusters of high or low values on maps (and avoid seeing spurious ones), perhaps because of the added differentiation that a second hue brings to the map (Brewer et al. 1997).
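The idea of fusing two sequential schemes at a shared midpoint can be sketched as follows. The specific RGB values, the function names, and the use of simple linear interpolation in RGB are all assumptions made for illustration; a production color scheme would normally be designed in a perceptual color space.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two (r, g, b) colors; t runs from 0 to 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def diverging_ramp(low_end, midpoint, high_end, steps_per_side):
    """Build a diverging scheme from two sequential ramps that share a midpoint.

    Returns 2 * steps_per_side + 1 colors ordered from low_end, through the
    midpoint (the critical value), to high_end.
    """
    low_side = [lerp_color(midpoint, low_end, i / steps_per_side)
                for i in range(steps_per_side, 0, -1)]
    high_side = [lerp_color(midpoint, high_end, i / steps_per_side)
                 for i in range(1, steps_per_side + 1)]
    return low_side + [midpoint] + high_side


# Hypothetical colors: a blue for values below the critical value,
# a light gray at the critical value, and a red for values above it.
blue, light_gray, red = (33, 102, 172), (247, 247, 247), (178, 24, 43)
ramp = diverging_ramp(blue, light_gray, red, steps_per_side=2)
print(ramp)
```

With two steps per side, the middle color of the five would be assigned to the class containing the critical value (for example, zero population change).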
A qualitative scheme mainly uses differences in color hue to indicate differences in the kind of some phenomenon (e.g., land use, crop type, religion, etc.). In a qualitative scheme, you will generally want to choose color hues that have approximately the same lightness and chroma level (see Figure 4.cg.15, below). Otherwise, you will find that more saturated or lighter colors really pop out from the map. One exception to this may be in cases where you have groups of related variables within the map. For example, if you were creating a map of foreign-born residents, but you also wanted to make a distinction between levels of residential segregation of new immigrants, you might choose to use a different hue for each continent of origin, and then specify two levels of that hue for each continent based on the proportion of the enumeration unit that group made up (e.g., if the county was more than 30% persons who were born in South America, it might be represented by a dark blue color, while a county where less than 30% of its foreign born residents came from South America would be represented with a lighter blue color).
Binary color schemes are a special case of qualitative or sequential color schemes that have only two categories. Depending on what you are aiming to represent, you may choose to use either color hue or color value for creating a binary scheme. For example, a map that depicted the candidate that most people voted for in the last presidential election might use color hue (e.g., blue and red are colors traditionally used in the United States for this type of map). In other cases, you might choose to use color value (e.g., if you are representing which locations are visible from a particular viewpoint, you might use black for areas that are not visible and white for areas that are visible).
Before you get started on the assignment, read the concepts in the Lesson 4 concept gallery.
Learn more about Classification Schemes in the Concept Gallery.
Learn more about Choropleth Maps in the Concept Gallery.
For this lesson, you will be downloading all of the spatial and attribute data from two websites: one sponsored by the Cartographic Modeling Lab at the University of Pennsylvania, and the other sponsored by the Pennsylvania Department of Environmental Protection.
First, we will visit the Philadelphia Neighborhood Information System web site, where we will download three datasets: (1) an outline of the city of Philadelphia, (2) the US Census Tracts for Philadelphia, and (3) several years' worth of crime data aggregated to the Census tract level.
Read the Disclaimer in order to get a sense of where the data come from. You can peruse the instructions if and when you like (the link is in the New Users box). In the meantime, follow the steps below to download the data we will be using.
First we will retrieve the two spatial data layers.
This will allow you to download the Philadelphia city boundary.
The boundary layer is a shapefile dataset.
Next you will download the US Census Tracts for Philadelphia. Census blockgroups are available, but the crime data aggregated at that spatial level is not made available to the public. Blockgroups are smaller than tracts. Privacy issues may arise in situations where there are only a few persons residing in a blockgroup area.
Now you will retrieve incidence of burglary information.
In the page that appears you should see, in step #3 "Choose Data Element(s)," a list of 1998 through 2009, Burglaries (500 series).
Note that when you click the Add Element button, the entries disappear from the #3 window. So, if things get confusing, you may need to use your browser's Back button to get back to our Step 13.
The next page will present you with a listing of the burglary counts and rates for each of the 12 years, for each of the Census tracts for Philadelphia (scroll down to see them). Take note of the numeric designations of the Census tracts, the 000100, etc. values.
On the right side of the web page, toward the top, you should also find a small diskette icon with Export It written next to it.
The Lesson4.zip file contains a set of files for Philadelphia hydrology and state maintained roads in Philadelphia. They are provided primarily for reference and understanding of the city.
Eventually you will be joining the burglary data to the Census tracts. Recall the field header name of the Census tract codes that you saw in the text file of burglary data. (If you have been working through from the beginning of this Part of the lesson, that text file should still be open.) Which of the fields in the attribute table of the tracts2000 shapefile has contents that match those code numbers? Make note of the field name, for use later: _________________.
What coordinate system has been assigned to the Data Frame?
Hint: you need to know the names of the fields in the burglaries table and in the shapefile that contain data that the two tables have in common.
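To see why the matching field matters, here is a minimal sketch of an attribute join outside of ArcGIS. The field names (TRACT_ID, tract, burg_1998) and the counts are made up for illustration; they are not the names or values in the lesson data. Note that the tract codes are kept as text so that leading zeros (as in 000100) are not lost.

```python
# Hypothetical rows standing in for the tract attribute table and the
# exported burglary table. Field names and counts are invented.
tracts = [
    {"TRACT_ID": "000100", "label": "Tract A"},
    {"TRACT_ID": "000200", "label": "Tract B"},
    {"TRACT_ID": "000300", "label": "Tract C"},
]
burglaries = [
    {"tract": "000100", "burg_1998": 12},
    {"tract": "000300", "burg_1998": 7},
]


def join_by_key(left_rows, left_key, right_rows, right_key):
    """Append the fields of right_rows to left_rows where the key values match.

    Rows on the left with no match are kept unchanged, which is similar in
    spirit to keeping all records in a table join.
    """
    # Build a lookup from key value to the matching right-hand row.
    lookup = {row[right_key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        merged = dict(row)
        match = lookup.get(row[left_key], {})
        for field, value in match.items():
            if field != right_key:  # do not duplicate the key field
                merged[field] = value
        joined.append(merged)
    return joined


joined = join_by_key(tracts, "TRACT_ID", burglaries, "tract")
for row in joined:
    print(row)
```

If the codes were stored as numbers in one table and as text in the other, none of the keys would match, which is a common reason a join appears to produce empty results.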
You have just completed Part I of this project.
Now, let's apply these concepts in ArcGIS. Data classification and symbolization are controlled as Properties of a given data layer. In this part of the lesson you will classify the burglary data using different techniques and then compare and contrast the results.
The roads and hydrology data are included in this exercise to provide some geographic context to the pattern analysis you will do in Lesson 5. You don't necessarily have to include these layers in your screen captures for this lesson, but you may find that they help give a clearer picture of why areas have high or low crime statistics.
As mentioned above, and in the Lesson 4 concept gallery, enumeration units in choropleth maps rarely represent equal populations or equal areas. This means that if we are counting a phenomenon that relates to people, there will almost always be a higher incidence of that phenomenon where there are more people. So a map of crime counts would likely just show you where more people live, not where there are higher crime rates. So let's not map the burglary data by counts, but create a crime rate from the counts we have.
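The arithmetic behind a rate is simple. The sketch below assumes the rate is expressed per 1,000 residents and that a population figure is available for each tract; both the denominator and the numbers are assumptions for illustration, not values from the lesson data.

```python
def rate_per_1000(count, population):
    """Return incidents per 1,000 residents, or None when there is no population."""
    if population <= 0:
        # A tract with no residents has no meaningful rate.
        return None
    return count * 1000.0 / population


# Two hypothetical tracts with the same count but very different populations.
small_tract = rate_per_1000(50, 1000)
large_tract = rate_per_1000(50, 5000)
print(small_tract, large_tract)
```

The two tracts would look identical on a map of counts, but the smaller tract has five times the rate.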
Rather than separate classes by set value intervals, the quantile classification creates classes with equal numbers of data points in each class. By dictating a certain number of sample points per class, quantile classification schemes can sometimes create classes that include a very wide range of data values. Data values and classes aside, this method produces maps that have an apparent balance - that is to say that each class is represented equally.
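The difference between the two methods is easy to see on a small, skewed list of values. The following is a rough sketch of the general idea only; ArcGIS's own implementations may handle ties and rounding differently, and the sample values are invented.

```python
def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * i for i in range(1, k + 1)]
    breaks[-1] = hi  # guard against floating-point drift on the last limit
    return breaks


def quantile_breaks(values, k):
    """Upper class limits for k classes holding (nearly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer ceiling of n * i / k gives the count of values up to class i.
    return [ordered[-(-n * i // k) - 1] for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count how many values fall in each class defined by upper limits."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts


values = [1, 2, 3, 4, 5, 6, 7, 100]  # one extreme value skews the range
ei = equal_interval_breaks(values, 4)
qu = quantile_breaks(values, 4)
print("equal interval:", ei, class_counts(values, ei))
print("quantile:      ", qu, class_counts(values, qu))
```

With these values, equal interval puts seven of the eight observations in the first class, while quantile puts two in each class at the cost of a top class that spans from just above 6 to 100.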
ArcGIS includes several other classification methods. They are all organized in the same manner as Equal Interval and Quantile. Two common methods are Natural Breaks (Jenks) and Standard Deviations. Classification by natural breaks uses a calculation that places class breaks at groupings inherent within the data by maximizing the differences between classes. In a standard deviation classification, class breaks are placed at intervals of the standard deviation above and below the mean, so each class shows how far its values lie from the mean. By default, ArcGIS will use a diverging color scheme to visually emphasize the idea of classes varying from a central mean, and will label the classes only based on the standard deviation of the data values (whether or not this is useful for visually communicating your data).
In Part II of this lesson you investigated classification methods using ArcGIS's tools and probably used default color schemes. I'd like to revisit (and expand) some of the color topics we touched on briefly in Lessons 1 and 2. In addition, you will see how to create and use Layer files in ArcGIS.
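As a rough illustration of the standard deviation idea, the sketch below places breaks at whole standard deviations around the mean. It assumes the population standard deviation and one-standard-deviation intervals; ArcGIS offers its own interval options and may position breaks differently, so treat this as the concept rather than the software's exact method. The sample values are invented.

```python
import statistics


def std_dev_breaks(values, max_sd=2):
    """Class breaks at the mean and at whole standard deviations around it."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [mean + k * sd for k in range(-max_sd, max_sd + 1)]


def z_score(value, values):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; standard deviation is zero")
    return (value - statistics.mean(values)) / sd


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
breaks = std_dev_breaks(values)
print(breaks)
print(z_score(9, values))
```

A legend built this way tells the reader how unusual a class is relative to the mean, but not the data values themselves, which is the trade-off noted above.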
It is important to remember how ArcGIS stores information. As you know, an ArcMap (*.mxd) document does not store the underlying data compiled in your map. Instead it records a pathway (either full or relative) to the location of the data. Other map elements like a legend, north arrow, or scale bar are saved as part of the *.mxd. Similarly, all of the symbolization and classification changes you make are saved with the map document. You may have experienced the process of remaking a map - when you start over all of the steps taken to give your map a certain look have to be repeated. A Layer File is a way of saving your classification and symbolization choices as a stand-alone file that can be used on other maps.
Layer files can also be created from groups of map layers.
Let's take a moment and look at the results in ArcCatalog.
Layer files are given yellow, diamond-shaped icons.
It's likely that your roads_rivers.lyr appears unprojected - like the roads shapefile. This is because of the way in which you grouped and saved the *.lyr file. The resultant group inherited coordinate system information from the roads because they were arranged above the rivers in the table of contents. Let's have a closer look at the *.lyr files.
Like a *.mxd file, a layer file does not store the actual data - only the appearance information. This means that delivering a layer file to a colleague is only useful if you both have access to the same datasets.
You will add the two new layer files to a new map shortly, but first let's review some material about appropriate color schemes for different mapped data and constraints on color selection.
Learn more about color schemes in the Concept Gallery.
In the Lesson 4 Concept Gallery, we discussed common classification methods and color schemes. Before you start creating any custom colors and ramps, it is worth taking a few minutes to think about these topics again. Below, you will find three maps of different census data (2000 Census). For each map there are nine different color schemes. After reviewing the alternatives, decide which option is the most appropriate for the mapped data. Click the Best Choice link in each caption to see results and comments.
The first set of nine maps (Figures 4.1.a through 4.1.i), each of which uses a unique color scheme, depicts the percentage of people under age 18 identifying themselves as two or more races. The data are aggregated to counties and classified identically in each example. You may click on each individual map to see an enlarged version of that map.
The second set of nine maps (Figures 4.2.a through 4.2.i), each using a unique color scheme, presents the percent change in population from 1990 to 2000. The data are again aggregated by county. The U.S. rate of change was 13.2%. You may click on each individual map to see an enlarged version of that map.
The third set of nine maps (Figures 4.3.a through 4.3.i) shows religious affiliation. Counties are classified by the denomination with the highest percentage of religious adherents. [Additional source: Gaustad and Barlow, 2001, New Historical Atlas of Religion in America. Oxford Press, New York.] You may click on each individual map to see an enlarged version of that map.
Most of us take color for granted. We see the world in vivid hues and with subtle variations. As map designers, we also need to be cognizant of those occasions when maps need to be read without color - by choice or because of color blindness. Most people who are colorblind are still able to distinguish differences in lightness and see many hues. Color confusion tends to be exacerbated when desaturated colors are used.
The nature of colorblindness has been extensively researched as a matter of physiology and perception. It has also been modeled in numerous color spaces. While the list of stable color combinations is long (especially when described as luminosity measurements), we can generalize a list of ten color-pair combinations that are clearly distinguishable to people with common color vision impairments.
To visualize how these hue-pairs were determined, imagine a 3D cube (see Figure 4.4, below). Once the cube is flattened, we can arrange the hues in spectral order around the perimeter and create lightness variation by placing white in the center.
Using this arrangement, we can create regions of colors that are indistinguishable by drawing colorblind confusion lines through the space. The lines drawn on the figure below were approximated based on the CIExyY color space.
Colors within the same or neighboring regions that share similar lightness will be confused. A good rule of thumb is to choose colors that vary in lightness and that are separated by at least one region. In Figure 4.6, below, a diverging color scheme is created by choosing two, three-color lightness ramps from regions that are appropriately distant.
In the previous example, we discussed color choices in terms of diverging color schemes, but all types of color schemes can be modified to be colorblind-safe. The basic task is to choose colors that vary distinctly in lightness. An easy test for color stability is to use a black and white photocopier. If your map uses hues or lightness specifications that are too similar, the resulting photocopy will appear as a uniform gray mess.
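If you do not have a photocopier handy, the same test can be approximated numerically. The sketch below converts RGB colors to a single gray value using the common Rec. 601 luma weights and checks that the gray values are well separated. The weights are only a stand-in for what a photocopier does, the minimum gap of 25 is an arbitrary threshold chosen for illustration, and the colors are hypothetical. This checks lightness separation only; it does not simulate colorblind vision.

```python
def gray_value(rgb):
    """Approximate the gray level (0-255) of an RGB color using luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def survives_grayscale(colors, min_gap=25):
    """True if every pair of neighboring gray levels differs by at least min_gap."""
    grays = sorted(gray_value(c) for c in colors)
    return all(b - a >= min_gap for a, b in zip(grays, grays[1:]))


# Two different hues with nearly identical lightness.
hue_only = [(255, 0, 0), (0, 130, 0)]
# A hypothetical three-class ramp that varies mainly in lightness.
lightness_ramp = [(255, 245, 235), (253, 174, 107), (166, 54, 3)]

print([gray_value(c) for c in hue_only], survives_grayscale(hue_only))
print([gray_value(c) for c in lightness_ramp], survives_grayscale(lightness_ramp))
```

The red and green in the first list are easy to tell apart in color, yet they reduce to the same gray, which is exactly the situation the photocopier test is meant to catch.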
Let's get back to ArcGIS. For the rest of this part you will use ArcMap to specify and save custom colors and color ramps. To date, all of the custom colors you have created are part of each respective map document. If you specified a great shade of blue for use in one map, it is not available in the next unless you repeat the specification. In this step, you will learn to save colors and color ramps so they are independent of a specific map document.
Let's create some custom colors for the road features.
Notice that the new symbol is now listed in the symbol menu (it should be right at the top). Included with ArcMap is a wide variety of purpose- and industry-specific symbol sets, most of which are turned off by default when ArcMap launches. By saving your interstate symbol in the last step, you have added to these.
Your custom symbols are stored under the user ID you use to log onto your computer. Notice that this set and the Esri set are checked (by default).
Symbol sets vary for point, line, and polygon features. Because you started this process by selecting a line feature, you will only see line symbols.
The alternative to choosing-customizing-saving a symbol is to create one from scratch using the Style Manager. The Style Manager is a tool window that provides you access and control over the look of predefined and custom symbols and elements. Any custom symbol you create is stored within your Windows profile in a file called <username>.style.
On the left-hand side is a list of the active symbol sets. Notice that yours is listed first (described as C:/Documents and Settings/... /<username>.style), followed by Esri and then any others you activated in the previous steps.
Before you leave the Style Manager, let's create another custom ramp. This one will be a colorblind-safe, multi-part, algorithmic, diverging scheme. Can you guess how to make it?
Because you are applying a continuous color ramp to classed data, the legend will appear as distinct steps. If you were using your ramp on unclassed data (like elevations), you would see a smooth transition along the ramp.
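To make the idea of a continuous ramp applied to classes concrete, here is a small sketch of a two-part diverging ramp sampled at a fixed number of class positions. It interpolates linearly in RGB, which is an assumption made to keep the example short; ArcMap's algorithmic ramps may blend colors in a different color space. The end and middle colors are hypothetical.

```python
def interpolate(c1, c2, t):
    """Blend two RGB colors; t=0 gives c1 and t=1 gives c2."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def diverging_color(low, mid, high, t):
    """Color at position t (0 to 1) on a two-part ramp that passes through mid."""
    if t <= 0.5:
        return interpolate(low, mid, t / 0.5)
    return interpolate(mid, high, (t - 0.5) / 0.5)


def sample_ramp(low, mid, high, n_classes):
    """Pick n_classes evenly spaced colors from the continuous ramp (n_classes >= 2)."""
    return [
        diverging_color(low, mid, high, i / (n_classes - 1))
        for i in range(n_classes)
    ]


low = (0, 0, 255)       # hypothetical dark blue end
mid = (255, 255, 255)   # light middle, so lightness falls toward both ends
high = (255, 128, 0)    # hypothetical orange end

colors = sample_ramp(low, mid, high, 5)
for c in colors:
    print(c)
```

Sampling five positions yields five distinct legend steps, while sampling hundreds of positions would approach the smooth transition you would see on unclassed data.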
Next week, using the same data, you will learn about map representations other than the choropleth map.
If you have any questions, please post them to the Lesson 4 Discussion Forum.
Both map layouts should:
When your layouts are complete, export the following two versions of each layout:
Submit your work via the Create Submission link at right (must be logged in to see link). You will have to create two different submissions: One for the first layout, and a second one for the second layout.
Repeat the above bullets for the second map layout, showing burglaries over time.
Note: I will only publish your maps for each other to see after all students have turned in work, and possibly after you receive feedback. If you do not want me to publish your map, please let me know, but also remember that we are all here to learn and improve no matter where we are or what skills we have learned. Sharing work will help you learn about your own work, provide opportunity for further feedback, provide exposure to work of others, and provide opportunity to give feedback to others. Thank you for participating!
You have reached the end of Lesson 4! Double-check the to-do list on the Lesson 4 Overview page to make sure you have completed all of the tasks listed there before you begin Lesson 5.
If you have any questions, feel free to post them to the Lesson 4 Discussion Forum.