The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.
Spatial clustering methods are useful for making sense of complex geographic patterns. In this week's lesson, we look in a more general way at the various approaches that spatial analysts and geographers have developed for measuring spatial autocorrelation.
At the successful completion of Lesson 4, you should be able to:
Lesson 4 is one week in length. (See the Calendar in Canvas for specific due dates.) The following items must be completed by the end of the week. You may find it useful to print this page out first so that you can follow along with the directions.
Step | Activity | Access/Directions |
---|---|---|
1 | Work through Lesson 4. | You are in the Lesson 4 online content now. Be sure to carefully read through the online lesson material. |
2 | Reading Assignment | Before you go any further, you need to read the portions of the course text associated with this lesson. After you've completed the reading, get back into the lesson and supplement your reading from the commentary material, then test your knowledge with the quiz. |
3 | Weekly Assignment | This week's project explores ethnic residential segregation in Auckland, New Zealand using spatial autocorrelation measures provided by the GeoDa tool. |
4 | Term Project | Finalize your Term Project Proposal for the peer review next week. |
5 | Lesson 4 Deliverables | |
Please use the 'Week 4 lesson and project discussion' to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary where appropriate.
The most basic observation to be made about spatial data is that they typically exhibit spatial structure. In statistical terms, this translates to the observation that spatial data are not random. Knowing something about a phenomenon at some location A often tells us a great deal about the same phenomenon at locations not far from A. Another way of putting this is that spatial data sets are correlated with themselves over distance.
When two variables x and y are correlated, then given the value of x for a particular case, I can make a good estimate of the likely value of y for that case. Similarly, given the value of some attribute measured at spatial location A, I can often make a reasonable estimate of the value of the same attribute at a location near A. This is due to spatial autocorrelation (spatial self-correlation). What counts as "nearby" is chosen by the analyst and defined mathematically by a geographical weighting scheme. Whatever weighting scheme we use, we often refer to this statistical summary of nearby observations as a spatial lag.
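To make the idea of a spatial lag concrete, here is a minimal Python sketch. It assumes a made-up row of five areal units and a simple contiguity scheme in which each unit's lag is the plain average of its neighbors' values (row-standardized weights). The data, the `neighbors` dictionary and the function name `spatial_lag` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: five areal units in a row, one attribute value each.
values = [10.0, 12.0, 11.0, 30.0, 32.0]

# Hypothetical contiguity: neighbors[i] lists the units that border unit i.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def spatial_lag(values, neighbors):
    """Return, for every unit, the mean of its neighbors' values."""
    lag = []
    for i in range(len(values)):
        nbrs = neighbors[i]
        lag.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lag


lags = spatial_lag(values, neighbors)
for value, lag in zip(values, lags):
    print(f"value = {value:5.1f}   spatial lag = {lag:5.1f}")
```

A different weighting scheme (for example, inverse distance) would change the lag values but not the idea: the lag summarizes what is happening near each location.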
Much of the material we study in this course makes use of spatial autocorrelation in data, whether it is assumed or measured. Perhaps the best example is interpolation (see the upcoming Lesson 6), where we use the information only from nearby control points to inform our calculation of an estimated value at a location where no observation has been recorded. We do this because we expect nearby data to be more relevant than distant data. In kriging, this is taken one step further when one method of measuring spatial autocorrelation--the semivariogram--is used to improve further the estimates produced by an interpolation method.
Before we go any further, you need to read a portion of the chapter associated with this lesson from the course text:
There is a brief section (page 81) discussing the Joins Count approach to measuring spatial autocorrelation. This approach is useful for non-numeric data. However, it is only infrequently used, and so, although the concepts introduced are useful, they are not central to a modern treatment of spatial autocorrelation.
Other measures have been developed for numerical data, and, in practice, these are much more widely used. These are discussed in Section 4.3, with a particular focus on Moran's I.
While equation 4.8 (page 82) for Moran's I looks intimidating, it makes a great deal of sense. It consists of:
In the case of Moran's I, the similarity measure is the standard method used in correlation statistics, namely the product of the differences of each value from the mean. This produces a positive result when both the value and neighboring values are higher or lower than the mean, and a negative result when the value and neighboring values are on opposite sides of the mean (one higher, the other lower).
This similarity measure is summed over all neighboring pairs of map units (this is where the wij values from a weights matrix come in) and then adjusted so that the resulting index value is in a standard numerical range.
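The three ingredients described above can be sketched in a few lines of Python. This is an illustration of the commonly stated form of Moran's I, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. It assumes binary contiguity weights (wij = 1 for neighbors, 0 otherwise), and the two toy data sets and the name `morans_i` are hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights (w_ij = 1 for neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # differences from the mean
    # S0: sum of all weights, i.e. the number of (ordered) neighbor pairs.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    # Similarity measure summed over all neighboring pairs.
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    # Adjustment that puts the index on a standard scale.
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of five units; each unit borders the next one along.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

clustered = [10.0, 12.0, 11.0, 30.0, 32.0]   # similar values sit together
alternating = [1.0, 9.0, 1.0, 9.0, 1.0]      # neighbors are always dissimilar

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
print(round(i_clustered, 3), round(i_alternating, 3))
```

The clustered arrangement gives a positive index and the alternating one a negative index, which is the behavior the product-of-differences measure is designed to produce.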
The inclusion of spatial interaction [1] weights between pairs of map units in the formulas for calculating I means that it is possible to experiment with a wide variety of spatial autocorrelation [2] measures by tailoring the particular choice of interaction weights appropriately.
The final topic in measuring spatial autocorrelation [2] is LISA or Local Indicators of Spatial Association.
All the previously discussed measures of spatial autocorrelation share the common weakness that they do not identify specific locations on a map where the measured autocorrelation is most pronounced. That is, they are global measures, which tell us that the map data are spatially autocorrelated but not where to find the data that contribute most to that conclusion. Equally, global measures do not allow us to identify map regions where the pattern runs counter to the overall spatial autocorrelation trend.
LISA statistics address these failings and exemplify a trend in spatial analysis in favor of approaches that emphasize local effects over global ones. (See the papers by Unwin 1996 and Fotheringham 1997 cited in the text for more details on this trend.)
The LISA approach simply involves recording the contributions from individual map units to the overall summary measure, whether it is Moran's I or any other measure.
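As a rough sketch of what "recording the contributions" means for Moran's I, the following Python computes one local value per map unit and shows that, with row-standardized weights, their average is the global index. It uses the usual local Moran form, I_i = z_i * (lag of z at i) / m2, with m2 the mean squared deviation. The toy data and the name `local_morans_i` are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each unit, using row-standardized contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation (scaling term)
    local = []
    for i in range(n):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        local.append(z[i] * lag / m2)
    return local


# Hypothetical chain of five units with a low cluster and a high cluster.
values = [10.0, 12.0, 11.0, 30.0, 32.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

local_i = local_morans_i(values, chain)
global_i = sum(local_i) / len(local_i)  # mean of local values = global I here

for unit, value in enumerate(local_i):
    print(f"unit {unit}: local I = {value:6.3f}")
print(f"global I = {global_i:.3f}")
```

In this example, the unit that sits on the boundary between the low and high clusters has a negative local value, so it runs counter to the overall positive pattern.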
Significance tests on LISA statistics are hard to calculate and generally depend on Monte Carlo simulation [3], which is discussed on page 84 and again on pages 89-90 of the text, and which you also encountered in Lesson 3's project. The idea is that a computer can randomly rearrange the map unit values many times, measuring the LISA statistic for each map unit each time, and then determine if actual observed LISA values are unusual with respect to this simulated distribution of values. There are some wrinkles to this, revolving around the challenges of multiple testing.
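The randomization idea can be sketched as follows. This is one common scheme (conditional permutation: the value at the unit being tested stays put while the other values are reshuffled), written here as an assumption about how such tests are typically done rather than as a description of any particular package. The data, the function name `lisa_pseudo_p`, and the choice of 999 permutations are hypothetical.

```python
import random


def lisa_pseudo_p(values, neighbors, permutations=999, seed=0):
    """Pseudo p-values for local Moran's I by conditional permutation."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    p_values = []
    for i in range(n):
        k = len(neighbors[i])
        observed = z[i] * (sum(z[j] for j in neighbors[i]) / k) / m2
        others = z[:i] + z[i + 1:]  # every value except the one at unit i
        larger = 0
        for _ in range(permutations):
            # Drawing k of the other values at random is equivalent to
            # shuffling them over the map and reading off unit i's neighbors.
            simulated = z[i] * (sum(rng.sample(others, k)) / k) / m2
            if simulated >= observed:
                larger += 1
        # Use whichever tail the observed value falls in.
        larger = min(larger, permutations - larger)
        p_values.append((larger + 1) / (permutations + 1))
    return p_values


# Hypothetical chain of 12 units: low values, then a cluster of high values.
values = [1, 2, 1, 2, 1, 2, 1, 2, 20, 22, 21, 23]
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
         for i in range(len(values))}

p_values = lisa_pseudo_p(values, chain)
for unit, p in enumerate(p_values):
    print(f"unit {unit:2d}: pseudo p = {p:.3f}")
```

Because one test is run per map unit, some small p-values will turn up by chance alone, which is the multiple testing wrinkle mentioned above.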
This week's project uses not a GIS program, but a package for exploratory spatial data analysis called GeoDa. GeoDa is a good example of research software. It implements many methods that have been in the academic research literature for several years, some of which have yet to make it into standard desktop GIS tools. Among the methods it offers are simple measures of spatial autocorrelation.
You will use GeoDa to examine the spatial distribution of different ethnic groups in Auckland, New Zealand. In this lesson, you are working with a real dataset.
Until the last 20 years or so, Auckland was a relatively 'sleepy' industrial port. It has been New Zealand's largest city for about a century, but its dominance of the national economy has become even more marked in recent years. This is partly attributable to increasing numbers of immigrants to New Zealand, many of whom have settled in the Auckland region. Today, Auckland accounts for about one third of the total population of the country (about 1.6 million people, depending on where you think the city stops), and for a much larger fraction of the more recent migrant groups. Auckland is the largest Pacific Islander city in the world, and also home to large populations of Māori (the pre-European settlement indigenous people), and Asian peoples, alongside the majority European-descended (or, in Māori, 'Pakeha') 'white' population.
Such rapid change is exciting (it has certainly improved the food in Auckland!), but can also lead to strains and tensions between and within communities. We can't possibly explore all that is going on in a short project like this, but, hopefully, you will get some flavor of the city from this exercise.
The basic analytical approach adopted in this project is very similar to that presented by Andrea Frank in an article:
'Using measures of spatial autocorrelation to describe socio-economic and racial residential patterns in US urban areas' pages 147-62 in Socio-Economic Applications of Geographic Information Science edited by David Kidner, Gary Higgs and Sean White (Taylor and Francis, London), 2002.
This week's project is deliberately more like a short exercise than some of the upcoming projects. This is for two reasons. First, you should be spending a good amount of time starting to develop your term-long project, and producing your project proposal. Second, we will cover some ideas in this project not covered in the readings and also introduce a new tool. If you want to explore these ideas and the GeoDa tool further, then I hope that this exercise will give you an idea where to start!
The zip file you need for Project 4, project4materials.zip, is available in Canvas for download. If you have any difficulty downloading this file, please contact me.
The contents of this archive are as follows:
You will also need a copy of the GeoDa software in order to run the required analysis for this project.
GeoDa was originally developed at the Spatial Analysis Laboratory (SAL) at the University of Illinois at Urbana-Champaign. The lead researcher on this project has since moved to the University of Chicago. GeoDa can be downloaded there [4].
The instructions in this project refer to Version 1.14.0 of GeoDa on Windows 10, but things are very similar in the other versions. There are also versions for the Mac and Linux.
For this week’s project, the minimum items you are required to have in your write-up are:
Please use the 'Discussion - Lesson 4' forum to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary there where appropriate.
Once installed, you run GeoDa by clicking an icon or double-clicking a shortcut in the usual way. If the GeoDa installer did not make an entry in the Start Menu, you can create a shortcut by navigating to C:\Program Files\GeoDa\geoda_version.exe (or wherever you find the .exe file on your computer) then right-clicking and selecting Create Shortcut.
When GeoDa starts up, connect to a data source using the File tab. If the 'Connect to a data source' window does not automatically appear, choose File-New and it should open. Choose a shapefile to examine.
Making maps in GeoDa is simple: select the type of map you want from the Map menu. With the datasets you are working with in this project, only four options make sense: Quantile, Percentile, Box Map and Standard Deviation. Each of these makes a choropleth with the class intervals based on a different statistical view of the data.
Be particularly careful in your interpretation of a Quantile or Percentile map if you make one: the class intervals do not relate to the percent values but to the ranking of data values.
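A small Python sketch shows why this matters. It assigns hypothetical percentage values to four quantile classes purely by rank. The data and the function name `quantile_classes` are made up, and GeoDa's own rules for ties and class breaks may differ from this simple version.

```python
def quantile_classes(values, k=4):
    """Assign each value to one of k classes holding equal numbers of cases."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)  # class depends on rank only
    return classes


# Hypothetical percentages of some group in eight areal units.
percent = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 55.0]
classes = quantile_classes(percent)

for value, cls in zip(percent, classes):
    print(f"{value:5.1f}%  ->  quartile class {cls + 1}")
```

Here the top class contains both 6.0% and 55.0%: membership says that a unit ranks in the top quarter of units, not that its value exceeds 75 percent.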
In some versions of GeoDa, I have been unable to get the Cartogram to work with the Census Area Unit shapefiles used in this project. [NB: It does work in the most recent version: 1.14.0].
I believe that this is a problem with the shapefiles, and not with GeoDa. Specifically, when ArcGIS is used to aggregate polygon shapefiles from smaller units (here, I made the CAUs from the mesh block data), it often shifts polygon boundaries sufficiently that they no longer touch one another. The cartogram tool relies on polygons touching one another for its simplified picture of the map. If you are interested in making a cartogram, the akCity_MB01_ethnic shapefile works, or try the sample data sets supplied with GeoDa.
The main focus of GeoDa is exploratory spatial data analysis (ESDA). To get a flavor of this, try making a histogram or scatterplot using the named options in the Explore menu. Once you have a histogram or scatterplot in one window, you can select data points in the statistical display, and see those selections highlighted in the map views. In general, any selection in any window in GeoDa will be highlighted in all map views. This is called linked-brushing and is a key feature of exploratory data analysis.
Linked-brushing can help you to see patterns in spatial data more readily, particularly spatial autocorrelation effects. When data is positively spatially autocorrelated, moving the 'brush' in an area in a statistical display (say a scatterplot) will typically show you sets of locations in the map views that are also close together. Moving the brush around can help you to spot cases that do not follow the trend.
For a moving brush, make a selection in any view while holding down the <CTRL> key (CMD key if you are working on a Mac). Once you have made the selection, you can let go of the <CTRL> key and then move the selection area around by dragging with the mouse. To stop the moving selection, click again, anywhere in the current view.
However, seeing a pattern is not the same as it really being there. You will see repeated examples of this in lessons in this course. In the case of spatial autocorrelation, testing whether an apparent pattern is real is the role of the measures we have covered in this lesson's reading, in particular Moran's I, which we will look at more closely in the remainder of this project.
While GeoDa is like a GIS, you will soon find its cartographic capabilities somewhat limited. Where it really comes into its own is in the integration of spatial analysis methods with mapping tools.
To determine the spatial autocorrelation of a variable globally across a map using Moran's I, you access the Space - Univariate Moran's I menu. However, before doing this, you need a representation of the contiguity structure of the map, that is, which map units are neighbors to each other. This provides the wij values for the Moran's I calculation to determine which pairs of attribute values should be included in the correlation calculation.
GeoDa provides tools for creating contiguity matrices under the Tools - Weights Manager > Create menu option. Selecting this option opens the Weights File Creation dialog (Figure 4.1).
The various options available here are explained in the GeoDa documentation [5]. For the purposes of this project, I have already created simple contiguity matrix files called ak_CAU01.gal, akCity_CAU01.gal and akCity_MB01.gal. Use the Weights Manager to load the .gal file that matches the shapefile you have added to the project.
It is instructive to examine (but don't edit!) these .gal files in a text editor. For example, if you open akCity_CAU, the first few lines look like this:
101
1 6
3 5 21 23 25 28
2 4
3 4 21 34
3 5
1 2 4 5 21
4 5
2 3 5 6 34
5 7
1 3 4 6 25 28 29
The first line here shows how many areal units there are in the associated shapefile, in this case the 101 CAUs in Auckland City. Each pair of lines after that has the following format.
A more complete explanation of the GAL and GWT formats (the latter allows weighted contiguities based on inverse distance and so on) is provided in the GeoDa documentation.
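If you would like to check your reading of the file, the lines shown above can be parsed with a short Python sketch. It assumes only the simple layout visible in the excerpt: a count on the first line, then pairs of lines giving a unit ID with its number of neighbors, followed by the neighbor IDs. Other header variants, and units with no neighbors, are not handled. The function name `parse_gal` is hypothetical.

```python
# The first few lines of the file as shown above.
SAMPLE = """101
1 6
3 5 21 23 25 28
2 4
3 4 21 34
3 5
1 2 4 5 21
4 5
2 3 5 6 34
5 7
1 3 4 6 25 28 29
"""


def parse_gal(text):
    """Parse simple GAL text into (number of units, {unit id: neighbor ids})."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n_units = int(lines[0][0])
    neighbors = {}
    # After the header, lines come in pairs: "id count" then the neighbor ids.
    for head, body in zip(lines[1::2], lines[2::2]):
        unit_id, count = int(head[0]), int(head[1])
        ids = [int(token) for token in body]
        if len(ids) != count:
            raise ValueError(f"unit {unit_id}: expected {count} neighbors")
        neighbors[unit_id] = ids
    return n_units, neighbors


n_units, neighbors = parse_gal(SAMPLE)
print(n_units, "units declared;", len(neighbors), "read from the excerpt")
print("neighbors of unit 1:", neighbors[1])
```

Note that contiguity is symmetric in the excerpt: unit 1 lists unit 3 as a neighbor, and unit 3 lists unit 1.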
NOTE 1: The real reason I have provided pre-calculated GAL files is that the previously mentioned problem with the CAU shapefiles (see the previous page) prevents some versions of GeoDa from successfully calculating them itself. I was able to get around the problem using R [6] with the spdep, shapefile and maptools packages. If you ever face a similar problem, you may also find this helpful. spdep provides a method for calculating GAL files that includes a tolerance, so that areal units within a specified 'snap' distance of one another are considered neighbors.
NOTE 2: In some versions of GeoDa, you may get a warning that the .GAL file relies on 'record ordering' rather than an ID variable, suggesting that you make a new weights file. There is no need to do this; the provided GAL file will work fine.
NOTE 3: More recently, it has become possible to create spatial weights matrices in ArcGIS, although these follow their own file format. If you want to pursue this, try the Spatial Statistics Tools - Modeling Spatial Relationships - Generate Spatial Weights Matrix script.
This is easy. Select the Space - Univariate Moran's I menu option and specify the variable to use, and the contiguity matrix to use. GeoDa will think for a while, and then present you with a display that shows the calculated value of Moran's I and a scatterplot (Figure 4.2).
The Moran scatterplot is an illustration of the relationship between the values of the chosen attribute at each location and the average value of the same attribute at neighboring locations. In the case shown, large Percentages of Europeans (points on the right-hand side of the plot) tend to be associated with high local average values of Percentage of Europeans (points toward the top of the plot).
It is instructive to consider each quadrant of the plot. In the upper-right quadrant are cases where both the value and local average value of the attribute are higher than the overall average value. Similarly, in the lower-left quadrant are cases where both the value and local average value of the attribute are lower than the overall average value. These cases confirm positive spatial autocorrelation. Cases in the other two quadrants indicate negative spatial autocorrelation. Depending on which groups are dominant, there will be an overall tendency towards positive or negative (or perhaps no) spatial autocorrelation.
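The quadrant logic can be written out as a short Python sketch. Each unit is labeled by comparing its own value, and the average of its neighbors' values, with the overall mean. The toy data and the name `scatterplot_quadrants` are hypothetical, and treating a value exactly equal to the mean as "high" is an arbitrary choice made here for simplicity.

```python
def scatterplot_quadrants(values, neighbors):
    """Label each unit with its Moran scatterplot quadrant."""
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    labels = []
    for i in range(len(values)):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        own = "high" if z[i] >= 0 else "low"       # position along the x axis
        nearby = "high" if lag >= 0 else "low"     # position along the y axis
        labels.append(f"{own}-{nearby}")
    return labels


# Hypothetical chain of five units with a low cluster and a high cluster.
values = [10.0, 12.0, 11.0, 30.0, 32.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

quadrants = scatterplot_quadrants(values, chain)
for unit, label in enumerate(quadrants):
    print(f"unit {unit}: {label}")
```

Units labeled high-high or low-low fall in the upper-right and lower-left quadrants; low-high and high-low units fall in the other two.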
Using linked brushing, you should be able to identify which areas of the map are most responsible for high or low observed spatial autocorrelation, and which, if any, locations run counter to the overall pattern.
For a single variable on a single map, describe the results of a global Moran's I spatial autocorrelation analysis in your write-up. Include a choropleth map and Moran scatterplot in your write-up along with commentary and your interpretation of the results. In particular, identify map areas that contribute strongly to the global outcome.
Deriving a global, whole-map measure is often not the thing of most interest to analysts. Rather, it may be more important to know which local features in the data are contributing most strongly to the overall pattern.
In the context of spatial autocorrelation, the localized phenomena of interest are those areas on the map that contribute particularly strongly to the overall trend (which is usually positive autocorrelation). Methods that enable an analyst to identify localized map regions where data values are strongly positively or negatively associated with one another are collectively known as Local Indicators of Spatial Association (or LISA).
Again, GeoDa has a built-in capability to calculate LISA statistics and provide useful interactive displays of the results.
The menu option in GeoDa is Space - Univariate Local Moran's I. The easiest way to learn how LISA works is to run it through the user interface shown in Figure 4.3.
Note that the map view here (top left) was present before LISA was run. Depending on which version of the software you are using, the windows may be separate or part of a larger interface.
The meaning of each of these displays is considered in the next sections.
This display is exactly the same as the one produced previously using global Moran's I. By linking and brushing between this and other displays, you may be able to develop an understanding of what they are showing you.
The LISA cluster map looks like the pattern shown in Figure 4.4.
Interpretation of this map is straightforward. Red highlighted regions have high values of the variable and have neighbors that also have high values (high-high). As indicated in the legend, blue areas are low-low in the same scheme, while pale blue regions are low-high and pink areas are high-low. The strongly colored regions are therefore those that contribute significantly to a positive global spatial autocorrelation outcome, while paler colors contribute significantly to a negative spatial autocorrelation outcome.
By right-clicking in this view, you can alter which cases are displayed, opting to see only those that are most significant. The relevant menu option is the Significance Filter. The meaning of this will become clearer when we consider the LISA Significance Map.
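Conceptually, the filter keeps a unit's cluster label only when its pseudo p-value is at or below the chosen cutoff. The Python sketch below illustrates that idea with made-up labels and p-values; the function name `apply_significance_filter` and all of the numbers are hypothetical, and this is not a description of GeoDa's internals.

```python
def apply_significance_filter(labels, p_values, cutoff=0.05):
    """Keep a cluster label only where the pseudo p-value meets the cutoff."""
    return [label if p <= cutoff else "not significant"
            for label, p in zip(labels, p_values)]


# Hypothetical cluster labels and pseudo p-values for six map units.
labels = ["high-high", "high-high", "low-high", "low-low", "low-low", "high-low"]
p_values = [0.001, 0.030, 0.200, 0.008, 0.049, 0.600]

at_05 = apply_significance_filter(labels, p_values, cutoff=0.05)
at_01 = apply_significance_filter(labels, p_values, cutoff=0.01)

print("p <= 0.05:", at_05)
print("p <= 0.01:", at_01)
```

Tightening the cutoff from 0.05 to 0.01 leaves fewer colored units on the map, which is what you should see when you change the filter setting.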
The LISA Significance Map is shown in Figure 4.5.
This display shows the statistical significance level at which each region can be regarded as making a meaningful contribution to the global spatial autocorrelation outcome.
This is determined using a rather complex Monte Carlo randomization procedure:
The combination of the cluster map and the significance map allows you to see which locations are contributing most strongly to the global outcome and in which direction.
By adjusting the Significance Filter in the cluster map, you can see only those areas of highest significance. By selecting the Randomization right-click menu option and choosing a larger number of permutations, you can test just how strongly significant are the high-high and low-low outcomes seen in the cluster map.
I know that this is all rather complicated. Feel free to post questions to this week's discussion forum if you are not following things. Your colleagues may have a better idea of what is going on than you do! Failing that, I will respond, as usual, to messages posted to the boards to help clear up any confusion.
For a single variable on a single map, describe the results of a univariate LISA analysis. Use the same variable as in the previous analysis, but a different map (shapefile). Include the cluster map and Moran scatterplot in your write-up along with commentary and your interpretation of the results.
For Project 4, the minimum items you are required to have in your write-up are:
NOTE: When you have completed this week's project, please submit it to the Canvas drop box for this lesson.
This week you should be revising your project proposal and getting ready for the peer-review meeting with your group next week. Post your revised project proposal to the Term Project: Revised Proposal discussion forum.
Additional details can be found on the Term Project Overview Page [7].
Please use the Discussion - General Questions and Technical Help discussion forum to ask any questions now or at any point during this project.
You have reached the end of Lesson 4! Double-check the to-do list on the Lesson 4 Overview page [8] to make sure you have completed all of the activities listed there before you begin Lesson 5.
For more information on GeoDa explore the GeoDa website [9].
For more information on how spatial clustering can be used, perform a search using Google Scholar (search on terms such as GeoDa, Moran’s I, spatial autocorrelation).
Links
[1] https://www.e-education.psu.edu/geog586/taxonomy/term/305
[2] https://www.e-education.psu.edu/geog586/taxonomy/term/271
[3] https://www.e-education.psu.edu/geog586/taxonomy/term/321
[4] https://geodacenter.github.io/download.html
[5] https://geodacenter.github.io/workbook/4a_contig_weights/lab4a.html
[6] http://cran.r-project.org/
[7] https://www.e-education.psu.edu/geog586/node/828
[8] https://www.e-education.psu.edu/geog586/node/811
[9] https://geodacenter.github.io