GEOG 586
Geographic Information Analysis

Practical Point Pattern Analysis

Required reading:


  • Chapter 8, "Point patterns and cluster detection," page 261; pages 264 - 269.

You may want to come back to this page later, after you've worked on this week's project.

In the real world, the approaches discussed up to this point have their place, but they also have some severe limitations.

Before we begin, it is important to understand that some people conflate random with "no pattern." Random is a pattern. From the perspective of this class, every point distribution has a pattern - which may in fact be random, uniform, or clustered.

The key issue is that classical point pattern analysis allows us to say that a pattern is 'evenly-spaced/uniform' or 'clustered' relative to some null spatial process (usually the independent random process), but it does not allow us to say where the pattern is clustered. This is important in most real-world applications. A criminal investigator takes it for granted that crime is more common at particular 'hotspots', i.e., that the pattern is clustered, so statistical confirmation of this assumption is nice to have ("I'm not imagining things... phew!"), but it is not particularly useful. However, an indication of where the crime hotspots are located would certainly be useful. 
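To make the limitation concrete, here is a minimal sketch of one classical summary, a nearest-neighbour ratio. It assumes this is representative of the methods covered so far. The function name `nearest_neighbour_ratio`, the rectangular study `area` and the lack of any edge correction are illustrative choices, not part of the course materials. The point to notice is that the result is a single number for the whole study area: it can suggest "clustered" or "evenly spaced," but it carries no information about where.

```python
import math

def nearest_neighbour_ratio(points, area):
    """Ratio of the observed mean nearest-neighbour distance to the value
    expected under an independent random process (no edge correction).

    Values well below 1 suggest clustering; values well above 1 suggest
    even spacing. `points` is a list of (x, y) tuples and `area` is the
    size of the study region in the same squared units.
    """
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed_mean = total / n
    # Expected mean nearest-neighbour distance at intensity n / area.
    expected_mean = 0.5 / math.sqrt(n / area)
    return observed_mean / expected_mean
```

For example, calling `nearest_neighbour_ratio(points, area)` on a regular grid of points returns a value above 1, while the same number of points packed into one corner of the same area returns a value below 1. In neither case does the output say which part of the map is responsible.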

The problem is that detecting clusters in the presence of background variation in the affected population is very difficult. This is especially so for rare events. The Geographical Analysis Machine (GAM) was an early attempt to tackle this problem. Although it has not been widely adopted by epidemiologists, the approach it suggested was ground-breaking, and other more recent tools use very similar methods.

The basic idea is very simple: repeatedly examine circular areas on the map and compare the observed number of events of interest to the number that would be expected under some null hypothesis (usually spatial randomness). Tag all those circles that are statistically unusual. That's it! Three things make this conceptually simple procedure tricky.

  • First, the statistical theory for determining the expected number of events depends on a number of spatially varying covariates of the events of interest, such as the populations in different age subgroups. Thus, for a disease (say cancer or heart disease) associated with older members of the population, we would naturally expect to see more cases of the disease in places where more older people live. This has to be accounted for when determining the number of expected events.
  • Second, there are some conceptual difficulties in carrying out multiple statistical significance tests on a series of (usually) overlapping circles. The rather sloppy statistical theory in the original presentation of the GAM goes a long way to explaining the reluctance of statistical epidemiologists to adopt the tool, even though more recent tools are rather similar.
  • Third, exhaustive searching for clusters requires an enormous amount of computation. This is especially so if stringent levels of statistical significance are required, since many more Monte Carlo simulation runs are then needed.
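The basic circle-scanning idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual GAM or any published implementation. It uses a single fixed radius, derives the expected count from the population inside each circle multiplied by the overall rate (a crude stand-in for the covariate adjustment described in the first point), and replaces Monte Carlo simulation with an exact Poisson tail probability. The names `poisson_tail` and `scan_circles`, the input layout and the `alpha` threshold are all hypothetical. It makes no attempt to correct for the multiple, overlapping tests described in the second point.

```python
import math

def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def scan_circles(cases, population, centers, radius, alpha=0.01):
    """Flag circles whose case count is unusually high for their population.

    cases:      list of (x, y) event locations
    population: list of (x, y, count) population points
    centers:    list of (x, y) circle centers to examine
    Returns a list of (cx, cy, observed, expected) for flagged circles.
    """
    total_pop = sum(count for _, _, count in population)
    overall_rate = len(cases) / total_pop
    r2 = radius ** 2
    flagged = []
    for cx, cy in centers:
        # Observed events inside this circle.
        observed = sum(
            1 for x, y in cases if (x - cx) ** 2 + (y - cy) ** 2 <= r2
        )
        # Population at risk inside this circle.
        pop_in = sum(
            count for x, y, count in population
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2
        )
        # Expected events if risk were the same everywhere.
        expected = overall_rate * pop_in
        if expected > 0 and poisson_tail(observed, expected) < alpha:
            flagged.append((cx, cy, observed, expected))
    return flagged
```

Unlike a single summary statistic, the output here is a list of locations, which is what makes this family of methods useful for finding hotspots. Even in this toy form, the cost is visible: every circle requires a pass over all cases and all population points, and a Monte Carlo version would repeat that work for every simulated pattern.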

Another tool that might be of interest is SaTScan, developed by the Biometry Research Group of the National Cancer Institute in the United States. SaTScan works in a very similar way to the original GAM tool, but has wider acceptance among epidemiological researchers. You can download a free copy of the software and try it on some sample data. However, for now, let’s use some of the methods we have highlighted in Table 3.1 to analyze patterns in the real world.

Quiz

Ready? First, take the Lesson 3 Quiz to check your knowledge! Return now to the Lesson 3 folder in Canvas to access it. You have an unlimited number of attempts and must score 90% or more.