Lesson 4: Point Pattern Analysis

Introduction

In the previous lesson, we saw how a spatial process can be described in mathematical terms so that the patterns it is expected to produce can be predicted. In this lesson, we will apply this knowledge to the analysis of point patterns. Point pattern analysis is the application in which these ideas are most thoroughly developed, so it is the best place to learn about this approach.

Point pattern analysis has become an extremely important application in recent years, particularly in crime analysis, in epidemiology, and in facility location planning and management. Point pattern analysis also goes all the way back to the very beginning of spatial analysis in Dr. John Snow's work on the London cholera epidemic of 1854.

Objectives

By the end of this lesson, you should be able to:

  1. define point pattern analysis and list the conditions necessary for it to work well;
  2. explain how quadrat analysis of a point pattern is performed and distinguish between quadrat census and quadrat sampling methods;
  3. discuss relevant factors in determining an appropriate quadrat size for point pattern analysis;
  4. describe in outline kernel density estimation and understand how it transforms point data into a field representation;
  5. describe distance-based measures of point patterns (mean nearest neighbor distance and the G, F and K functions);
  6. explain how distance-based methods of point pattern measurement are derived from a distance matrix;
  7. describe how the independent random process and expected values of point pattern measures are used to evaluate point patterns, and to make statistical statements about point patterns;
  8. explain how Monte Carlo methods are used when analytical results for spatial processes are difficult to derive;
  9. justify the stochastic process approach to spatial statistical analysis;
  10. discuss the merits of point pattern analysis versus cluster detection, and outline the issues involved in real world applications of these methods.

Steps to Completion

Lesson 4 is one week in length. (See the Calendar in Canvas for specific due dates.) To finish this lesson, you must complete the activities listed below. You may find it useful to print this page out first so that you can follow along with the directions.

Steps to Completing Lesson 4
Step Activity Access/Directions
1 Work through Lesson 4. You are in the Lesson 4 online content now. The Overview page precedes this one, and you are on the Checklist page right now.
2 Reading Assignment This week, the reading is detailed, demanding, and long. I therefore recommend that you start it as soon as possible. The project only requires the first chapter of reading for its completion, so you may want to do the first part of the reading, complete the project, and then return to the reading. Whatever you do, don't leave the reading to the last minute this week!
  • Chapter 5, "Point Pattern Analysis," pages 121-151 and Section 3.6, pages 68-71 only on Kernel Density Estimation [Ch 4, pages 77-114 1st edn]
  • Chapter 6, Sections 6.1-6.6, "Practical Point Pattern Analysis," pages 157-177 [Ch 5, Section 5.1, pages 115-123, 1st edn]
After you've completed the reading, or at the very least skimmed the material, get back online and supplement your reading from the commentary material, then test your knowledge with the self-test quizzes.
3 Lesson 4 Deliverables This lesson is one week in length. The following items must be completed by the end of the week. See the Calendar tab, above, for the specific date.
  1. Complete the two self-test quizzes satisfactorily (you have an unlimited number of attempts and must score 90% or more).
  2. Complete the Project 4 activities. This involves running and interpreting point pattern analysis of some crime data for St. Louis. (The materials for Project 4 can be found towards the end of this lesson.)
  3. There is no specific activity in the term-long project this week, as the core of this week's lesson will keep you busy enough. However, don't forget that you have to submit a review of other research proposals next week—I will tell you which ones this week.

 

Questions?

Please use the 'Lesson 4 Discussion Forum' to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary there where appropriate.

Commentary - Chapter 5 [4], "Point Pattern Analysis"

Section 5.2, Describing a point pattern, pages 123-126 [Section 4.2, pages 79-81, 1st edn]

It should be pointed out that the distinction between first- and second-order effects is a fine one. In fact, it is often scale-dependent, and often an analytical convenience, rather than a hard and fast distinction. This becomes particularly clear when you realize that an effect that is first-order at one scale may become second-order at a smaller scale (that is, when you 'zoom out').

The simplest example is a steady (say, east-west) rise in land elevation: viewed at a regional scale it is a first-order trend, but zooming out to the continental scale, the same rise becomes a more localized topographic feature. This is yet another example of the scale-dependent effects inherent in spatial analysis, as noted in Lesson 1.

Section 3.5, pages 68-71, and Section 5.2, Density-based point pattern measures, pages 126-130 [Section 4.3, pages 81-88, 1st edn]

It is worth emphasizing the point that quadrats need not be square, although it is rare for them not to be.

With regard to kernel density estimation (KDE) it is worth pointing out the strongly scale-dependent nature of this method. This becomes apparent when we view the effect of varying the KDE bandwidth on the estimated event density map in the following sequence of maps, all generated from the same pattern of Redwood saplings as recorded by Strauss, and available in the spatstat package in R (which you will learn about in the project). To begin, Figure 4.1 shows a bandwidth of 0.25.

Figure 4.1. A resultant KDE map using a bandwidth of 0.25.
Using a larger KDE bandwidth results in a very generalized impression of the event density (the bandwidth is expressed relative to the full extent of the square study area). A large bandwidth tends to emphasize any first-order trend variation in the pattern (Figure 4.1).
Figure 4.2. A resultant KDE map using a bandwidth of 0.01.
The map generated using a small KDE bandwidth is also problematic, as it focuses too much on individual events and small clusters of events, which are self-evident from inspecting the point pattern itself (Figure 4.2).
Figure 4.3. A resultant KDE map using an intermediate bandwidth.
An intermediate choice of bandwidth results in a more satisfactory map that enables distinct regions of high density of events to be identified (Figure 4.3). Choice of the bandwidth is something you may need to experiment with, and there are a number of methods for 'optimizing' the choice, although these are complex statistical methods, and it is probably better to think more in terms of what distances are meaningful in the context of the particular phenomenon being studied.
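
If you want to reproduce this effect yourself, here is a minimal sketch in R using the redwood saplings pattern that ships with spatstat (the 0.1 value below is simply an illustrative intermediate choice, not necessarily the bandwidth used for Figure 4.3):

> library(spatstat)
> plot(density(redwood, 0.25)) #large bandwidth: a very generalized surface emphasizing first-order trend
> plot(density(redwood, 0.01)) #small bandwidth: spikes at individual events and tiny clusters
> plot(density(redwood, 0.1)) #intermediate bandwidth: distinct regions of high density emerge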

Section 5.2, Distance-based point pattern measures, pages 130-137 [Section 4.4, pages 88-95, 1st edn]

It may be helpful to briefly distinguish the major distance methods discussed here:

  1. Mean nearest neighbor distance is exactly what the name says!
  2. G function is the cumulative frequency distribution of the nearest neighbor distance. For any specified distance, it gives the probability that the distance from an event to its nearest neighboring event in the pattern is less than that distance.
  3. F function is the cumulative frequency distribution of the distance to the nearest event in the pattern from random locations not in the pattern.
  4. K function is based on all inter-event distances, not simply nearest neighbor distances. Interpreting the raw K function values is tricky; the function makes more sense when the statistical analysis discussed in a later section is carried out.
  5. Pair correlation function (this is touched on in the 2nd edn, p. 137, but not mentioned in the 1st) is a more recently developed method, which like the K function is based on all inter-event distances, but which is non-cumulative, so that it focuses on how many pairs of events are separated by any particular given distance. Thus, it describes how likely it is that two events chosen at random will be at some particular separation.

It is useful to see these measures as forming a progression from least to most informative (with an accompanying rise in complexity).
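
If it helps to see this progression concretely, here is a minimal sketch in R using spatstat's built-in redwood pattern (you will use the same functions on the crime data in the project):

> library(spatstat)
> mean(nndist(redwood)) #1. mean nearest neighbor distance: a single summary number
> plot(Gest(redwood)) #2. G function: cumulative distribution of nearest neighbor distances
> plot(Fest(redwood)) #3. F function: distances from random locations to the nearest event
> plot(Kest(redwood)) #4. K function: uses all inter-event distances
> plot(pcf(redwood)) #5. pair correlation function: non-cumulative, also uses all inter-event distances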

Sections 5.3, 5.4, Assessing point patterns statistically, pages 139-153 [Section 4.5, pages 95-108, 1st edn]

The measures discussed in the preceding two sections can all be tested statistically for deviations from the expected values associated with a random point process. In fact, deviations from any well-defined process can be tested, although the mathematics required becomes more complex.

This section simply outlines how each of the measures described in previous sections may be tested statistically. The most complex of these is the K function, where the additional concept of an L function is introduced to make it easier to detect large deviations from a random pattern. In fact, using the pair correlation function, many of the difficulties of interpreting the K function disappear, so this approach is becoming more widely used.

More important, in practical terms, is the Monte Carlo procedure discussed on pages 148-154 [pages 104-108, 1st edn]. Monte Carlo methods are common in statistics generally, but are particularly useful in spatial analysis when mathematical derivation of the expected values of a pattern measure can be very difficult. Instead of trying to derive analytical results, we simply make use of the computer's ability to randomly generate many patterns according to the process description we have in mind, and then compare our observed result to the simulated distribution of results. This approach is explored in more detail in the project for this lesson.
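
To make the idea concrete, here is a minimal sketch of a Monte Carlo test in R, using the mean nearest neighbor distance and spatstat's redwood pattern; the envelope() function you will use in the project packages up the same logic for the G, F, K and pair correlation functions:

> library(spatstat)
> obs <- mean(nndist(redwood)) #the observed mean nearest neighbor distance
> sims <- numeric(99)
> for (i in 1:99) sims[i] <- mean(nndist(runifpoint(redwood$n, win=redwood$window))) #same number of events, same study area, IRP/CSR
> range(sims) #the range of values produced by the simulated patterns
> sum(sims < obs) #how many simulated values fall below the observed one; a count near 0 suggests clustering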

Quiz

Ready? Take the first Lesson 4 Quiz to check your knowledge! Return now to the Lesson 4 folder in Canvas to access it. You have an unlimited number of attempts and must score 90% or more.

 

Commentary "Practical Point Pattern Analysis"

You may want to come back to this section, which considers the discussion and ideas in Sections 6.1-6.6 [Section 5.1, 1st edn], after you've worked on this week's project.

In the real world, the approaches discussed up to this point have their place, but they also have some severe limitations.

The key issue is that classical point pattern analysis allows us to say that a pattern is 'evenly-spaced' or 'clustered' relative to some null spatial process (usually the independent random process), but it does not allow us to say where the pattern is clustered. This is important in most real world applications. A criminal investigator takes it for granted that crime is more common at particular 'hotspots', i.e., that the pattern is clustered, so statistical confirmation of this assumption is nice to have ("I'm not imagining things... phew!"), but it is not particularly useful. However, an indication of where the crime hotspots are located would certainly be useful.

The problem is that detecting clusters in the presence of background variation in the affected population is very difficult. This is especially so for rare events. You can get some idea of the degree of difficulty from the description of the Geographical Analysis Machine (GAM) on pages 166-168 and 173-177 [pages 119-122]. Although GAM has not been widely adopted by epidemiologists, the approach suggested by it was ground-breaking and other more recent tools use very similar methods. (See the optional 'Try This' box below for more on this.)

The basic idea is very simple: repeatedly examine circular areas on the map and compare the observed number of events of interest to the number that would be expected under some null hypothesis (usually spatial randomness). Tag all those circles that are statistically unusual. That's it!
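
Purely as an illustration of this idea (and ignoring the complications discussed next), here is a rough sketch in R using spatstat's redwood pattern, whose study area is the unit square: it scans a grid of circle centers, counts the events in each circle, and flags circles containing more events than a simple Poisson (CSR) null would lead us to expect.

> library(spatstat)
> r <- 0.1 #radius of the search circles
> lam <- redwood$n / area.owin(redwood$window) #overall intensity, used as the CSR null
> cx <- seq(0.05, 0.95, 0.05)
> centres <- expand.grid(x=cx, y=cx) #a regular grid of circle centers
> obs <- colSums(outer(redwood$x, centres$x, "-")^2 + outer(redwood$y, centres$y, "-")^2 <= r^2) #events within r of each center
> pvals <- ppois(obs - 1, lam * pi * r^2, lower.tail=FALSE) #P(count >= observed) under the Poisson null
> plot(redwood)
> points(centres$x[pvals < 0.01], centres$y[pvals < 0.01], col="red", pch=3) #mark the centers of 'unusual' circles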

Three things make this conceptually simple procedure tricky.

  • First, there is the statistical theory associated with determining an expected number of events, which may depend on a number of spatially varying covariates of the events of interest, such as populations in different age subgroups. Thus, for a disease (say) associated with older members of the population, we would naturally expect to see more cases of the disease in places where more older people live. This has to be accounted for in determining the number of events expected.
  • Second, there are some conceptual difficulties in carrying out multiple statistical significance tests on a series of (usually) overlapping circles. The rather sloppy statistical theory in the original presentation of the GAM goes a long way to explaining the reluctance of statistical epidemiologists to adopt the tool, even though more recent tools are rather similar.
  • Third, there is the enormous amount of computation required for an exhaustive search for clusters. This is especially so if stringent levels of statistical significance are required, since many more Monte Carlo simulation runs are then needed.

Try This!

If you are interested, take a look at the SaTScan website. SaTScan is a tool developed by the Biometry Research Group of the National Cancer Institute in the United States. SaTScan works in a very similar way to the original GAM tool, but has wider acceptance among epidemiological researchers. You can download a free copy of the software and try it out on some sample data.

Quiz

Ready? Take the second Lesson 4 Quiz to check your knowledge! Return now to the Lesson 4 folder in Canvas to access it. You have an unlimited number of attempts and must score 90% or more.

 

Final Activities for Lesson 4

Now that you've completed the readings and self-test quizzes for this lesson, it is time to apply what you've learned!

  1. Complete Project 4, in which we will analyze crime data for St Louis, in order to demonstrate some of the point pattern analysis methods that have been discussed in this week's lesson. The materials for Project 4 can be found towards the end of this lesson after the Term-Long Project description on the next page.
  2. Continue the Term-Long Project by reviewing the Week 4 directions on the next page.

Term-Long Project: Week 4 - Beginning the Peer Review Process

There is no specific deliverable for this week; however, you should use this week to begin the peer review process for the preliminary proposals. Early this week, I will send you a message letting you know which students' proposals you have been assigned to review. Begin by looking at those proposals as posted on the 'Project Proposal Discussion Forum.' Then, simply post your comments to the assigned project proposal topic. Your peer reviews are due by the end of Week 5. (Although you are welcome to post them at any point between now and then.)

Timely submission of your peer reviews is worth up to 3 points of the total 30 points available for the term-long final project.

You should consider the following aspects in writing comments for the authors of the proposals:

  • Are the goals reasonable and achievable? It is a common mistake to aim too high and attempt to do too much. Suggest possible amendments to the proposals' aims that might make them more achievable in the time frame.
  • Are the data adequate for the task proposed? Do you foresee problems in obtaining or organizing the data? Suggest how these problems could be avoided.
  • Are the proposed analysis methods appropriate? Suggest alternative methods or enhancements to the proposed methods that would also help.
  • Provide any additional input that you feel is appropriate. This could include suggestions for additional outputs (e.g., maps) not specifically mentioned by the author, or suggestions as to further data sources, relevant things to read, relevant other examples to look at, and so on.

Remember... you will be receiving two reviews from other students of your own proposal, so you should include the types of useful feedback that you would like to see in those commentaries. Criticism is fine, provided that it includes constructive inputs and suggestions. If something is wrong, how can it be fixed?

Meanwhile, I will be reviewing the preliminary proposals, and providing each of you with feedback and suggestions. I will aim to complete my reviews and e-mail them to you this week.

Questions?

Please use the 'General Issues' discussion forum to ask any questions now or at any point during this project. You'll find this forum listed under 'Term-Long Project Discussion Forums' in the 'Modules' section in Canvas.

Project 4: Point Pattern Analysis

Overview

Background

In this week's project, you will use some of the point pattern analysis tools available in the R package spatstat to investigate some point patterns of crime in St. Louis, Missouri.

Project Resources

You need an installation of R, to which you will need to add the spatstat and maptools packages. You should already have added spatstat. To add maptools, use the Packages - Install package(s)... menu option as before.

You will also need data:

  • The crime data are in a text file StLouisCrime2014.txt. This file is a mere 10 Kb so should take almost no time to download!
  • city_limits_km.zip contains a shape file for the approximate 'city limits' for which these crime data were collected, projected so that the measurement units are kilometers. Again, this is a small file (24 Kb). This is required for proper point pattern analysis to be feasible.
  • StLouisCrime2014.gdb.zip contains an ArcGIS file geodatabase with various data layers relating to St. Louis, just to give you some context, in case you are missing ArcGIS! At a hefty 1 Mb or so, this may take a little longer to download, 2-3 minutes on a 56 Kbps modem.

You should get your R installation ready (install the packages mentioned above), start up R, change the directory to wherever you have put the city_limits_km shapefile (you will have to unzip it) and crime data file.

Deliverables

For Project 4, the items you are required to submit are as follows:

  • Create density maps of the gun homicide data, experimenting with the kernel density bandwidth, and provide a commentary discussing the most suitable bandwidth choice for this analysis and visualization method.
  • Perform point pattern analysis on two of the three crime datasets (preferably contrasting ones) by using whatever methods seem the most useful, and present your findings in the form of plots and accompanying commentary.

Questions?

Please use the 'Lesson 4 Discussion Forum' to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary there where appropriate. To access the forums, click on 'Discussions' and navigate to the appropriate forum from there.

Getting the crime data into R

Before writing any code, you should make sure that R knows where to look for the shapefile and the text file. First, set the working directory to the folder where you placed the contents of the .zip file, and then check that the correct location is specified. Here's how:

> setwd("C:/PSU/Course_Materials/GEOG_586/Lesson_4/") #note that your folder location will likely be different
> getwd() #returns the working directory location so you can check that it is correct
> list.files() #lists the files present in the working directory

To get started, we need to first get the city limits (i.e., the study area) into R, so that it can be associated with the point data. Here's how:

> library(maptools)
> S <- readShapePoly("city_limits_km.shp")
> SP <- as(S, "SpatialPolygons")
> W <- as(SP, "owin")

In order, this: loads up the maptools package, reads the shapefile into data object S, converts S into a collection of polygons SP, and then converts SP into an 'owin' object, which is the format that spatstat requires so that the data can be used as an analysis window. You can plot W to see what you are dealing with:

> plot(W)

Next, read in the crime data:

> xy <- read.table("StLouisCrime2014.txt", header=T, sep="\t")

You can inspect the contents of xy by typing xy, and see the names of this dataset's attributes by typing names(xy). Then convert it to a spatstat point pattern object, with the different crime types as an identifying mark:

> attach(xy)
> pp <- ppp(X, Y, window=W, marks=CRIME)

Remember that you will have to load the spatstat library using library(spatstat) before you can use any of its functions. Note that the attach() command above makes the various attributes of the raw data xy available for direct access by name. You can now make a map:

> plot(pp)

To see the three crime types separately:

> plot(split(pp))

We are going to work with each crime type as a distinct dataset, so it's convenient to split them permanently (check that the labels used below match the crime types in your data, for example by typing table(CRIME), and adjust them if necessary):

> gun <- pp[CRIME=="DISORDERLY"]
> rob <- pp[CRIME=="BURGLARY"]
> hit <- pp[CRIME=="HITANDRUN"]

And you can make maps of each individually like this:

> plot(density(gun))
> contour(density(gun), add=T)
> plot(gun, add=T)

Once you're comfortable that you have the data loaded, proceed to the next page.

Project 4: Kernel Density Analysis

The first step in any spatial analysis is to become familiar with your data. In point pattern analysis, kernel density analysis is often used for this purpose, so first you are asked to experiment with the kernel density function in spatstat.

Kernel density visualization is performed in spatstat using the density() function which we have already seen in action. The only additional piece of information you need to know is how to vary the bandwidth:

> plot(density(gun, 0.25))

The second parameter in the density function is the bandwidth. R's definition of bandwidth requires some care in its use. Because it is the standard deviation of a Gaussian (i.e., normal) kernel function, it is actually only around 1/2 of the radius across which events will be 'spread' by the kernel function. Remember that the spatial units we are using here are kilometers. It's probably best to add contours to a plot by storing the result of the density analysis:

> d250 <- density(gun, 0.25)
> plot(d250)
> contour(d250, add=T)

and you can also add the points themselves, if you wish:

> plot(gun, add=T)

R provides a function that will suggest an optimal bandwidth to use:

> r <- bw.diggle(gun)
> r

which will tell you the value it has calculated. You can then use this with d_opt <- density(gun, r). You may not feel that the optimal value is optimal. Or you may find it useful to consider what is 'optimal' about this setting.
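
For example, to map the surface produced by the suggested bandwidth so that you can compare it with your own choices:

> d_opt <- density(gun, r)
> plot(d_opt)
> contour(d_opt, add=T)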

Deliverable

Create density maps (in R) of the gun homicide data, experimenting with different kernel density bandwidths. Provide a commentary discussing the most suitable bandwidth choice for this analysis and visualization method.

 

Project 4: Mean Nearest Neighbor Distance Analysis for the Crime Patterns

For completeness, this page describes how to perform nearest neighbor distance analysis on a point pattern. However, as discussed in the reading, this approach is rarely used now, so there is no need to report findings if you do not think they are useful.

The spatstat nearest neighbor function is nndist.ppp():

> nnd <- nndist.ppp(gun)

which returns a list of all the nearest neighbor distances in the pattern. You can plot these:

> hist(nnd)

and also summarize them:

> summary(nnd)

For a quick statistical assessment, you can also compare the mean value to that expected for an IRP/CSR pattern of the same intensity:

> mnnd <- mean(nnd)
> exp_nnd <- 0.5 / sqrt(gun$n / area.owin(W))
> mnnd / exp_nnd

Give this a try for one or more of the crime patterns. Are they clustered or evenly-spaced? (A ratio of observed to expected mean nearest neighbor distance noticeably below 1 suggests clustering; a ratio above 1 suggests even spacing.)

Project 4: Quadrat Analysis in R

Like nearest neighbor distance analysis, quadrat analysis is a relatively limited method for the analysis of a point pattern, as has been discussed in the text.

However, it is easy to perform in R, and can provide useful insight into the distribution of events in a pattern. The functions you need in spatstat are quadratcount() and quadrat.test():

> q <- quadratcount(hit, 4, 8)
> plot(q)
> plot(hit, add=T)
> quadrat.test(hit, 4, 8)

The second and third parameters supplied to these functions are the number of quadrats to create across the study area in the x (east-west) and y (north-south) directions. The test will report a p-value, whose interpretation is discussed in the course text.

Project 4: Distance Based Analysis with Monte Carlo Assessment

The real workhorses of contemporary point pattern analysis are the distance-based functions: G, F, K (and its relative L) and the more recent pair correlation function.

Once again, spatstat provides full support for all of these, using the built-in functions Gest(), Fest(), Kest(), Lest() and pcf(). In each case, the 'est' suffix indicates that the function is an estimate based on the empirical data. Calculation is straightforward:

> g_gun <- Gest(gun)
> plot(g_gun)

When you plot the functions, you will see that spatstat actually provides a number of different estimates of the function. Without getting into the details, the different estimates are based on various possible corrections that can be applied for edge effects.
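
If you would rather look at a single estimate, these functions accept a correction argument (see ?Gest in R for the available options); for example, the following should plot only the Kaplan-Meier edge-corrected estimate:

> plot(Gest(gun, correction="km"))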

To make a statistical assessment of any of these functions for our patterns, we need to compare the estimated functions to those we expect to see for IRP/CSR. Given the complexity involved, the easiest way to do this is to calculate the function for a set of simulated realizations of IRP/CSR in the same study area. This is done using the envelope() function:

> g_gun_env <- envelope(gun, Gest, nsim=99, nrank=1)
> plot(g_gun_env)

Figure 4.4 shows an example of the output from the pair correlation function pcf(). Note that the plots do not show crime data, but illustrate the redwood saplings data we saw earlier.

Figure 4.4. The redwood seedlings point pattern (left) and the output of the pair correlation function pcf() in R (right).

The point pattern, on the left, is clearly clustered. What does the plot on the right show us?

Well, the dashed red line is the theoretical value of the pair correlation function for a pattern generated by IRP/CSR. We aren't much interested in that, except as a point of reference.

The grey region shows us the range of values of the function which occurred across all the simulated realizations of IRP/CSR which you see spatstat producing when you run the envelope function. The black line is the function for the actual pattern (i.e., the redwood seedlings). What we are interested in is whether or not the observed (actual) function lies inside or outside the grey 'envelope'. In this case, the observed function is outside the envelope over the range of distances (on the x-axis) from around 0.01 to around 0.07.

Since this is the pair correlation function, it tells us that there are more pairs of events separated by distances in this range than we would expect to occur by chance. Over the rest of the range of values shown here, the PCF falls within the expected bounds (except for a minor departure below the expected values at around 0.225). This observation supports the view that the pattern is clustered or aggregated at the stated range of distances.

The exact interpretation of the relationship between the envelope and the observed function depends on the function in question, but this should give you the idea.

One thing to watch out for... you may find that it's rather tedious waiting for 99 simulated patterns each time you run the envelope() function. This is the default number that are run. You can change this by specifying a different value for nsim:

> K_e <- envelope(rob, Kest, nsim=19, nrank=1)

Once you are sure which examples you want to use, you will probably want to do a final run with nsim set to 99, so that you have more faith in the envelope generated (since it is based on more realizations and is more likely to be stable). Also, you can change the nrank setting. This places the 'hi' and 'lo' lines in the plot at the corresponding high or low values in the range produced by the simulated realizations of IRP/CSR. So, for example:

> G_e <- envelope(hit, Gest, nsim=99, nrank=5)

will run 99 simulations and place the high and low limits of the envelope at the 5th highest and 5th lowest values in the set of simulated patterns.

Something worth knowing is that the L function implemented in R deviates from the one discussed in the text: it produces a result whose expected behavior under CSR is an upward-sloping 45-degree line, that is, expected L(r) = r. This can be confusing if you are not expecting it.
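
One common way to make the R version easier to read (this uses spatstat's formula interface for plotting function objects; see ?plot.fv if the syntax is unfamiliar) is to plot L(r) - r, so that the CSR reference becomes a horizontal line at zero:

> L_gun <- Lest(gun)
> plot(L_gun, . - r ~ r) #subtract r from each estimate so CSR corresponds to a flat line at zero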

One final (minor) point: for the pair correlation function in particular, the values at short distances can be very high, and R will scale the plot to include all of them, making it hard to see the interesting part of the plot. To control the range of values displayed in a plot, use xlim and ylim. For example:

> plot(pcf_e, ylim=c(0, 5))

will ensure that only the range between 0 and 5 is plotted on the y-axis.

Got all that? If you have questions then, as usual, you should post them to the Discussion Forum for this week's project. Also see the additional resources at the end of this lesson, where I have included links to some articles that use some of these methods.

Deliverable

Perform point pattern analysis on two of the three crime datasets (preferably contrasting ones) by using whatever methods seem the most useful, and present your findings in the form of maps, plots, and accompanying commentary.

 

Project 4: Finishing Up

Please put your write-up, or a link to your write-up, in the Project 4 Drop Box.

Deliverables

For Project 4, the items you are required to submit are as follows:

  • Create density maps of the gun homicide data, experimenting with the kernel density bandwidth, and provide a commentary discussing the most suitable bandwidth choice for this analysis and visualization method.
  • Perform point pattern analysis on two of the three crime datasets (preferably contrasting ones) by using whatever methods seem the most useful, and present your findings in the form of plots and accompanying commentary.

That's it for Project 4!

I suggest that you review "Final Activities for Lesson 4" to be sure you have completed all the required work for Lesson 4.

Afterword

Now that you are finished with this week's project, you may be interested to know that some of the tools you've been using are available in ArcGIS. You will find mean nearest neighbor distance and Ripley's K tools in the Spatial Statistics - Analyzing Patterns toolbox. The Ripley's K tool in particular has improved significantly in ArcGIS 10, so that it now includes the ability to generate confidence envelopes using simulation just like the envelope() function in R.

For kernel density surfaces, there is a density estimation tool in the Spatial Analyst Tools - Density toolbox. This is essentially the same as the density() tool in R with one very significant difference, namely that Arc does not correct for edge effects. In the figure below, the results of kernel density analysis applied to all the crime events in the project data set are shown for (from left to right) the default settings in Arc, with a mask and processing extent set in Arc to cover the city limits area, and for R.

Figure 4.5. Kernel density analysis of all crime events: Arc default settings (left), Arc with a mask and processing extent set to the city limits (center), and R (right).

The search radius in Arc was set to 2km and the 'sigma' parameter in R was set to 1km - these should give roughly equivalent results. More significant than the exact shape of the results is that R is correcting for edge effects. This is most clear at the north end of the map, where R's output implies that the region of higher density runs off the edge of the study area, while Arc confines it to the analysis area. R accomplishes this by basing its density estimate on the area inside the study area at each location.
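
If you would like to see what an uncorrected surface looks like in R for comparison, density() accepts an edge argument that switches the edge correction off (see ?density.ppp):

> plot(density(gun, 1)) #default: edge-corrected, sigma of 1 km
> plot(density(gun, 1, edge=FALSE)) #no edge correction, closer to Arc's default behavior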

The extensibility of both packages makes it to some extent a matter of taste which you choose to use for point pattern analysis. At the time of writing (2010), it is clear that R remains the better choice in terms of the range of available options and tools, although Arc may have the edge in terms of its familiarity to GIS analysts. For users starting with limited knowledge of both tools, it is debatable which has the steeper learning curve - certainly neither is simple to use!

Additional Resources L4

To see how some of these methods are applied have a quick look at some of these journal articles.

Here is a link to an MGIS capstone project article that investigated sinkholes in Florida. Related to crime, here is a link to an article that uses spatial analysis to understand crime in national forests and the poaching of elephants.

Here are some articles that use some of the methods learned during this week's lesson to analyze the distribution of plants (e.g., acacia and ferns) and butterflies.

For a comprehensive read on crime analysis, look through the book Crime Modeling and Mapping Using Geospatial Technologies, available through the Penn State Library.

Don't forget to use the library and search for other books that may be applicable to your studies.