Interpolation is a process that uses measurements of some phenomenon (e.g., precipitation, elevation or mineral content) made at particular locations (i.e., a sample of all of the possible locations where we could have made measurements) to predict the value of that phenomenon at other locations where we have not made measurements. There are many reasons that you may only have a sample of data to work with. One common reason is the cost (both in time and money) involved in making measurements. For example, there are a limited number of weather stations that make the meteorological measurements that are used for predicting future weather patterns. Although this data collection is mechanized today, in earlier times a person had to physically record values for the different measurements, and data collection was limited because of this. Although we only have data for particular points, we would like to be able to predict the weather for all locations within a region, not just for those points. We therefore need some method of interpolation so that our predictive models have data for all locations.
The basic concept behind interpolation is expressed in Tobler's (1970) first law of geography, which states: "Everything is related to everything else, but near things are more related than distant things." In other words, we should expect places that are near our sample locations to have values that are more similar to the sample values than places that are not near the sample locations. In the remainder of this concept gallery item, we will focus on describing some of the different types of interpolation that you might come across: interpolation from points to other points, lines, areas and surfaces.
Interpolation to points and lines often uses a simple method of interpolation called linear interpolation. This method was commonly used by cartographers for creating isoline maps before mapping became computerized, and is still used by cartographers (with computers) to create isolines from data surfaces in secondary interpolation. Secondary interpolation is the process of producing isolines from a data surface. (The surfaces themselves are usually created through primary interpolation, using other methods of interpolation that we will discuss in more detail later in this concept gallery item.)
The basic idea behind linear interpolation can be illustrated with a simple mathematical example. It is based on the idea that we can use a sequence of values to predict the values at locations where we don't have a value. For example, if you saw the following sequence, you would probably quite easily be able to say that the missing values should be 30 and 80.
Although linear interpolation methods were first developed for analyzing time series data, we can apply the same principles in a spatial context by considering the distance between two points. In the example below, imagine that you are creating an isoline map of elevation (see Isolines for a refresher on isolines) and need to draw the 30 meter isoline. If we assume that the elevation changes linearly (i.e., in a consistent fashion) from point A to point B, the 30 meter isoline should pass through a point 6/9 (or 2/3) of the way between point A and point B.
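The arithmetic behind this kind of placement can be sketched in a few lines of Python. This is only an illustration: the elevations (24 m at A, 33 m at B) and the coordinates are hypothetical values chosen so that the result matches the 6/9 fraction mentioned above, and the function names are our own.

```python
def isoline_fraction(value_a, value_b, target):
    """Fraction of the way from A to B at which `target` occurs,
    assuming the value changes linearly between the two points."""
    if value_a == value_b:
        raise ValueError("A and B have the same value; no single crossing exists")
    fraction = (target - value_a) / (value_b - value_a)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("target does not lie between the values at A and B")
    return fraction


def point_along(a, b, fraction):
    """Coordinates of the point `fraction` of the way from a to b."""
    return (a[0] + fraction * (b[0] - a[0]),
            a[1] + fraction * (b[1] - a[1]))


# Hypothetical sample points: A is 24 m at (0, 0), B is 33 m at (90, 0).
fraction = isoline_fraction(24.0, 33.0, 30.0)        # (30 - 24) / (33 - 24) = 6/9
crossing = point_along((0.0, 0.0), (90.0, 0.0), fraction)
print(fraction, crossing)  # the 30 m isoline crosses about 60 units from A
```

Repeating this calculation for every pair of neighboring sample points that straddle 30 meters gives the set of crossing points through which the isoline is drawn.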
One of the problems that may occur with linear interpolations is known as the saddle point or alternative choice problem. In this case, if there are two pairs of diagonally opposite values (as in the corners of a rectangle), and the values of both members of one diagonal lie above and both members of the other diagonal lie below the value you are trying to interpolate to (i.e., the value of the contour line you are trying to draw), there are two possible solutions to the problem (see Figure 6.cg.21, below).
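The condition described above is easy to test for. The following Python sketch checks whether a rectangle of four corner values is ambiguous for a given contour level; the function name and the corner values are hypothetical and serve only to illustrate the definition.

```python
def is_saddle_cell(top_left, top_right, bottom_right, bottom_left, level):
    """Return True when both values on one diagonal lie above `level`
    and both values on the other diagonal lie below it."""
    diagonal_1 = (top_left, bottom_right)
    diagonal_2 = (top_right, bottom_left)
    first_high = (all(v > level for v in diagonal_1)
                  and all(v < level for v in diagonal_2))
    second_high = (all(v < level for v in diagonal_1)
                   and all(v > level for v in diagonal_2))
    return first_high or second_high


# Hypothetical corner values and a 30 m contour level.
print(is_saddle_cell(40, 20, 40, 20, 30))  # True: the contour can be drawn two ways
print(is_saddle_cell(40, 40, 20, 20, 30))  # False: only one way to draw the contour
```

Detecting the case does not resolve it. One commonly used tie-break, which is not part of the discussion above, is to compare the mean of the four corner values with the contour level and connect the contour segments accordingly.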
Interpolation to areas uses the concept of Thiessen polygons to draw boundaries around areas that have the same value. This type of interpolator is called a proximal interpolator. The technique builds polygons by assigning each point in an area of interest the value of the sample point to which it is nearest. In other words, the boundaries of the created polygons are lines of equal distance between two sample points (see figure below). Although this procedure creates a space-filling representation, in that each point in the area we are interested in is assigned to a polygon, it creates a stepped surface (i.e., one whose values do not vary smoothly throughout space), where values can change dramatically over short distances. Thiessen polygons are also used as the basis for creating TINs (see Triangulated Irregular Network (TIN) for more detail).
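A proximal interpolator can be sketched without constructing the polygons explicitly, because assigning each location the value of its nearest sample point is equivalent to asking which Thiessen polygon the location falls in. The Python below is a minimal illustration; the sample coordinates and values are hypothetical.

```python
import math


def proximal_value(samples, x, y):
    """Return the value of the sample point nearest to (x, y).

    `samples` is a list of (x, y, value) tuples.
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]


# Hypothetical sample points: (x, y, value)
stations = [(0.0, 0.0, 10), (10.0, 0.0, 20), (0.0, 10.0, 30)]

# The polygon boundary between the first two samples is the line x = 5.
print(proximal_value(stations, 4.9, 0.0))  # 10
print(proximal_value(stations, 5.1, 0.0))  # 20
```

The two query locations are only 0.2 units apart, yet their predicted values differ by 10. This is the stepped behavior described above.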
Interpolation to surfaces can also create continuous, smoothly varying surfaces that we might like to use to represent phenomena such as elevation, precipitation and temperature. In this concept gallery item, we will briefly discuss these methods in general, leaving detailed discussion of two of the more commonly used methods (inverse distance weighting and kriging) to their own concept gallery items.
We can classify interpolation methods as being global or local interpolators. In global interpolation methods, all of the sample points are used to determine the shape of some mathematical function that is applied to the whole area of interest. One such method is trend surface analysis, where one plane is fit to the data sample (covered in detail in the O'Sullivan and Unwin reference). Other interpolators can be classified as local interpolators. In other words, this type of interpolation method only uses sample points that are relatively close to the unknown point to estimate the unknown point's value. Local interpolators can be parameterized in various ways, by considering some number of nearest points or all points within some distance. Of course, all local interpolators can become, in effect, global interpolators if the parameters are adjusted so that the neighborhood is large enough to include all sample points. Both kriging and inverse distance weighting are examples of local interpolators.
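To make the idea of a global interpolator concrete, here is a minimal Python sketch of a first-order trend surface: a single plane z = b0 + b1*x + b2*y fit to all of the sample points by ordinary least squares. It is an illustration under simple assumptions rather than a full treatment (see the O'Sullivan and Unwin reference for that). The function names and sample values are hypothetical, and the 3 x 3 normal equations are solved with Cramer's rule only to keep the example free of external libraries.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples):
    """Least-squares plane through (x, y, z) samples; returns (b0, b1, b2)."""
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sz = sum(z for _, _, z in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxz = sum(x * z for x, _, z in samples)
    syz = sum(y * z for _, y, z in samples)

    # Normal equations A * b = rhs for the unknown coefficients b.
    a = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(a)
    if d == 0:
        raise ValueError("sample points are collinear; the plane is not unique")

    coefficients = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]      # Cramer's rule: swap in the right-hand side
        coefficients.append(det3(m) / d)
    return tuple(coefficients)


def plane_value(coefficients, x, y):
    """Evaluate the fitted plane at (x, y)."""
    b0, b1, b2 = coefficients
    return b0 + b1 * x + b2 * y


# Hypothetical samples that lie exactly on z = 5 + 2x - y.
points = [(0, 0, 5), (1, 0, 7), (0, 1, 4), (1, 1, 6), (2, 3, 6)]
trend = fit_plane(points)
print(trend)
print(plane_value(trend, 4, 2))
```

Every sample point contributes to the three coefficients, and the same plane is then used to predict values everywhere in the area of interest, which is what makes the method global.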
Another type of distinction we can make between different types of interpolation methods is between exact and approximate interpolators. Exact interpolators 'honor' the sample data (e.g., as the proximal interpolator does). In other words, at the location of a sample point, the interpolated surface has the same value that the original sample point had. In approximate interpolators, there is a recognition that due to measurement and other types of error, the best fit surface may not pass directly through all of the sample points. However, the value of an approximate surface at a sample point will be close to the sample value.
A final type of distinction between interpolators that we can make is that of deterministic versus stochastic interpolators. Deterministic interpolators use a mathematical formula to calculate the value of an unsampled location (e.g., the inverse distance weighting method), while stochastic interpolators use statistical information about the sample point data values and their spatial arrangement (i.e., their spatial structure) to predict the value of an unsampled location, and the spatial arrangement of values over a range of unsampled locations. Two examples of stochastic interpolation methods are kriging and trend surface analysis.
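As a small illustration of a deterministic formula, the Python sketch below implements a basic form of inverse distance weighting, which is discussed in detail in its own concept gallery item. The neighborhood size `k`, the exponent `power` and the sample values are assumptions chosen for the example, not recommended settings.

```python
import math


def idw(samples, x, y, k=3, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    `samples` is a list of (x, y, value) tuples. Only the k nearest
    samples are used, so this is a local interpolator.
    """
    by_distance = sorted(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in by_distance[:k]:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value            # at a sample point, return the sample value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical samples at the corners of a 10 x 10 square.
corners = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0),
           (0.0, 10.0, 30.0), (10.0, 10.0, 40.0)]

print(idw(corners, 0.0, 0.0))                  # 10.0, the sample value itself
print(idw(corners, 5.0, 5.0, k=len(corners)))  # all four samples, equally weighted
```

Given the same samples and parameters, the function always returns the same estimate and uses no statistical model of the data. Setting `k` to the total number of samples makes the neighborhood include every point, so the local interpolator becomes, in effect, a global one.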
Regardless of the type of interpolation method you may choose to use, there are a number of factors that can affect the quality of the interpolated outputs:
- Number of sample points
- Location of sample points
- Edge effects
Generally, the greater the number of sample points you have, the more accurate your interpolated surface will be, as the set of locations is more likely to include locations whose values are important for defining the surface (e.g., local peaks and valleys). However, there is also a tradeoff between the number of samples you have and the amount of time that the computer needs to process that information.
Similarly, the location of sample points can have an important impact on the end result of the interpolation. Often, samples are not evenly distributed over the region of interest, and may be biased to places where data collection is relatively easy (see Sampling Strategies for more information on sampling). If there are no samples in a region of high variability, the interpolated surface may not be very accurate.
Finally, edge effects can be quite important. Edge effects arise when there are no sample points to one side of a non-sampled region. This lack of samples may bias the estimate that the interpolation method makes of a non-sampled region, leading to large inaccuracies. In other words, the interpolation method is no longer interpolating (predicting missing values within a region), but is now extrapolating (predicting values in areas where there is no sample data). Fortunately, there is an easy solution to this problem: always make sure that you have sample points that are just outside the region for which you want to create the interpolated surface, and then clip the area you are interested in out of the results. This method will relegate most of the inaccuracy to the area outside of the sample points, leaving you with a more accurate result within your region of interest.
If you are interested in investigating this subject further, I recommend the following:
- O'Sullivan, D. and D. Unwin. 2010. Geographic Information Analysis, 2nd Edition. Hoboken, NJ: John Wiley & Sons.
- Lam, N. 1983. "Spatial Interpolation Methods: A Review," The American Cartographer. 10(2):129-149.
- Mitas, L. and H. Mitasova. 1999. "Spatial Interpolation." In: P. Longley, M.F. Goodchild, D.J. Maguire, D.W. Rhind (Eds.), Geographical Information Systems: Principles, Techniques, Management and Applications, GeoInformation International, Wiley, 481-492.