Data uncertainty is a term we use to describe the level of confidence that users have in their data. There are many reasons that you might not be 100% confident that your data reflect what is really happening in the real world. Before we look at those reasons in more detail, it is useful to look at two aspects of uncertainty: accuracy and precision.
Accuracy refers to the degree to which a measured value approaches the true value. Precision can be understood in two ways: (1) as a measure of how dispersed repeated measurements are around their mean value; or (2) as the resolution of the data (i.e. the smallest measurement difference that you can record with a particular measurement method or tool).
Let's take a target figure as an example. Now pretend that you have four dart throwers who have three attempts each to get as close as they can to the center, or bullseye, of the target. Thrower 1's darts (shown in red) all land very close to the center of the bullseye. She is both accurate and precise. Thrower 2's darts (shown in green) all land very close together, but not anywhere near the bullseye. He is precise, but not accurate. Thrower 3's darts (shown in blue) all land near the center, but not close together. He is accurate, but not very precise. Finally, thrower 4's darts (shown in yellow) land all over the target and not anywhere close to each other. She is neither accurate nor precise.
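To make the dart example concrete, here is a minimal sketch that puts numbers on the two ideas. The dart coordinates are invented for illustration (they are not taken from the figure), and the two summary measures are just one reasonable choice: accuracy is summarized as the distance between a thrower's average dart position and the bullseye, and precision as the root-mean-square distance of the darts from their own average position.

```python
import math
import statistics

BULLSEYE = (0.0, 0.0)  # the "true value" every thrower is aiming for

# Hypothetical dart positions (x, y) in centimetres, invented for this sketch.
throws = {
    "Thrower 1": [(0.5, 0.3), (-0.4, 0.2), (0.1, -0.5)],    # accurate and precise
    "Thrower 2": [(8.0, 7.5), (8.4, 7.9), (7.8, 8.2)],      # precise, not accurate
    "Thrower 3": [(6.0, 0.0), (-3.0, 5.2), (-3.0, -5.2)],   # accurate, not precise
    "Thrower 4": [(9.0, 2.0), (2.0, 10.0), (-3.0, 9.0)],    # neither
}


def summarize(points, target=BULLSEYE):
    """Return (accuracy_error, spread) for a list of (x, y) points."""
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    # Accuracy: how far the average position is from the true value.
    accuracy_error = math.hypot(mean_x - target[0], mean_y - target[1])
    # Precision: how dispersed the points are around their own average.
    spread = math.sqrt(
        statistics.fmean((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    )
    return accuracy_error, spread


for name, points in throws.items():
    accuracy_error, spread = summarize(points)
    print(f"{name}: accuracy error = {accuracy_error:.2f} cm, spread = {spread:.2f} cm")
```

A small accuracy error means the thrower is accurate on average; a small spread means the thrower is precise. Notice that the two numbers are independent of each other: Thrower 2 has a small spread but a large accuracy error, while Thrower 3 is the reverse.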
In the real world, you might be faced with the choice of deciding how to measure some phenomenon you are interested in (i.e. deciding which dart thrower to choose for your team). Although the measurement methods that you have to choose from may not be as different in their accuracy and precision as the dart throwers in the above figure, you will still want to choose the method that gives you the highest level of accuracy and precision that you can get (subject to other constraints in your project, such as cost). So if you are able to direct data collection yourself, you might have a higher degree of confidence in your data. Often, however, you will be using data that were collected by someone else, and it's up to you to make an estimate of how reliable the data are. In the remainder of this concept gallery, we will talk about various types of data uncertainty and the sources of that uncertainty.
Generally, we can think about three types of uncertainty for a given data set: thematic, positional (spatial) and temporal uncertainty (Buttenfield and Beard 1994).
There are many sources of uncertainty in data. Here, we list a few examples:
- Perception (of a human data collector) or observer bias. In other words, two measurers might set out to describe the same object or phenomenon (and possibly even use the same instrument), but still come up with different measurements to describe the object or phenomenon.
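- Accuracy of the measurement device. Some measurement methods are more accurate than others (e.g. a GPS receiver with differential correction is more accurate than one without). Accuracy also depends on how well the instrument is calibrated.
- Precision of the measuring device. This has to do with the resolution of the instrument (the smallest difference it can record) and with how closely repeated readings of the same thing agree with one another.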
- Data processing. Operations such as vectorization, rasterization and calculations (i.e. deriving a data measure from raw data) can introduce uncertainty into the data.
- Variation of the phenomenon itself. Some phenomena (e.g. buildings) are well-defined, and it is easy for the data collector to tell where the phenomenon begins and ends. Others, such as soil types or vegetation types, are fuzzier and are more difficult to describe with a high degree of accuracy and precision.
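As a small illustration of the data processing item in the list above, the sketch below mimics one simple form of rasterization: each point is replaced by the center of the grid cell it falls in. The coordinates, the 30 m cell size, and the function names are all assumptions made up for this example; real GIS software performs rasterization in more sophisticated ways. The point is only that the conversion moves every location by some amount, and that the largest possible shift is half the diagonal of a cell.

```python
import math


def snap_to_cell_center(x, y, cell_size):
    """Return the center of the square grid cell that contains (x, y)."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def positional_error(original, snapped):
    """Distance between the original location and its rasterized location."""
    return math.hypot(original[0] - snapped[0], original[1] - snapped[1])


# Hypothetical point locations in metres, invented for this sketch.
points = [(12.3, 47.9), (30.0, 30.0), (58.6, 1.2)]
cell_size = 30.0  # assumed raster resolution in metres

# The worst case is a point sitting on a cell corner: half the cell diagonal.
max_possible = cell_size * math.sqrt(2) / 2

errors = [positional_error(p, snap_to_cell_center(*p, cell_size)) for p in points]

for point, error in zip(points, errors):
    print(f"{point} moved {error:.2f} m (upper bound {max_possible:.2f} m)")
```

A coarser grid (a larger `cell_size`) raises the upper bound on the positional error, which is one way that the resolution of a data set limits its precision.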
If you are interested in investigating this subject further, I recommend the following:
- Zhang, J.X. and Goodchild, M.F. (2002). Uncertainty in geographical information. New York: Taylor and Francis.
- Goodchild, M.F. (2000). "Introduction: special issue on uncertainty in geographic information systems." Fuzzy Sets and Systems, 113, pp. 3-5.