This lesson focuses on principles and methods used to determine quality and accuracy of lidar data and derived GIS products. The readings will introduce numerous terms and definitions, which may differ slightly from selection to selection and in some cases may even seem contradictory. The Introduction page of the online content will provide some additional explanation and clarification of these issues.
Digital terrain models have been around for some time; however, the advent of lidar technology prompted new interest in accuracy and quality reporting. A number of guidelines and standards, used by public agencies and private entities throughout the United States, will be presented in the textbook readings and summarized in the online course material. In this course, we have not been able to study all of the detailed elements of lidar technology and data addressed by these mapping standards; therefore, it’s important to try to get an overall sense of the major categories of issues being addressed.
In the lab, the student will perform a quantitative accuracy assessment consistent with industry guidelines. Finally, the lesson will present examples of accuracy and quality assessment reports that are typical of most topographic lidar mapping projects.
After completing this lesson you should be able to:
Lesson 6 concludes the technology-focused section of this course with an overview of accuracy assessment procedures and standards applied to lidar datasets intended for public use in the United States. In the context of this lesson, accuracy assessment refers to the spatial correctness of the dataset compared to the surface it represents. Quantitative assessment deals primarily with geometric fidelity of the dataset, the degree to which the spatial coordinates of a data element agree with their “true” coordinates on the earth’s surface, and involves mathematical comparison of the surface to known ground points.
For most mapping products, there are also aesthetic standards for elements of quality which may not affect spatial accuracy, but can either enhance or detract from the experience of viewing the data or using it for analysis. Qualitative assessment of the correctness of the surface involves visual inspection. Together, the quantitative and qualitative evaluations are referred to by the broader terms, quality assurance and quality control (QA/QC).
The terms, quality control and quality assurance, are often used somewhat interchangeably, or in tandem, to refer to a multitude of tasks performed internally by the data producer and externally, or independently, by the data purchaser. We will adhere to the following definitions adapted by the course author from sources outside the mapping community (IPCC, 2000).
Quality Control (QC) is a system of routine technical activities to measure and control the quality of the product as it is being developed. The QC system is designed to:
- Provide routine and consistent checks to ensure data integrity, correctness, and completeness;
- Identify and address errors and omissions;
- Document and archive production material and record all QC activities.
Quality Assurance (QA) activities include a planned system of review procedures conducted by personnel not directly involved in the inventory compilation/development process. Reviews, preferably by independent third parties, should be performed upon a finalized product following the implementation of QC procedures. Reviews verify that data quality objectives were met, and report metrics of interest to end users of the data for additional analyses.
Chapter 12 of Maune (2007) has a different set of definitions for these terms. In this reference, the description of QA fits the definition of QC given above, and the description of QC in Maune also fits within the definition of QC given above. Finally, the description of Independent QA/QC fits the definition of Quality Assurance given above. Regardless of the terms applied, the student should be able to conceptualize, categorize, and differentiate the activities involved, and given the inconsistencies that exist in application of the terms, should be able to adapt a discussion of these activities to the accepted vocabulary of a particular project group or agency. In other words, all these steps need to occur regardless of what they are called, and the group planning and reviewing a project simply needs to agree on the usage of terms at the outset.
In preceding lessons, we have touched upon procedures for checking the quality and accuracy of interim work products. In the data acquisition phase, the lidar data is checked for completeness of coverage and the GPS/IMU data processing residuals are checked to ensure the quality of the georeferencing. In the preprocessing and calibration phase, overlapping lidar strips are examined in order to refine boresight angles and ground control is used to check the conversion of ellipsoidal heights to the project vertical datum. During the editing phase, visual inspection is performed to ensure that points are correctly classified; ancillary imagery is used to identify structures and water bodies that may pass through automated filtering undetected. All of these activities constitute internal quality control. In a full-scale production operation, a system for documenting these procedures and archiving interim results would be part of a comprehensive Quality Plan adhered to by all project staff.
Quantitative accuracy assessment (testing lidar-derived surfaces against surveyed ground checkpoints) is normally conducted by an individual or organization that had no involvement in the data acquisition or production. Data producers may well use their own ground checkpoints as part of their internal quality control, but the ground check points used to generate the officially published accuracy assessment report are usually not made available to the data producer. The final coordinate comparison is truly an independent test of spatial accuracy. According to the definitions set forth above, this independent activity falls under Quality Assurance. In addition to a quantitative report on positional accuracy, an independent reviewer will perform visual inspection of the classified lidar point cloud and derivative products that may be contract deliverables. A separate qualitative assessment report is normally provided as part of the published project record. Examples of both quantitative and qualitative assessment reports for the dataset used in the lab activities will be provided later in this lesson.
As discussed above, the QA/QC process gives us insight into the types of errors and artifacts that affect terrain data, due either to the sensor or to characteristics of the target surface. With respect to the lidar, there was concern from the start that vertical accuracy would vary within a single dataset, based on the type of terrain and land cover being mapped. In other words, there was some recognition that accuracy itself was a spatial variable. While this is undoubtedly true of most spatial datasets, including orthorectified imagery, the discussion and debate about methods of accuracy assessment and reporting have been highly focused on terrain, and it’s safe to say that there will be significant refinements and developments occurring in the next decade.
IPCC (2000). Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories, 16th IPCC Plenary Session, Montreal, 1-8 May, 2000. http://www.ipcc-nggip.iges.or.jp/public/gp/english/8_QA-QC.pdf. Last accessed 10 November 2009.
Quality control and assurance procedures for terrain models comprise three categories:
Terrain data come in many different forms (DEM, DTM, DSM, TIN, breaklines, etc.) and formats. It is important to ensure that the contract specifies these clearly before production begins, as transforming from one format to another after production is time-consuming and may introduce undesirable interpolation errors into the data itself. It is best to provide the user with a small sample area as soon as possible, before beginning full production, to allow them the chance to use the sample data on their own systems and with their own software.
Ground checkpoints are usually required to be at least three times more accurate than the data to be tested. The root-mean-square error (RMSE) as calculated between the dataset and the checkpoints is converted into a statement of vertical accuracy at an established confidence level, normally 95 percent. Because elevation is a one-dimensional variable, the 95% confidence level is equivalent to the RMSE multiplied by 1.96. A NSSDA-compliant accuracy statement accompanying a terrain model deliverable would be “Tested ____ (meters, feet) vertical accuracy at 95% confidence level”, and the numerical value supplied is RMSE * 1.9600. This statement of accuracy assumes that no systematic errors or biases are present in the data and that the individual checkpoint errors follow a normal distribution.
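The RMSE computation and its conversion to an NSSDA-style accuracy statement can be sketched in a few lines of Python. The checkpoint values below are hypothetical, for illustration only:

```python
import math

def nssda_vertical_accuracy(surface_z, checkpoint_z):
    """RMSE(z) and NSSDA vertical accuracy at the 95% confidence level.

    Assumes, as the NSSDA statement requires, that errors are free of
    systematic bias and normally distributed.
    """
    errors = [s - c for s, c in zip(surface_z, checkpoint_z)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, 1.9600 * rmse  # Accuracy(z) = RMSE x 1.9600

# Hypothetical interpolated surface heights vs. surveyed checkpoints (meters)
surface = [101.12, 98.47, 102.05, 99.90]
survey = [101.00, 98.60, 102.00, 100.00]
rmse, acc95 = nssda_vertical_accuracy(surface, survey)
print(f"Tested {acc95:.3f} (meters) vertical accuracy at 95% confidence level")
```

In practice the checkpoint set would be far larger and stratified by land cover, but the arithmetic is exactly this simple.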
One of the biggest potential customers for terrain data in the United States is FEMA, in particular the national floodplain mapping program. As topographic lidar was emerging as a powerful terrain mapping tool in the mid to late 1990’s, one of FEMA’s most pressing questions was “how does it perform in the different land cover types that characterize the floodplain?” This question and FEMA’s potential need for accurate elevation data nationwide drove the development of guidelines and specifications for lidar acquisition, processing, QA/QC, and accuracy testing. The FEMA guidelines required testing and reporting against independent check points in representative land cover types. The most common land cover types identified for terrain model accuracy assessment purposes are: open ground, weeds and crops, scrub and shrub, forest and urban.
The FEMA guidelines are presented in more depth later in the lesson. For the moment, it is relevant to point out that the early testing of lidar data, according to these guidelines, revealed several important facts that affect our approach to quantitative accuracy assessment of terrain data. First and foremost, it was discovered that errors in lidar-derived terrain datasets do not follow a normal distribution, except over bare ground. In areas covered by any sort of vegetation, the tendency is for lidar (and for radar as well) to yield elevations above the ground due to returns off the canopy. In built-up areas, there will be many lidar returns on objects above the ground, which may not all be removed from the bare earth terrain model, again causing an asymmetric error distribution with more above-ground errors than below-ground errors. Conversely, lidar tends to measure elevations slightly below the true ground on the dark asphalt surfaces that are common to roadways and urban areas. When one begins to study the error distribution for an entire dataset in detail, it is obvious that accuracy not only varies within the dataset due to variation in land cover, but it also deviates from a normal error distribution in particular ways depending on the slope, roughness, and composition of the surface. Radar can reasonably be expected to exhibit its own, analogous issues.
In recognition of the fact that errors in lidar-derived terrain models are often not appropriately modeled by a Gaussian distribution, a nonparametric testing method, based on the 95th percentile, was proposed and implemented in the National Digital Elevation Program Guidelines. According to these guidelines (which are the currently-accepted working standard for most lidar projects in the US, including those conducted for FEMA) fundamental vertical accuracy is measured in bare, open terrain and reported at the 95% confidence level as a function of vertical RMSE; in other land cover types, the supplemental or consolidated vertical accuracy is measured and reported according to the 95th percentile method. Both Maune (2007) and the NDEP Guidelines give detailed instructions for the computation of these quantities. Links to those documents are provided on page 8 of this lesson.
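The two statistics can be sketched as follows. The percentile function below uses a common linear interpolation between closest ranks; the NDEP Guidelines spell out their exact computation, and the checkpoint errors shown are hypothetical:

```python
def percentile_95(abs_errors):
    """95th percentile of absolute errors, interpolating between
    closest ranks (one common convention; see the NDEP Guidelines
    for their prescribed computation)."""
    s = sorted(abs_errors)
    rank = 0.95 * (len(s) - 1)  # zero-based fractional rank
    lo = int(rank)
    frac = rank - lo
    if lo + 1 < len(s):
        return s[lo] + frac * (s[lo + 1] - s[lo])
    return s[-1]

def fundamental_va(bare_earth_errors):
    """Fundamental vertical accuracy: RMSE in open terrain x 1.9600."""
    n = len(bare_earth_errors)
    rmse = (sum(e * e for e in bare_earth_errors) / n) ** 0.5
    return 1.9600 * rmse

# Hypothetical checkpoint errors (meters) in forested terrain. Note the
# positive skew from canopy returns -- the reason a nonparametric
# statistic is used instead of RMSE x 1.9600 outside open terrain.
forest_errors = [0.05, -0.02, 0.31, 0.12, 0.44, 0.08, -0.04, 0.27, 0.19, 0.52]
sva = percentile_95([abs(e) for e in forest_errors])
print(f"Supplemental vertical accuracy (95th percentile): {sva:.2f} m")
```

The percentile statistic makes no assumption about the shape of the error distribution, which is precisely why it was adopted for vegetated and built-up land cover classes.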
A sample vertical accuracy assessment report, compiled by an independent contractor for the Pennsylvania statewide lidar program, PAMAP, illustrates the calculation and reporting of quantitative accuracy assessment results.
The final step in product acceptance is the qualitative assessment. Various 3D visualization techniques are used to view the terrain surface and examine it for artifacts, stray vegetation or buildings, and the like. Water bodies tend to pose special problems and generally require some sort of manual editing during data production, so lakes, rivers, and shorelines should be examined to ensure that they are represented as flat elevation surfaces. The elevation used over water bodies is almost never an accurate representation of the real height of the water surface, because most remote sensing techniques do not directly measure water heights reliably. In a terrain model product, the elevation of a water body is usually filled in using the mean elevation of the shoreline.
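The shoreline-mean fill described above can be sketched for a gridded DEM. This is an illustrative simplification only (the function name and the 4-neighbor shoreline definition are assumptions); production hydro-flattening typically relies on breaklines and considerably more elaborate logic:

```python
import numpy as np

def flatten_water_body(dem, water_mask):
    """Set all cells inside a water mask to the mean elevation of the
    shoreline (land cells with at least one water neighbor)."""
    dem = dem.copy()
    # Mark the 4-connected neighbors of every water cell...
    shore = np.zeros_like(water_mask)
    shore[1:, :] |= water_mask[:-1, :]
    shore[:-1, :] |= water_mask[1:, :]
    shore[:, 1:] |= water_mask[:, :-1]
    shore[:, :-1] |= water_mask[:, 1:]
    # ...then keep only the land cells: that is the shoreline.
    shore &= ~water_mask
    dem[water_mask] = dem[shore].mean()
    return dem

# Tiny hypothetical DEM with a noisy 2x2 "lake" in the middle
dem = np.array([[5.0, 5.0, 5.0, 5.0],
                [5.0, 4.2, 3.9, 5.0],
                [5.0, 4.1, 4.4, 5.0],
                [5.0, 5.0, 5.0, 5.0]])
water = np.zeros((4, 4), dtype=bool)
water[1:3, 1:3] = True
flat = flatten_water_body(dem, water)
print(flat[water])  # every water cell now carries the shoreline mean
```

A qualitative reviewer inspecting a hillshade would flag a water body whose cells do not share a single elevation like this.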
Breaklines are normally a supplemental deliverable accompanying another type of terrain model (DEM, DTM or DSM). The most common way to assess the quality and accuracy of breaklines is superimposition on the terrain model in a 3-dimensional view. Contours are usually generated from another type of terrain model, so they are usually not checked directly for vertical accuracy. They should be checked to ensure that they do not cross, touch or contain gaps.
A sample QA/QC report, compiled by an independent contractor for the Pennsylvania statewide lidar program, PAMAP, provides many good examples of the types of artifacts found in visual inspection of a lidar-derived terrain dataset. It is difficult to automate identification and correction of these artifacts; therefore, the independent review and final data editing are usually an interactive process involving the data producer, the independent reviewer, and the data purchaser.
The National Map Accuracy Standards (NMAS) were established in 1947 by the U.S. Bureau of the Budget. They present a pass/fail criterion; a map either meets NMAS or it does not. The horizontal and vertical accuracy requirements pertain to printed maps with a published horizontal scale and vertical contour interval:
The statement “not more than 10 percent tested shall be in error” is equivalent to “no fewer than 90 percent of the points tested shall be accurate,” and in fact, the latter statement is more commonly used by practitioners. It is a statement of accuracy at the 90% confidence level, rather than at the 95% level used by more modern standards, but it is predicated on the same statistical theory and assumptions.
NMAS are the accuracy requirements used for the USGS topographic map series, one of the most important civilian national mapping programs in the United States. Even though the more recent USGS DOQQ program was intended to create a digital product, the program specifications were designed around a 1:12000 scale quarter quadrangle map; 1/30 of an inch at 1:12000 scale equals 33.3 feet or 10 meters at the 90% confidence level. Do those numbers sound familiar? If not, look back at the top of page 5 of this lesson.
NMAS are not appropriate for evaluating and reporting the vertical accuracy of digital data that can be displayed and analyzed in GIS at virtually any scale. The user community has completely abandoned the horizontal accuracy criterion, as it has so little practical relevance, and frankly, was difficult for most people to remember correctly. However, contours are still so intuitively attractive and historically ingrained, particularly in the engineering community, that the habit of defining data requirements and describing elevation products with outdated NMAS language related to contours has stubbornly persisted, in spite of its irrelevance to modern terrain data.
As a first response to the need for scale-independent accuracy standards, the ASPRS standards explicitly used the statistical term, RMS (aka RMSE), and described a method of testing and reporting that related this more modern statistical language to map classes and contour intervals, as described in Maune (2007). The ASPRS standards are not used widely today, and in fact, the link will only take you to a page describing them, rather than to an official document. The collaboration undertaken within the community to create the ASPRS standards led to further development of a seminal set of standards by the Federal Geographic Data Committee (FGDC). While they may have faded into a less important position in today’s mapping world, the ASPRS standards provided the spark and facilitated a critical transition from NMAS to the NSSDA standards published by FGDC in 1998.
The NSSDA standards define quantitative accuracy assessment, horizontal and vertical, in today’s modern, digital mapping world. They are founded on the assumption that errors are normally distributed, and they prescribe the formal language for accuracy reporting which refers to statistical measures of error and confidence level. They are well described in Maune (2007), and there’s no need to create redundant reading here. Spend the time to carefully read and fully understand pages 74-76 of Chapter 3.
The table below shows the relationship between the intuitive and familiar NMAS and Vertical Map Accuracy Standard (VMAS) language for equivalent contour interval and the statistically based NSSDA standards. This is a good table to keep handy, because people still frequently use contour interval as a description and contract specification for terrain data accuracy.
| Equivalent Contour Interval | NMAS VMAS (90% confidence) | NSSDA RMSE(z) | NSSDA Accuracy(z) (95% confidence) |
| --- | --- | --- | --- |
| 1 ft | 0.5 ft | 0.30 ft or 9.25 cm | 0.60 ft or 18.2 cm |
| 2 ft | 1.0 ft | 0.61 ft or 18.5 cm | 1.19 ft or 36.3 cm |
| 4 ft | 2.0 ft | 1.22 ft or 37.0 cm | 2.38 ft or 72.6 cm |
| 5 ft | 2.5 ft | 1.52 ft or 46.3 cm | 2.98 ft or 90.8 cm |
| 10 ft | 5.0 ft | 3.04 ft or 92.7 cm | 5.96 ft or 181.6 cm |
| 20 ft | 10.0 ft | 6.08 ft or 185.3 cm | 11.92 ft or 363.2 cm |
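The NSSDA values in the table above follow directly from the NMAS contour-interval rule: VMAS allows 90% of tested points to be within half the contour interval, and for a one-dimensional (vertical) normal error distribution the 90% and 95% confidence multipliers on RMSE are 1.6449 and 1.9600. A short script to verify the conversions:

```python
# Derive the table's NSSDA columns from the contour interval (CI):
#   VMAS limit      = CI / 2            (90% of points within this)
#   RMSE(z)         = VMAS / 1.6449     (normal 90% multiplier)
#   Accuracy(z) 95% = RMSE(z) * 1.9600  (normal 95% multiplier)
FT_TO_CM = 30.48

for ci_ft in (1, 2, 4, 5, 10, 20):
    vmas_ft = ci_ft / 2.0
    rmse_ft = vmas_ft / 1.6449
    acc95_ft = 1.9600 * rmse_ft
    print(f"CI {ci_ft:>2} ft | VMAS {vmas_ft:4.1f} ft | "
          f"RMSE {rmse_ft:5.2f} ft ({rmse_ft * FT_TO_CM:6.1f} cm) | "
          f"Acc95 {acc95_ft:5.2f} ft ({acc95_ft * FT_TO_CM:6.1f} cm)")
```

Small rounding differences aside, the output reproduces every row of the table, which is a handy sanity check when someone specifies terrain accuracy by contour interval.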
The NDEP Guidelines use the 2003 FEMA Guidelines for Lidar and the 2004 ASPRS Guidelines for Lidar as input to restate vertical and horizontal accuracy assessment and reporting requirements for digital elevation data in a form that can be applied to any terrain dataset, regardless of the sensor used to collect it. The biggest differences between the NDEP Guidelines and the NSSDA Standards are the categorization of fundamental, supplemental, and consolidated accuracy and the addition of a new statistical method, the 95th percentile, for computation. These were discussed in some detail on page 6 of this lesson, and they are covered in even greater detail in Maune (2007).
Future Standards Development Efforts
Complete characterization of elevation data accuracy requires more than one statistical measurement for an entire dataset. We know that accuracy for lidar and IFSAR varies within the acquisition swath due to sensor parameters. We know that the surface material, roughness, slope, and land cover also affect the terrain model. These variables vary spatially within a single dataset, yet our accuracy assessment methodologies are not designed to communicate this spatial variability to the end user. The need for more robust accuracy assessment and reporting methods for terrain data has been identified by the National Academies of Science as an unmet need with respect to elevation data for floodplain mapping (NAS, 2007). The topic is being discussed in a number of national forums, so expect there to be new guidelines and standards emerging in the next several years.
This activity should be worked on after you have completed all of the online and textbook readings for this lesson. We recommend you take the following approach:
Again searching publicly available lidar data from any source of your choosing, comment in the Lesson 7 Graded Discussion Forum on the way vertical accuracy and other quality measures have been documented by the owner of the data. Compare the published metadata and reports to the requirements set forth in the USGS Base Specification. Do you feel that the published metadata is adequate and complete with respect to what a GIS user would/should need to know before performing geospatial analysis using this data?
Write two short peer reviews of the final project proposal, one for the student immediately preceding you in the roster and one for the student immediately following you. If one of these students has not posted a deliverable, just choose any other student in the class to peer review. Post your peer reviews as replies to the appropriate students' idea threads in the Final Project Proposal Discussion. Comment on the completeness of the proposal per the published requirements, and offer suggestions for improving the project for the final report.
If you have anything you'd like to comment on or add to the lesson materials, feel free to post your thoughts below. For example, what did you have the most trouble with in this lesson? Was there anything useful here that you'd like to try in your own workplace?