PRESENTER: Howdy. In this tutorial, we will be creating a confusion matrix in ArcMap to assess the accuracy of an image classification. For instructions on the classification process, please see one of our image classification tutorials.
Thematic maps created from imagery will have some classification errors. Accuracy assessments provide the user with more information on where the errors are occurring. Depending on the acceptable level of error, the user will be able to determine if their classification is usable or if they need to reclassify the image.
The class accuracies are determined by comparing test pixels with the corresponding locations in the classified image. In a perfect world, we would be able to use field-verified ground reference locations for the test pixels. This is not always possible, in which case the user may also select references that they have visually identified from the imagery. The test pixels should be evenly distributed across the image. They should also be distinct from the pixels in the training areas used for supervised classifications.
Confusion matrices are a widely accepted method of determining the accuracy of a classification. But it is important for the user to remember that any biases present in their test pixels will also bias the accuracy reported by their confusion matrix. The rule of thumb that I learned was that each class should have 10 times as many test pixels as there are classes. So if there are three land cover classes, then there should be 30 test pixels for each land cover, for a total of 90 test pixels. It may not always be possible to have an equal number of pixels for each class in a classification.
If you know that there is not very much forest in your classification and there is a large amount of water, it would make more sense to have 20 test pixels for the forest and 40 for the water. The test pixels need to be as near to evenly distributed across the image as possible. If all of your test pixels are from one section of the image, the result will reflect only the accuracy of that section.
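As a rough sketch of the rule of thumb above, the short Python function below computes the number of test pixels per class and in total. The function name and the `multiplier` parameter are made up for illustration; only the 10-times rule and the example counts come from this tutorial.

```python
def reference_pixels_needed(num_classes, multiplier=10):
    """Return (pixels per class, total pixels) under the 10x rule of thumb."""
    per_class = multiplier * num_classes
    total = per_class * num_classes
    return per_class, total


# Three land cover classes, as in the example above.
print(reference_pixels_needed(3))  # (30, 90)
```

This gives only the equal-allocation starting point. Shifting pixels between a scarce class and an abundant one, as in the forest and water example, is still a judgment call.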
In ArcMap, open your pre-classification image. On the top of the map, open the ArcCatalog interface. Navigate to the folder that you wish to work in or create one. I have created a folder for my accuracy assessment.
Now we will create a new shapefile. I'm going to name mine reference points. Make sure that the type of shapefile is set to point.
Now I'm going to drag my new shapefile into the map. Reference points is currently empty. Before I add points to the shapefile, I'm going to add two fields to the attribute table.
Open the attribute table by right-clicking on the shapefile and selecting attribute table. We will left-click on the button on the upper left and then select add field. The first field I am adding will be the reference for me and any other user of this data set.
The data type will be text. And I will name it land cover. I'm going to repeat the process of adding a column, but this time I'm going to name it class and leave it as a short integer.
In order to add a point, I am going to use the edit function. To use this, open the Editor toolbar by right-clicking at the top of the map and selecting the Editor toolbar. Left-click on the arrow next to the word Editor and select Start Editing.
Next, we will select the reference points shapefile to open an editor window. Select the reference points file in the editor box, which will open options in the Construction Tools. Select Point. You will now be able to add points.
When adding points, be sure that you are using the image that you used to create the classified image. Zoom in far enough that you can see exactly where you are placing each point. Before I start placing points, I need to determine how many land covers I can identify in the image.
I am noticing four: trees, pasture, water, and urban areas. Based on the four classes and the rule of thumb that each class should have 10 times as many test points as there are classes, I will select 40 points for each class and will end up with a total of 160 points.
For the moment, I am just going to select a few points for the water class. Once you have selected your points for one class, you will need to add information to the attribute table so that you know what the points represent and so that the computer knows. This is why we added both a text and a number column.
You may not know what the class number is now because you should not have your classified image open, as it leads to the temptation to select points that match your classified image. When adding the information to the attribute table, it is easiest to use the Select By Attributes tool and choose land cover from the dialog box. Now set it equal to an empty pair of quotation marks, which selects the records whose land cover value is still blank.
The attribute table for the reference points will now have all of the blank fields highlighted and I can input the value of water by using the Field Calculator. Because water is text, I need to put quotes around the word water. Now you can start adding points to the next class. These will be blank in the attribute table, and you can fill in their values the same way that water's value was filled in.
I'm going to skip ahead here and open a point file to which I have already added all of my reference points. After completing all of your points, add the classified image to your map document. Check what the number code is for each of your classes, and then assign those codes to the point file so that class one is the same in both.
In my case, class one is pasture. So I will go to Selection, Select By Attributes, and enter land cover equals pasture and hit OK. The attribute table for the reference points will now have all of the pasture points highlighted and I can input the value one by using the Field Calculator. I have already input the numbers into my file.
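The select-and-calculate routine above amounts to a lookup from land cover name to class code. Here is a minimal plain-Python sketch of that idea, not an ArcMap script. The field names `LandCover` and `Class`, the sample records, and the code 4 for urban are assumptions for illustration. The codes 1 to 3 follow the pasture, forest, water order used later in this tutorial.

```python
# Class codes: 1-3 follow the classified image; 4 for urban is an assumption.
CLASS_CODES = {"Pasture": 1, "Forest": 2, "Water": 3, "Urban": 4}

# Hypothetical reference point records with the Class field still empty.
reference_points = [
    {"LandCover": "Water", "Class": None},
    {"LandCover": "Pasture", "Class": None},
    {"LandCover": "Urban", "Class": None},
    {"LandCover": "Forest", "Class": None},
]

# Equivalent of selecting each land cover and calculating its class number.
for point in reference_points:
    point["Class"] = CLASS_CODES[point["LandCover"]]

print([point["Class"] for point in reference_points])  # [3, 1, 4, 2]
```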
There is one major problem with my data that is now apparent from comparing the classes that I observed with the classes that the computer observed: there is no urban class in my classified image. At this point, I have two options. I can stop and perform the classification over again. Or I can go ahead and run the accuracy assessment to find out what the urban areas are being classified as, so that I can pay more attention to them when selecting my training data if I am using a supervised classification method.
To continue with our accuracy assessment, we need to set the system to align the pixels we will create from the reference points with the pixels of the classification. To do this, go to the Geoprocessing menu, Environments, Processing Extent. Set the extent to your classified raster and set the snap raster to the classified raster as well. Now hit OK.
If you do not have the Spatial Analyst extension turned on, let's do that at this time. Go to Customize, Extensions, and select Spatial Analyst. Now we will convert the reference points into reference pixels. Open the toolbox and go to Conversion Tools, then To Raster, and finally Point to Raster.
The input feature is the reference points. The value field will be class. And then browse to the location you would like to save the output raster file and name it.
You can leave the cell assignment and priority fields at their defaults, but make sure that the cell size is correct. Mine should be 30 because it is based on a Landsat image. If you do not know what size it should be, you may also drag the classified raster to the cell size and the computer will match it. Hit OK.
Zoom in to one of your points to see if it appears to be aligned correctly. I'm going to zoom into a river because this is a small object and it is easier to tell if the pixel is the same size when it is next to multiple classes. Mine looks good. If yours has issues, revisit the environments and cell size settings.
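To see why the extent, snap raster, and cell size all matter, it can help to picture how a point's coordinates map onto a raster cell. The sketch below is plain Python and is not ArcMap's internal logic. The function name and the raster origin coordinates are hypothetical, and only the cell size of 30 comes from this tutorial.

```python
import math


def point_to_cell(x, y, x_min, y_max, cell_size=30.0):
    """Return the (row, col) of the cell containing point (x, y).

    x_min and y_max are the left and top edges of the raster grid.
    Rows count down from the top edge; columns count right from the left edge.
    """
    col = int(math.floor((x - x_min) / cell_size))
    row = int(math.floor((y_max - y) / cell_size))
    return row, col


# Hypothetical grid origin and reference point, in map units.
print(point_to_cell(500045.0, 3399980.0, x_min=500000.0, y_max=3400000.0))  # (0, 1)
```

If the reference raster used a different origin or cell size than the classified raster, the same point could fall in a cell that only partly overlaps a classified pixel, which is the misalignment to look for when zooming in.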
We will combine the reference points and classified image. Go to the toolbox, spatial analyst, local, and combine.
Looking at the results of the combine is interesting. If it were not for the urban class, I would say this was a very successful classification. Class three has 39 out of 40 points correctly classed. The missing point isn't showing up in any of the other classes, which means it was an unclassified pixel. Classes one and two were 100% correctly placed. My urban class was nearly equally placed in pasture and forest.
Although this has told us a lot of information, it is still not a confusion matrix. To create a confusion matrix we will need to use the Pivot Table tool. But this cannot be done with the attribute table, so we will have to export the table. Click the button on the upper left corner and select Export and then browse to the location that you wish the table to be saved in. Save the table as a dBASE table.
The Pivot Table tool may be found under Data Management Tools, then Table, then Pivot Table. The input table should be the table you just exported. Select the classified raster field as your input field and the reference points field as the pivot field. The value field will be count.
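Conceptually, the combine output is a tally: one record for each unique pairing of classified value and reference value, with a count of the pixels that share it. A minimal Python sketch of that tally follows, using a handful of made-up pairs rather than the data from this tutorial.

```python
from collections import Counter

# Hypothetical (classified class, reference class) pairs, one per reference pixel.
pixel_pairs = [(1, 1), (1, 1), (2, 2), (3, 3), (1, 4), (2, 4), (2, 4)]

# Count the pixels in each unique combination, like the combine output table.
combine_counts = Counter(pixel_pairs)

for (classified, reference), count in sorted(combine_counts.items()):
    print("classified={} reference={} count={}".format(classified, reference, count))
```

Pairs where the two values match are correctly classified pixels. Pairs such as (1, 4) show a reference class that ended up in a different classified class.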
Now navigate to the location you wish to save your table and name it. Then hit OK. My matrix is in an order that makes it more difficult to read, but that's OK for now. I'm going to use Excel from this point forward. This may not be the best way, but I like to export my table as a text file.
Then in Excel, I will open the text file and select Delimited. Hit Next, and then the file is comma delimited, so check that box. And then Next and Next. I'm going to get rid of the object ID field. Next, I'm going to rearrange my columns so that they are in one, two, three, four order.
I'm going to add a class four row for my urban class and input zeros for all four columns. To make it easier to interpret, I'm going to change my classes from numbers into land cover names. In my case, that is pasture, forest, water, and urban.
So now the matrix is looking pretty and it is time to add the formulas so that we can get meaningful numbers out of the data. The measurements that we will be finding are the kappa coefficient, the overall accuracy, the class accuracies, the commission, and the omission. I'm going to insert several rows above my matrix.
I'm going to find the total number of pixels for each class. I can do this easily by using the formula =SUM, then selecting the cells that I want. And then drag this formula into the neighboring cells.
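The pivot and the Excel cleanup can also be sketched in a few lines of Python. The counts below are taken from this tutorial where they are stated (40, 40, and 39 correct). The split of the 40 urban reference points into 17 and 23 is inferred from the row totals of 57 and 63 quoted later, so treat it as an assumption. The variable names are made up for illustration.

```python
# Combine counts keyed by (classified class, reference class).
# The 17/23 urban split is inferred from the row totals quoted in this tutorial.
combine_counts = {(1, 1): 40, (2, 2): 40, (3, 3): 39, (1, 4): 17, (2, 4): 23}
names = {1: "Pasture", 2: "Forest", 3: "Water", 4: "Urban"}
codes = sorted(names)  # one, two, three, four order

# Rows are classified classes, columns are reference classes.
# Missing combinations, including the whole urban row, become zeros.
matrix = [[combine_counts.get((c, r), 0) for r in codes] for c in codes]

print("Classified".ljust(12) + "".join(names[r].rjust(9) for r in codes))
for c, row in zip(codes, matrix):
    print(names[c].ljust(12) + "".join(str(value).rjust(9) for value in row))
```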
I am going to find the ground truth totals in the same manner. I know that I had 40 points in water, so I'm going to manually correct the total for the water class. Next, we will find the ground truth percent. These are the class accuracies.
These will have the same headers as the matrix. So I'm going to copy my matrix and then delete the pixel data from my copy. In my first cell, I will input the formula equals my pasture cell divided by the pasture reference total, multiplied by 100. Add a dollar sign in between the letter and the number of the reference total cell, which keeps the row fixed when the formula is copied. Now pull the formula down and then let's pull it across.
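For anyone who would rather check the spreadsheet arithmetic in code, here is a small Python sketch of the totals and the ground truth percent. The matrix uses the counts from this tutorial, with the 17 and 23 urban split inferred from the row totals of 57 and 63. The variable names are made up for illustration.

```python
classes = ["Pasture", "Forest", "Water", "Urban"]
# Rows are classified classes, columns are reference classes.
matrix = [
    [40, 0, 0, 17],
    [0, 40, 0, 23],
    [0, 0, 39, 0],
    [0, 0, 0, 0],
]

classified_totals = [sum(row) for row in matrix]             # row sums
reference_totals = [sum(column) for column in zip(*matrix)]  # column sums

# Manual correction: one water reference point fell on an unclassified pixel.
reference_totals[classes.index("Water")] = 40

# Each cell divided by its column's reference total, times 100.
ground_truth_percent = [
    [100.0 * value / reference_totals[j] for j, value in enumerate(row)]
    for row in matrix
]

for name, row in zip(classes, ground_truth_percent):
    print(name, [round(value, 1) for value in row])
```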
Next, we'll find the commission. The commission is how many test pixels were incorrectly classified as a class. So it is the incorrectly classified pixels in the row divided by the total number of pixels in the row.
This is a little more time consuming to fill in because you cannot just pull the formula down the way you could for the percent. We can think of commission as the rate at which the class has been overclassified. I'm going to skip ahead here.
Omission is the opposite. It is the incorrectly classified pixels in the column divided by the total number of the column. Let's take a moment to fill these in.
I'm going to put one in for water because I added that point back in earlier, even though I know it was not classified. In mine, I have 0% omission of pasture and forest, but I have 100% omission of urban. This makes sense because there were no pixels classified as urban in my classified image.
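Commission and omission as described above can be sketched in Python as follows. The matrix repeats the counts from this tutorial, with the urban split of 17 and 23 inferred from the row totals. The helper name `percent` is made up, and returning `None` for an empty row is a choice made here, since Excel would show a divide-by-zero error instead.

```python
classes = ["Pasture", "Forest", "Water", "Urban"]
# Rows are classified classes, columns are reference classes.
matrix = [
    [40, 0, 0, 17],
    [0, 40, 0, 23],
    [0, 0, 39, 0],
    [0, 0, 0, 0],
]
# Reference totals, with water corrected to 40 for the unclassified pixel.
reference_totals = [40, 40, 40, 40]


def percent(part, whole):
    """Percentage of part in whole, or None when the whole is zero."""
    return None if whole == 0 else 100.0 * part / whole


commission = {}
omission = {}
for i, name in enumerate(classes):
    correct = matrix[i][i]
    row_total = sum(matrix[i])
    # Commission: pixels wrongly placed in this class, out of the row total.
    commission[name] = percent(row_total - correct, row_total)
    # Omission: reference pixels missed by this class, out of the column total.
    omission[name] = percent(reference_totals[i] - correct, reference_totals[i])

print(omission)  # {'Pasture': 0.0, 'Forest': 0.0, 'Water': 2.5, 'Urban': 100.0}
```

Urban has no commission value here because no pixels were classified as urban, so there is nothing to divide by.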
Next, we will add the producer's accuracy. These are the correctly classified cells for pasture, forest, water, and urban divided by the reference point total for the class, which are all 40. And then we will find the percentage for each land cover. User's accuracy is like the producer's accuracy in that it uses the correctly classified cells for pasture, forest, water, and urban, but this time each is divided by the total points appearing in that classified class. In my case, those totals are 57, 63, 39, and 0.
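A short Python sketch of the producer's and user's accuracy, using the correct counts and totals quoted above. The variable names are made up, and reporting `None` for a class with no classified pixels is a choice made here rather than something from this tutorial.

```python
classes = ["Pasture", "Forest", "Water", "Urban"]
correct = [40, 40, 39, 0]             # diagonal of the matrix
reference_totals = [40, 40, 40, 40]   # column totals, water corrected to 40
classified_totals = [57, 63, 39, 0]   # row totals

# Producer's accuracy: correct pixels out of the reference total for the class.
producers = [100.0 * c / total for c, total in zip(correct, reference_totals)]

# User's accuracy: correct pixels out of everything classified as the class.
users = [
    None if total == 0 else 100.0 * c / total
    for c, total in zip(correct, classified_totals)
]

for name, p, u in zip(classes, producers, users):
    print(name, p, u)
```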
Now let's go back to the very top and find the overall accuracy and the kappa coefficient. The overall accuracy is the sum of the correctly classified cells divided by the total number of cells. This one is pretty easy to find.
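With the numbers from this tutorial, the overall accuracy works out as in the sketch below. It assumes the total is the full 160 reference points, which counts the one unclassified water point as an error.

```python
correct = [40, 40, 39, 0]        # correctly classified pixels per class
total_reference_points = 160     # 40 points for each of four classes

overall_accuracy = 100.0 * sum(correct) / total_reference_points
print(overall_accuracy)  # 74.375
```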
The kappa coefficient is a little less straightforward, so let's take a moment to talk about it. The kappa coefficient may be used as a measure of agreement between the model predictions, so that would be our classified image, and reality, or to determine if the values contained in an error matrix represent a result significantly better than random. A one would indicate perfect agreement between reality and our classified image. And a 0 indicates agreement no better than would be expected by chance.
Here's the kappa coefficient equation. N is the total number of sites in the matrix. r is the number of rows in the matrix. x_ii is the number in row i and column i. Then x_i+ is the total for row i. And x_+i is the total for column i.
So what is this actually asking us to do? We need to multiply the total number of reference points by the sum of the correctly classified pixels. From this number, we will subtract the sum, over all classes, of the class row total multiplied by the class column total. Next, we will divide all of this by the square of the total number of reference points minus that same sum of row total times column total.
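The equation itself appears on screen in the video rather than in this transcript. Written out with the symbols just defined, the commonly used form is the following, where each term in the second sum is the row total for class i multiplied by the column total for class i:

```latex
\hat{\kappa} =
  \frac{N \sum_{i=1}^{r} x_{ii} \;-\; \sum_{i=1}^{r} x_{i+}\, x_{+i}}
       {N^{2} \;-\; \sum_{i=1}^{r} x_{i+}\, x_{+i}}
```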
This is more sophisticated than the overall accuracy because it uses the row and column totals, which include the misclassified pixels, as well as the correctly classified pixels. The harder part is now to put this all into Excel, but I'm going to leave that to you. Your kappa result should be similar to the overall accuracy, though it will not be higher.
This concludes our tutorial on accuracy assessments in ArcMap. Thanks for listening. And for more information or resources from Texas A&M University Map and GIS Library, please visit our website at library.tamu.edu/maps-gis.
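If you want a number to check your spreadsheet against, the kappa arithmetic can be sketched in Python. The matrix uses the counts from this tutorial, with the urban split of 17 and 23 inferred from the row totals of 57 and 63, and the water reference total corrected to 40. The variable names are made up for illustration.

```python
# Rows are classified classes, columns are reference classes
# (pasture, forest, water, urban).
matrix = [
    [40, 0, 0, 17],
    [0, 40, 0, 23],
    [0, 0, 39, 0],
    [0, 0, 0, 0],
]
classified_totals = [sum(row) for row in matrix]  # row totals: 57, 63, 39, 0
reference_totals = [40, 40, 40, 40]               # column totals, water corrected

n = sum(reference_totals)                                  # total reference points
diagonal = sum(matrix[i][i] for i in range(len(matrix)))   # correctly classified
# Sum over classes of row total times column total.
chance = sum(r * c for r, c in zip(classified_totals, reference_totals))

kappa = (n * diagonal - chance) / float(n * n - chance)
overall_accuracy = diagonal / float(n)

print(round(kappa, 3))             # 0.659
print(round(overall_accuracy, 3))  # 0.744
```

With these inputs, kappa comes out a little below the overall accuracy, because the agreement expected by chance has been removed.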