GEOG 485:
GIS Programming and Automation

3.2.2 Reading through records

Now that you know how to traverse the table horizontally, reading the fields that are available, let's examine how to read up and down through the table records.

The search cursor

The arcpy module contains some objects called cursors that allow you to move through records in a table. Cursors are not unique to ArcGIS scripting; in fact, if you've worked in ArcObjects before, this concept of a cursor is probably familiar to you.

The way cursors are used has changed quite a bit across versions of ArcGIS. Since older versions are still widely used, we first illustrate the usage of cursors in a way that works for old and new versions of ArcGIS. We then describe changes introduced in versions 10.0 and 10.1 that make cursors easier to use, more robust, and less verbose. In the examples in the rest of the course materials, we will always start with a version for 10.1 and higher but also show a solution that works for version 10.0. If you exchange and download scripts in connection with GIS professional work, you'll probably run across cursor code that is structured in all these different ways.

The first cursor we'll look at is the search cursor, since it's designed for simple reading of data. We'll start with the traditional way of using the search cursor, which was common in past versions of ArcGIS (and still works today for people who are re-using old code). Although this code is more verbose than what you would write in more recent versions of ArcGIS, it gives you a more fundamental understanding of what the search cursor is doing "under the hood". The common workflow is:

  1. Create the search cursor. This is done by calling arcpy.SearchCursor(), which takes several parameters in which you specify which dataset and, optionally, which specific rows you want to read.
  2. Call SearchCursor.next() to read the first row.
  3. Start a loop that will exit when there are no more rows available to read.
  4. Do something with the values in the current row.
  5. Call SearchCursor.next() to move on to the next row. Because you created a loop, this puts you back at the previous step if there is another row available to be read. If there are no more rows, the loop condition is not met and the loop terminates.

When you first try to understand cursors, it may help to visualize the attribute table with an arrow pointing at the "current row." When the cursor is first created, that arrow is pointing just above the first row in the table. The first time the next() method is called, the arrow moves down to the first row (and returns a reference to that row). Each time next() is called, the arrow moves down one row. If next() is called when the arrow is pointing at the last row, there is no further row to return, so the special Python value None is returned instead.
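If the "arrow" picture is hard to hold in your head, a tiny plain-Python imitation may help. The sketch below does not use arcpy at all: FakeSearchCursor is a hypothetical stand-in written only for illustration, and the city names are made-up sample values. It mimics the behavior described above, where next() returns each row in turn and then None.

```python
# Hypothetical stand-in for a legacy search cursor; not part of arcpy.
class FakeSearchCursor(object):
    def __init__(self, records):
        self._records = list(records)
        self._position = -1   # the "arrow" starts just above the first row

    def next(self):
        """Move the arrow down one row; return that row, or None past the end."""
        self._position += 1
        if self._position < len(self._records):
            return self._records[self._position]
        return None


rows = FakeSearchCursor(["Birmingham", "Mobile"])   # made-up sample values
first = rows.next()    # arrow moves to the first row
second = rows.next()   # arrow moves to the second (last) row
third = rows.next()    # arrow was on the last row, so None comes back
print("%s, %s, %s" % (first, second, third))
```

The real cursor reads from a dataset rather than a list, but the pattern of "advance, then check what came back" is the same.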

Here's a very simple example of a search cursor that reads through a point dataset of cities and prints the name of each.

# Prints the name of each city in a feature class

import arcpy

featureClass = "C:\\Data\\Alabama\\Alabama.gdb\\Cities"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

while row:
    print row.NAME
    row = rows.next()

The last five lines of the above script correspond to the five steps in the above workflow. Cursors can be tricky to understand at first, so let's look at those lines more closely. Below are the five lines again with comments so you can see exactly what's happening:

# Create the search cursor
rows = arcpy.SearchCursor(featureClass)

# Call SearchCursor.next() to read the first row
row = rows.next()

# Start a loop that will exit when there are no more rows available
while row:

    # Do something with the values in the current row     
    print row.NAME

    # Call SearchCursor.next() to move on to the next row    
    row = rows.next()

Notice a few other important things before moving on:

  • The loop condition "while row:" is a simple Boolean way of specifying whether the loop should continue. If a row object exists, the statement evaluates to true and the loop continues. If a row object doesn't exist, the statement evaluates to false and the loop terminates.
  • You can read a field value as a property of a row. For example, row.NAME gave you the value in the NAME field. If your table had a POPULATION field, you could use row.POPULATION to get the population.
  • The names "rows" and "row" are just variable names that represent the SearchCursor and Row objects, respectively. We could name these anything. The Esri examples tend to name them rows and row, and we'll do the same. However, if you needed to use two search cursors at the same time, you'd have to come up with some additional names.
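To see why "while row:" works as a stopping test, you can ask Python directly how it treats an object versus None. This is a minimal sketch in plain Python; the Row class here is a hypothetical placeholder with a NAME property and is not the arcpy Row class.

```python
class Row(object):
    """Hypothetical placeholder row with a NAME field; not arcpy's Row class."""
    def __init__(self, name):
        self.NAME = name


row_exists = bool(Row("Mobile"))   # an ordinary object counts as true
row_missing = bool(None)           # None counts as false

pending = [Row("Mobile"), None]    # one row, then the None that ends the table
names = []
position = 0
row = pending[position]
while row:                         # stops as soon as row is None
    names.append(row.NAME)         # field value read as a property of the row
    position += 1
    row = pending[position]
```

In this sketch the loop body runs once, for the single row, and exits when None is reached.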

Here's another example where something more complex is done with the row values. This script finds the average population for counties in a dataset. To find the average, you need to divide the total population by the number of counties. The code below loops through each record and keeps a running total of the population and the number of records counted. Once all the records have been read, only one line of division is necessary to find the average. You can get the sample data for this script here.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
    totalPopulation += row.POP1990
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print "Average population for a county is " + str(average)

Although the above script is longer than the first one, it's still following the general pattern of creating a search cursor, advancing to the first row, doing something with the row, and repeating the process until there are no records left.
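One detail is worth checking when you compute an average this way. The print statements in these scripts indicate Python 2, where dividing one integer by another with / discards the remainder. Whether that affects your result depends on whether the field values come back as integers. The sketch below is plain Python with made-up numbers (no arcpy and no course data); average_population is a hypothetical helper that converts to float before dividing and also guards against an empty table.

```python
def average_population(populations):
    """Return the mean of a sequence of numbers, or None if it is empty."""
    total = 0
    counted = 0
    for value in populations:
        total += value
        counted += 1
    if counted == 0:
        return None                  # avoid dividing by zero on an empty table
    return float(total) / counted    # float() prevents truncation in Python 2


sample = [100, 101]                  # made-up values, not from the course data
print(average_population(sample))
```

With these two sample values the helper returns 100.5, where Python 2 integer division would have given 100.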

Reading values when the field name is a variable

In the previous script, the population of a record was referenced as row.POP1990, where the population field name is POP1990. This is a pretty easy way to get a field value, but what happens if you get data for 2009 in a field named POP2009 and you want to run the script again? What if you have many long scripts that always reference the population field this way? You would have to carefully search each script for row.POP1990 and replace it with row.POP2009. This could be tedious and error-prone.

You can make your scripts more versatile by using variables to represent field names. You could declare a variable, such as populationField, to hold the population field name, whether it is POP1990, POP2009, or simply POPULATION. The Python interpreter isn't going to recognize row.populationField, because it would look for a field literally named populationField. Instead, you need to use Row.getValue() and pass in the variable as a parameter.
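The difference between the two ways of reading a field can be tried out without any GIS data. In this sketch, FakeRow is a hypothetical stand-in whose getValue() simply wraps Python's built-in getattr(); it is an assumption made for illustration, not a description of how arcpy implements Row.getValue(). The population value is made up.

```python
class FakeRow(object):
    """Hypothetical stand-in for a row; getValue() here just wraps getattr()."""
    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

    def getValue(self, field_name):
        return getattr(self, field_name)


row = FakeRow(POP1990=1000)                    # made-up value
populationField = "POP1990"

by_property = row.POP1990                      # field name typed into the code
by_variable = row.getValue(populationField)    # field name held in a variable

try:
    row.populationField    # Python looks for a field literally named this
    found = True
except AttributeError:
    found = False
```

Both by_property and by_variable hold the same value, while the attempt to read row.populationField fails because no field has that name.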

The script below uses a variable to hold the field name when getting the population for each record. Each line that changed from the script above is preceded by the comment "### This row below is new". Notice that a variable named populationField is created and that the method call row.getValue(populationField) retrieves the population of each record.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
### This row below is new
populationField = "POP1990"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
    ### This row below is new
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print "Average population for a county is " + str(average) 

To update the above script, you would just have to set populationField = "POP2009" near the top of the script. This is certainly easier than searching through the body of the script for row.POP1990; however, you can go one step further and allow the user to enter any field name that he or she wants as an argument when running the script.

Remember in Lesson 1 how you learned that arcpy.GetParameterAsText() allows the user of the script to supply a value for the variable? Using that technique for both the feature class path and the population field name makes your script very flexible. Notice that the code below contains no hard-coded path names, field names, or numbers besides 0 and 1. This means you could run the script with any feature class containing any name for its population field without modifying the code. In fact, you could use code similar to this to find the average of any numeric field, such as square mileage, or number of homeowners.

# Finds the average population in a counties dataset

import arcpy

featureClass = arcpy.GetParameterAsText(0)
populationField = arcpy.GetParameterAsText(1)

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print "Average population for a county is " + str(average)  

Here's how you could run the above script in PythonWin by supplying the path name (if we had data for Iowa) and population field (if it were POP2008) as the arguments.

[Screen capture: the Run Script dialog box with arguments]
Figure 3.3 Running the above script in PythonWin.

Using a for loop with a cursor (introduced at ArcGIS 10.0)

Although the above examples use a while loop in conjunction with the next() method to advance the cursor, it's often easier to iterate through each record using a for loop. This became possible with ArcGIS 10.0. Here's how the above sample could be modified to use a for loop. Notice the syntax "for row in rows".

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
populationField = "POP1990"

rows = arcpy.SearchCursor(featureClass)

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

for row in rows:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1

average = totalPopulation / recordsCounted
print "Average population for a county is " + str(average) 

In this example, you never call the next() method yourself, because the for loop advances the cursor through every record for you. The variable named row is created by the for statement itself.

While this syntax is more compact than using a while loop, there is some benefit to seeing how the next() method works, especially if you ever work with ArcGIS 9.3.x Python scripts or if you use cursors in ArcObjects (which has conceptually similar methods for advancing a cursor row by row). However, once you get accustomed to using a for loop to traverse a table, it's unlikely you'll want to go back to using while loops.
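If you are curious what a for loop does on your behalf, the sketch below writes the same traversal both ways in plain Python. A list of made-up city names stands in for the cursor. One difference to keep in mind: standard Python iterators signal the end by raising StopIteration, whereas the cursor's next() method described earlier returns None.

```python
cities = ["Birmingham", "Mobile"]   # made-up stand-in for a cursor's rows

# What "for city in cities:" does behind the scenes, written out by hand.
iterator = iter(cities)
visited_manually = []
while True:
    try:
        city = next(iterator)       # built-in next(), available in Python 2.6+ and 3
    except StopIteration:           # raised when nothing is left
        break
    visited_manually.append(city)

# The equivalent for loop.
visited_with_for = []
for city in cities:
    visited_with_for.append(city)
```

Both lists end up with the same contents, which is why the shorter form is usually preferred.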

The arcpy data access module (introduced at ArcGIS 10.1)

If you're using ArcGIS 10.1 or higher, you can use the above code for search cursors, or you can use a new data access module that was introduced into arcpy. These data access functions are prefixed with arcpy.da and give you faster performance along with more robust behavior when crashes or errors are encountered with the cursor.

The data access module arcpy.da allows you to create cursor objects, just like arcpy, but you create them a little differently. Take a close look at the following example code, which repeats the scenario above to calculate the average population of a county.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
populationField = "POP1990"

average = 0
totalPopulation = 0
recordsCounted = 0

with arcpy.da.SearchCursor(featureClass, (populationField,)) as cursor:
    for row in cursor:
        totalPopulation += row[0]
        recordsCounted += 1

average = totalPopulation / recordsCounted
print "Average population for a county is " + str(average)

This example uses the same basic structure as the previous one, with a few important changes. One thing you probably noticed is that the cursor is created using a "with" statement. Although the explanation of "with" is somewhat technical, the key thing to understand is that it allows your cursor to exit the dataset gracefully, whether it crashes or completes its work successfully. This is a big issue with cursors, which can sometimes maintain locks on data if they are not exited properly.

The "with" statement requires that you indent all the code beneath it. After you create the cursor in your "with" statement, you'll initiate a for loop to run through all the rows in the table. This requires additional indentation.
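To get a feel for the "graceful exit" that a with statement provides, here is a minimal sketch in plain Python. FakeCursor is a hypothetical class written for illustration; how arcpy.da releases its locks internally is not shown in this lesson, so treat this only as a picture of the general mechanism. The cleanup code in __exit__ runs even though an error is raised inside the indented block.

```python
class FakeCursor(object):
    """Hypothetical stand-in that records when it is opened and released."""
    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("opened")
        return self                    # this becomes the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("released")    # runs on success and on error alike
        return False                   # do not swallow the exception


log = []
try:
    with FakeCursor(log) as cursor:
        raise RuntimeError("simulated failure while reading rows")
except RuntimeError:
    log.append("error handled")
```

After this runs, log contains "opened", "released", and "error handled" in that order, showing that the release step happened before the error reached the surrounding code.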

Notice that the "with" statement in the arcpy.da script creates a SearchCursor object and declares that it will be named "cursor" in any subsequent code. The search cursors you create with arcpy.da have some different initialization parameters from the search cursors you create with arcpy. The biggest difference is that when you create a cursor with arcpy.da, you have to supply the names of the fields that the cursor will return. The example passes them as a tuple (a list of field names is also accepted). Remember that a tuple is a Python data structure much like a list, except it is enclosed in parentheses and its contents cannot be modified.

Supplying this tuple speeds up the work of the cursor because it does not have to deal with the potentially dozens of fields included in your dataset. In the example above, the tuple contains just one field, populationField. A tuple with just one item in it needs a comma after the item, so our tuple above looks like this: (populationField,). If the tuple were to have multiple items in it, we might see something like: (populationField, nameField).
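The trailing comma is easy to forget, so it can help to check what Python makes of each form. This short sketch is plain Python; the value assigned to nameField is a hypothetical field name chosen for illustration.

```python
populationField = "POP1990"
nameField = "NAME"                  # hypothetical second field for illustration

one_field = (populationField,)      # trailing comma makes this a tuple
not_a_tuple = (populationField)     # parentheses alone leave a plain string
two_fields = (populationField, nameField)

print(type(one_field).__name__)
print(type(not_a_tuple).__name__)
print(len(two_fields))
```

Without the comma, the parentheses are treated as ordinary grouping and you get the string back unchanged.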

Notice that with arcpy.da, you still work with one row at a time, as with arcpy; however, each row comes back as a tuple of values, so you do not use the getValue method to retrieve values out of the rows. Instead, you use the index position of the field name in the tuple you submitted when you created the cursor. Since the above example submits only one item in the tuple, the index position of populationField within that tuple is 0 (remember that we start counting from 0 in Python). Therefore, you can use row[0] to get the value of populationField for a particular row.
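Index positions matter more once you ask for several fields. The sketch below uses plain Python with made-up rows shaped like the tuples a two-field cursor would hand back; the county names and numbers are illustrative only and do not come from the course data.

```python
fields = ("NAME", "POP1990")        # order here fixes the index positions

# Made-up rows shaped like what a two-field cursor would hand back.
sample_rows = [("Adams", 100), ("Berks", 300)]

totalPopulation = 0
names = []
for row in sample_rows:
    names.append(row[0])            # index 0 matches "NAME"
    totalPopulation += row[1]       # index 1 matches "POP1990"

# fields.index() looks a position up by name instead of hard-coding it.
populationIndex = fields.index("POP1990")
```

If you later reorder the field names, the index numbers in the loop must change to match, which is one reason some scripts look the position up with index() rather than typing it in.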

Since most students these days use ArcGIS versions 10.1 or higher, the examples from this point forward use the arcpy data access module. It's worth the effort to focus on arcpy.da functions because they will make your code faster, more compact, and more robust. You are required to use arcpy.da in your Lesson 3 project code unless you are running on a version prior to ArcGIS 10.1 (the version that introduced arcpy.da) and you have previously cleared this arrangement with the instructor.