GEOG 485:
GIS Programming and Automation

3.2.3 Retrieving records using an attribute query


The previous examples used the SearchCursor object to read through each record in a dataset. You can get more specific with the search cursor by instructing it to retrieve just the subset of records whose attributes comply with some criteria, for example, "only records with a population greater than 10000" or "all records beginning with the letters P – Z."

For review, this is how you construct a search cursor to operate on every record in a datasets using the arcpy.da module (version 10.1 or higher):

with arcpy.da.SearchCursor(featureClass,(populationField,)) as cursor:

If you want the search cursor to retrieve only a subset of the records based on some criteria, you can supply a SQL expression as the third argument in the constructor (the constructor is the method that creates the SearchCursor). For example:

with arcpy.da.SearchCursor(featureClass, (populationField,), '"POP2008" > 100000') as cursor:

The above example uses the SQL expression "POP2008" > 100000 to retrieve only the records whose population is greater than 100000. SQL stands for "Structured Query Language" and is a special syntax used for querying datasets. If you've ever used a Definition Query to filter data in ArcMap, then you've had some exposure to these sorts of SQL queries. If SQL is new to you, please take a few minutes right now to read Building a query expression in the ArcGIS Desktop Help. This topic is a simple introduction to SQL in the context of ArcGIS.

SQL expressions can contain a combination of criteria allowing you to pinpoint a very focused subset of records. The complexity of your query is limited only by your available data. For example, you could use a SQL expression to find only states with a population density over 100 people per square mile that begin with the letter M and were settled after 1850.

Note that the SQL expression you supply for a search cursor is for attribute queries, not spatial queries. You could not use a SQL expression to select records that fall "west of the Mississippi River," or "inside the boundary of Canada" unless you had previously added and populated some attribute stating whether that condition were true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson we'll talk about how to make spatial queries using the Select By Location geoprocessing tool.

Once you retrieve the subset of records, you can follow the same pattern of iterating through them using a for loop.

with arcpy.da.SearchCursor(featureClass, (populationField,), '"POP2008" > 100000') as cursor:
    for row in cursor:
        print str(row[0])

Handling quotation marks

When you include a SQL expression in your SearchCursor constructor, you must supply it as a string. This is where things can get tricky with quotation marks. SQL requires single and double quotes in specific places, but you also need to enclose the entire expression with quotes because it is a string. How do you keep from getting confused?

In Python you can use either single quotes or double quotes to enclose a string. You may have noticed that in the above example, I enclosed the SQL expression in single quotes, not double quotes: '"POP2008" > 100000'.  Because I knew the double quotes were going to be required in the SQL statement (to surround the field name), I used single quotes to surround the entire string. This is not just to keep things easy to read; if I had used two double quotes in a row the Python interpreter would get confused and think I was creating an empty string. Therefore, it was not an option to use ""POP2008" > 100000".

The situation gets more difficult when your SQL expression must use both single and double quotes, for instance, when you query for a string variable. Suppose your script allows the user to enter the ID of a parcel and you need to find it with a search cursor. Some of the parcel IDs include letters and others don't, therefore you need to always treat the parcel ID as a string. Your SQL expression would probably look like this: "PARCEL" = 'A2003KSW'. This expression starts with double quotes and ends with single quotes, so which style of quotes do you use to enclose the entire expression?

In this case, you cannot simply enclose the entire expression in one style of quotes; you need to break up the expression into separate strings. Take a close look at this example:

ID = arcpy.GetParameterAsText(0)
whereClause = '"Parcel"' + " = '" + str(ID) + "'"
with arcpy.da.SearchCursor(featureClass, (parcelField,), whereClause) as cursor:

In the code above, the whereClause, or the SQL expression, is created in manageable chunks that don't mix single and double quotes. If the piece of the expression contains double quotes, such as "Parcel", it is enclosed in single quotes. If the piece of the expression contains single quotes, such as =' or just ', it is enclosed in double quotes. This type of situation is where it may be helpful to temporarily include a print statement in your code or use the debugging tools to make sure your whereClause is constructed correctly. It can be helpful if you can train your eye to focus on the + sign as a separator between all the independent pieces of the string you are constructing.

Field delimiters

In the examples above, field names are surrounded with double quotes (for example, "STATE_NAME"). This is correct syntax for shapefiles and file geodatabases, which are the only data types we'll use in this course. If you use personal geodatabases in your daily work, there are different ways to delimit the field name. If you're interested in the correct syntax for different data types, or ways to make your script flexible for any data type, take a look at the topic SQL reference for query expressions used in ArcGIS in the ArcGIS Desktop Help.