GEOG 485:
GIS Programming and Software Development

3.2.3 Retrieving records using an attribute query


(Note: In this section, we are only using short code sections to read through, rather than full code examples that can be run directly. However, as an exercise, you can adapt the code to, for instance, work with the Pennsylvania data set from the previous section by using Counties.shp and the POP1990 field.)

The previous examples used the SearchCursor object to read through each record in a dataset. You can get more specific with the search cursor by instructing it to retrieve just the subset of records whose attributes comply with some criteria, for example, "only records with a population greater than 10000" or "all records beginning with the letters P – Z."

For review, this is how you construct a search cursor to operate on every record in a dataset using the arcpy.da module:

with arcpy.da.SearchCursor(featureClass,(populationField)) as cursor:

If you want the search cursor to retrieve only a subset of the records based on some criteria, you can supply a SQL expression (a where clause) as the third argument in the constructor (the constructor is the method that creates the SearchCursor). For example:

with arcpy.da.SearchCursor(featureClass, (populationField), "POP2018 > 100000") as cursor:

The above example uses the SQL expression POP2018 > 100000 to retrieve only the records whose population is greater than 100000. SQL stands for "Structured Query Language" and is a special syntax used for querying datasets. If you've ever used a Definition Query to filter a layer's data in Pro, then you've had some exposure to these sorts of SQL queries. If SQL is new to you, please take a few minutes right now to read Write a query in the Query Builder in the ArcGIS Pro Help. This topic is a simple introduction to SQL in the context of ArcGIS.

SQL expressions can contain a combination of criteria, allowing you to pinpoint a very focused subset of records. The complexity of your query is limited only by your available data. For example, you could use a SQL expression to find only states with a population density over 100 people per square mile that begin with the letter M and were settled after 1850.

Note that the SQL expression you supply for a search cursor is for attribute queries, not spatial queries. You could not use a SQL expression to select records that fall "west of the Mississippi River," or "inside the boundary of Canada" unless you had previously added and populated some attribute stating whether that condition were true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson, we'll talk about how to make spatial queries using the Select By Location geoprocessing tool.

Once you retrieve the subset of records, you can follow the same pattern of iterating through them using a for loop.

with arcpy.da.SearchCursor(featureClass, (populationField), "POP2018 > 100000") as cursor:
    for row in cursor:
        print (str(row[0]))

Handling quotation marks

When you include a SQL expression in your SearchCursor constructor, you must supply it as a string. This is where things can get tricky with quotation marks since parts of the expression may also need to be quoted (specifically string values, such as a state abbreviation). The rule for writing queries in ArcGIS Pro is that string values must be enclosed in single quotes.  Given that, you should enclose the overall expression in double quotes.   

For example, suppose your script allows the user to enter the ID of a parcel, and you need to find it with a search cursor.  Your SQL expression might look like this: " PARCEL = 'A2003KSW' ".

Handling quotation marks is simplified greatly in Pro as compared to ArcMap.  In ArcMap, certain data formats require field names to be enclosed in double quotes, which when combined with string values being present in the expression, can make constructing the expression correctly quite a headache.  If you find yourself needing to write ArcMap scripts that query data, check out the parallel page in the ArcMap version of this lesson (see course navigation links to the right).

Selecting records by attribute

As an ArcGIS Pro user, you've probably clicked the Select By Attributes button, located under the Map tab, to perform attribute queries.  What we've been talking about in this part of the lesson probably reminded you of doing that sort of query, but it's important to note that opening a search cursor with a SQL expression (or where clause) as described above is not quite the same sort of operation.  A search cursor is used in situations where you want to do some sort of processing of the records one by one.  For example, as we saw, printing the names of cities or calculating their average population.  

Some situations instead call for creating a selection on the feature class (i.e., treating the features identified by the query as a single unit).  This can be done in Python scripts by invoking the Select Layer By Attribute tool (which is actually the tool that opens when you click the Select By Attributes button in Pro).  The output from this tool -- referred to as a Feature Layer -- can then be used as the input to many other tools (Calculate Field, Copy Features, Delete Features, to name a few).  This second tool's processing will be limited to the selection held in the input.

For an example, let's pick up on the population query used above.  Let's say we wanted to do an analysis in which high-population cities were excluded.  We might select those cities and then delete them:

popQuery = 'POP2018 > 100000'
bigCities = arcpy.SelectLayerByAttribute_management('Cities', 'NEW_SELECTION', popQuery)

In this snippet of code, note that the same where clause is implemented, this time to create a selection on the Cities feature class as opposed to producing a cursor of records to iterate through.  The SelectLayerByAttribute tool returns a Feature Layer, which is an in-memory object that references the selected features.  The object is stored in a variable, bigCities, which is then used as input to the DeleteFeatures tool.  This will delete the high-population cities from the underlying Cities feature class.

Now, let's have a look at how to handle queries with spatial constraints.