Lesson 3: GIS data access and manipulation with Python

Lesson 3 Overview

An essential part of a GIS is the data that represents both the geometry (locations) of geographic features and the attributes of those features. This combination of features and attributes is what makes GIS go beyond just "mapping." Much of your work as a GIS analyst involves adding, modifying, and deleting features and their attributes from the GIS.

Beyond maintaining the data, you also need to know how to query and select the data that is most important to your projects. Sometimes you'll want to query a dataset to find only the records that match a certain criteria (for example, single-family homes constructed before 1980) and calculate some statistics based on only the selected records (for example, percentage of those homes that experienced termite infestation).

All of the above tasks of maintaining, querying, and summarizing data can become tedious and error prone if performed manually. Python scripting is often a faster and more accurate way to read and write large amounts of data. There are already many tools for data selection and management in ArcToolbox. Any of these can be used in a Python script. For more customized scenarios where you want to read through a table yourself and modify records one-by-one, arcpy contains special objects, called cursors, that you can use to examine each record in a table. You'll quickly see how the looping logic that you learned in Lesson 2 becomes useful when you are cycling through tables using cursors.

Using a script to work with your data introduces some other subtle advantages over manual data entry. For example, in a script, you can add checks to ensure that the data entered conforms to a certain format. You can also chain together multiple steps of selection logic that would be time-consuming to perform in ArcMap.

This lesson explains ways to read and write GIS data using Python. We'll start off by looking at how you can create and open datasets within a script. Then, we'll practice reading and writing data using both geoprocessing tools and cursor objects. Although this is most applicable to vector datasets, we'll also look at some ways you can manipulate rasters with Python. Once you're familiar with these concepts, Project 3 will give you a chance to practice what you've learned.

Lesson 3 checklist

Lesson 3 explains how to read and manipulate both vector and raster data with Python. To complete Lesson 3, you are required to do the following:

  1. Work through the course lesson materials.
  2. Read Zandbergen sections 7.1 - 7.3 and all of chapter 9. In the online lesson pages, I have inserted instructions about when it is most appropriate to read each of these chapters.
  3. Complete Project 3, and submit your zipped deliverables to the Project 3 drop box.
  4. Complete the Lesson 3 Quiz.
  5. Read the Final Project proposal assignment, and begin working on your proposal, which is due after the first week of Lesson 4.


Do items 1 - 2 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need the second week of the lesson to concentrate on the project, the quiz, and the proposal assignment.

Lesson objectives

By the end of this lesson you should:

  • understand the use of different types of workspaces (e.g., geodatabases) in Python;
  • be able to use arcpy cursors to read and manipulate vector attribute data (search, update, and delete records);
  • know how to construct SQL query strings to realize selection by attribute with cursors or MakeFeatureLayer;
  • understand the concept of layers, how layers can be created, and how to realize selection by location with layers;
  • be able to write ArcGIS scripts that solve common tasks with vector data by combining selection by attribute, selection by location, and manipulation of attribute tables via cursors.

3.1 Data storage and retrieval in ArcGIS

Before getting into the details of how to read and modify these attributes, it's helpful to review how geographic datasets are stored in ArcGIS. You need to know this so you can open datasets in your scripts, and on occasion, create new datasets.

Geodatabases

Over the years, Esri has developed various ways of storing spatial data. Esri encourages you to put your data in geodatabases, which are organizational structures for storing datasets and defining relationships between those datasets. Different flavors of geodatabase are offered for storing different magnitudes of data.

  • Personal geodatabases are a small, nearly deprecated form of geodatabase that store data on the local file system. The data is held in a Microsoft Access database, which limits how much data can be stored in the geodatabase.
  • File geodatabases are a newer way of storing data on the local file system. The data is stored in a proprietary format developed by Esri. A file geodatabase can hold more data than a personal geodatabase: up to terabytes.
  • ArcSDE geodatabases or "enterprise geodatabases" store data on a central server in a relational database management system (RDBMS) such as SQL Server, Oracle, or PostgreSQL. These are large databases designed for serving data not just to one computer, but to an entire enterprise. Since working with an RDBMS can be a job in itself, Esri has developed ArcSDE as "middleware" that allows you to configure and read your datasets in ArcCatalog or ArcMap without touching the RDBMS software.

    For actions where ArcSDE is required but where it would be too heavy-handed to purchase and configure an enterprise RDBMS, Esri has developed a smaller "workgroup" version of ArcSDE that works with the free database SQL Server Express. This can be configured directly from ArcCatalog or the Catalog window in ArcMap.

    In recent years, Esri has also promoted a new feature called query layers, which allow you to pull data directly out of an RDBMS using SQL queries, with no ArcSDE involved.

A single vector dataset within a geodatabase is called a feature class. Feature classes can be optionally organized in feature datasets. Raster datasets can also be stored in geodatabases.

Standalone datasets

Although geodatabases are essential for long-term data storage and organization, it's sometimes convenient to access datasets in a "standalone" format on the local file system. Esri's shapefile is probably the most ubiquitous standalone vector data format (it even has its own Wikipedia article). A shapefile actually consists of several files that work together to store vector geometries and attributes. The files all have the same root name, but use different extensions. You can zip the participating files together and easily email them or post them in a folder for download. In the Esri file browsers in ArcCatalog or ArcMap, a shapefile just appears as one file.

Note:

Sometimes in Esri documentation, shapefiles are also referred to as "feature classes." When you see the term "feature class," consider it to mean a vector dataset that can be used in ArcGIS.

Another type of standalone dataset dating back to the early days of ArcGIS is the ArcInfo coverage. Like the shapefile, the coverage consists of several files that work together. Coverages are becoming more and more rare, but you might encounter them if your organization has used (or still uses!) ArcInfo Workstation.

Raster datasets are also often stored in standalone format instead of being loaded into a geodatabase. A raster dataset can be a single file, such as a JPEG or a TIFF, or, like a shapefile, it can consist of multiple files that work together.

Providing paths in Python scripts

Often in a script, you'll need to provide the path to a dataset. Knowing the syntax for specifying the path can be a challenge because of the many different ways of storing data listed above. For example, below is what a file geodatabase looks like if you just browse the file system in Windows Explorer. How do you specify the path to the dataset you need? The same challenge occurs with a shapefile, which, although more intuitively named, actually has three or more participating files.

Screen capture showing the Windows path to a file geodatabase (folder USA.gdb).
Figure 3.1 A file geodatabase as viewed via the file system in Windows Explorer.

The safest way to get the paths you need is to browse to the dataset in ArcCatalog and take the path that appears in the Location toolbar. Here's what the same file geodatabase would look like in ArcCatalog. The circled path shows how you would refer to a feature class within the geodatabase.

 Screen capture showing a path in ArcCatalog. Location: C:\Data\USA\USA.gdb\Cities
Figure 3.2 The same file geodatabase, shown in ArcCatalog.

Below is an example of how you could access the feature class in a Python script using this path. This is similar to one of the examples in Lesson 1.

import arcpy
featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
desc = arcpy.Describe(featureClass)
spatialRef = desc.SpatialReference
print (spatialRef.Name)

Remember that the backslash (\) is a reserved character in Python, so you'll need to use either the double backslash (\\) or forward slash (/) in the path. Another technique you can use for paths is the raw string, which allows you to put backslashes and other reserved characters in your string as long as you put "r" before your quotation marks.

featureClass = r"C:\Data\USA\USA.gdb\Cities"
. . .
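If you would rather not think about escaping separators at all, Python's built-in os.path module can assemble the path for you. A small optional sketch using the same geodatabase path (on Windows, os.path.join uses the backslash automatically):

```python
import os

gdb = r"C:\Data\USA\USA.gdb"

# os.path.join inserts the correct separator between path components
featureClass = os.path.join(gdb, "Cities")
print(featureClass)
```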

Workspaces

The Esri geoprocessing framework often uses the notion of a workspace to denote the folder or geodatabase where you're currently working. When you specify a workspace in your script, you don't have to list the full path to every dataset. When you run a tool, the geoprocessor sees the feature class name and assumes that it resides in the workspace you specified.
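As a quick sketch of the idea (assuming the USA.gdb geodatabase and Cities feature class from the earlier examples), setting a workspace lets you refer to a dataset by name alone:

```python
import arcpy

# Set the workspace once; from then on, tools and functions resolve
# bare dataset names against this location
arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"

# "Cities" is assumed to be a feature class inside the workspace,
# so no full path is needed
desc = arcpy.Describe("Cities")
print(desc.shapeType)
```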

Workspaces are especially useful for batch processing, when you perform the same action on many datasets in the workspace. For example, you may want to clip all the feature classes in a folder to the boundary of your county. The workflow for this is:

  1. Define a workspace.
  2. Create a list of feature classes in the workspace.
  3. Define a clip feature.
  4. Configure a loop to run on each feature class in the list.
  5. Inside the loop, run the Clip tool.

Here's some code that clips each feature class in a file geodatabase to the Alabama state boundary, then places the output in a different file geodatabase. Note how the five lines of code after import arcpy correspond to the five steps listed above.

import arcpy

arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"
featureClassList = arcpy.ListFeatureClasses()
clipFeature = "C:\\Data\\Alabama\\Alabama.gdb\\StateBoundary"

for featureClass in featureClassList:
    arcpy.Clip_analysis(featureClass, clipFeature, "C:\\Data\\Alabama\\Alabama.gdb\\" + featureClass)

In the above example, the method arcpy.ListFeatureClasses() was the key to making the list. This method looks through a workspace and makes a Python list of each feature class in that workspace. Once you have this list, you can easily configure a for loop to act on each item.
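arcpy.ListFeatureClasses() can also narrow the list for you: it optionally takes a name wildcard and a geometry type, which is handy when a batch job should only touch, say, the polygon feature classes in a workspace. A hedged sketch using the same workspace:

```python
import arcpy

arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"

# Optional parameters: a name wildcard and a feature (geometry) type.
# This lists only polygon feature classes whose names start with "S".
polygonList = arcpy.ListFeatureClasses("S*", "Polygon")

for featureClass in polygonList:
    print(featureClass)
```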

Notice that you designated the path to the workspace using the location of the file geodatabase "C:\\Data\\USA\\USA.gdb". If you were working with shapefiles, you would just use the path to the containing folder as the workspace. You can download the USA.gdb here and the Alabama.gdb here.

If you were working with ArcSDE, you would use the path to the .sde connection file when creating your workspace. This is a file that is created when you connect to ArcSDE in ArcCatalog, and is placed in your local profile directory. We won't be accessing ArcSDE data in this course, but if you do this at work, remember that you can use the Location toolbar in ArcCatalog to help you understand the paths to datasets in ArcSDE.

3.2 Reading vector attribute data

Now that you know how to open a dataset, let's go a little bit deeper and start examining some individual data records. This section of the lesson discusses how to read and search data tables. These tables often provide the attributes for vector features, but they can also stand alone in some cases. The next section will cover how to write data to tables. At the end of the lesson, we'll look at rasters.

As we work with the data, it will be helpful for you to follow along, copying and pasting the example code into practice scripts. Throughout the lesson, you'll encounter exercises that you can do to practice what you just learned. You're not required to turn in these exercises, but if you complete them, you will have a greater familiarity with the code that will be helpful when you begin working on this week's project. It's impossible to read a book or a lesson, then sit down and write perfect code. Much of what you learn comes through trial and error and learning from mistakes. Thus, it's wise to write code often as you complete the lesson.

3.2.1 Accessing data fields

Before we get too deep into vector data access, it's going to be helpful to quickly review how the vector data is stored in the software. Vector features in ArcGIS feature classes (using this term to include shapefiles) are stored in a table. The table has rows (records) and columns (fields).

Fields in the table

Fields in the table store the geometry and attribute information for the features.

There are two fields in the table that you cannot delete. One of the fields (usually called Shape) contains the geometry information for the features. This includes the coordinates of each vertex in the feature and allows the feature to be drawn on the screen. The geometry is stored in binary format; if you were to see it printed on the screen, it wouldn't make any sense to you. However, you can read and work with geometries using objects that are provided with arcpy.

The other field included in every feature class is an object ID field (OBJECTID or FID). This contains a unique number, or identifier for each record that is used by ArcGIS to keep track of features. The object ID helps avoid confusion when working with data. Sometimes records have the same attributes. For example, both Los Angeles and San Francisco could have a STATE attribute of 'California,' or a USA cities dataset could contain multiple cities with the NAME attribute of 'Portland;' however, the OBJECTID field can never have the same value for two records.

The rest of the fields contain attribute information that describe the feature. These attributes are usually stored as numbers or text.

Discovering field names

When you write a script, you'll need to provide the names of the particular fields you want to read and write. You can get a Python list of field names using arcpy.ListFields().

# Reads the fields in a feature class

import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
fieldList = arcpy.ListFields(featureClass)
# Loop through each field in the list and print the name
for field in fieldList:
    print (field.name)

The above would yield a list of the fields in the Cities feature class in a file geodatabase named USA.gdb. If you ran this script in PythonWin (try it with one of your own feature classes!) you would see something like the following in the Interactive Window.

>>> OBJECTID
Shape
UIDENT
POPCLASS
NAME
CAPITAL
STATEABB
COUNTRY

Notice the two special fields we already talked about: OBJECTID, which holds the unique identifying number for each record, and Shape, which holds the geometry for the record. Additionally, this feature class has fields that hold the name (NAME), the state (STATEABB), whether or not the city is a capital (CAPITAL), and so on.

arcpy treats each field as an object, so the field has properties that describe it. That's why you can print field.name. The help reference topic Using fields and indexes lists all the properties that you can read from a field. These include AliasName, Length, Type, Scale, Precision, and others.
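For example, you could extend the field-listing script above to report a few of these properties alongside each name (the lowercase property names below follow the arcpy Field object, e.g. field.type and field.length):

```python
import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"

# Each Field object carries read-only properties describing the column
for field in arcpy.ListFields(featureClass):
    print(field.name + " | " + field.type + " | " + str(field.length))
```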

Properties of a field are read-only, meaning that you can find out what the field properties are, but you cannot change those properties in a script using the Field object. If you wanted to change the scale and precision of a field, for instance, you would have to programmatically add a new field.
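A hedged sketch of that workaround: add a new field with the desired type, precision, and scale, copy the values across with the Calculate Field tool, then delete the old field. The field names here are made up for illustration.

```python
import arcpy

table = "C:\\Data\\USA\\USA.gdb\\Cities"

# Add a replacement field with the precision and scale you actually want
# (AREA_NEW and AREA_OLD are hypothetical field names)
arcpy.AddField_management(table, "AREA_NEW", "DOUBLE",
                          field_precision=12, field_scale=4)

# Copy values from the old field into the new one
arcpy.CalculateField_management(table, "AREA_NEW", "!AREA_OLD!", "PYTHON")

# Remove the old field once the values are safely copied
arcpy.DeleteField_management(table, "AREA_OLD")
```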

3.2.2 Reading through records

Now that you know how to traverse the table horizontally, reading the fields that are available, let's examine how to read up and down through the table records.

The search cursor

The arcpy module contains some objects called cursors that allow you to move through records in a table. Cursors are not unique to ArcGIS scripting; in fact, if you've worked in ArcObjects before, this concept of a cursor is probably familiar to you.

There have been quite a few changes to how cursors are used over the different versions of ArcGIS. Since older versions are still widely used, we first illustrate the usage of cursors in a way that works for both old and new versions of ArcGIS. We then describe changes introduced in versions 10.0 and 10.1 that make using cursors easier, more robust, and less verbose. In the examples in the rest of the course materials, we will always start with a version for 10.1 and higher, but also show a solution that works for version 10.0. If you exchange and download scripts in connection with GIS professional work, you'll probably run across cursor code that is structured in all these different ways.

The first code we'll look at is the search cursor, since it's designed for simple reading of data. We'll start with the traditional use of the search cursor from past versions of ArcGIS (which still works today for people who are re-using old code). Although you'll come to learn that this code is more verbose than what you would use in more recent versions of ArcGIS, it gives you a more fundamental understanding of what the search cursor is doing "under the hood". The common workflow is:

  1. Create the search cursor. This is done through the method arcpy.SearchCursor(). This method takes several parameters in which you specify which dataset and, optionally, which specific rows you want to read.
  2. Call SearchCursor.next() to read the first row.
  3. Start a loop that will exit when there are no more rows available to read.
  4. Do something with the values in the current row.
  5. Call SearchCursor.next() to move on to the next row. Because you created a loop, this puts you back at the previous step if there is another row available to be read. If there are no more rows, the loop condition is not met and the loop terminates.

When you first try to understand cursors, it may help to visualize the attribute table with an arrow pointing at the "current row." When the cursor is first created, that arrow is pointing just above the first row in the table. The first time the next() method is called, the arrow moves down to the first row (and returns a reference to that row). Each time next() is called, the arrow moves down one row. If next() is called when the arrow is pointing at the last row, a special data type called None is returned.

Here's a very simple example of a search cursor that reads through a point dataset of cities and prints the name of each.

# Prints the name of each city in a feature class

import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

while row:
    print (row.NAME)
    row = rows.next()

The last five lines of the above script correspond to the five steps in the above workflow. Cursors can be tricky to understand at first, so let's look at those lines more closely. Below are the five lines again with comments so you can see exactly what's happening:

# Create the search cursor
rows = arcpy.SearchCursor(featureClass)

# Call SearchCursor.next() to read the first row
row = rows.next()

# Start a loop that will exit when there are no more rows available
while row:

    # Do something with the values in the current row     
    print (row.NAME)

    # Call SearchCursor.next() to move on to the next row    
    row = rows.next()

Notice a few other important things before moving on:

  • The loop condition "while row:" is a simple Boolean way of specifying whether the loop should continue. If a row object exists, the statement evaluates to true and the loop continues. If a row object doesn't exist, the statement evaluates to false and the loop terminates.
  • You can read a field value as a property of a row. For example, row.NAME gave you the value in the NAME field. If your table had a POPULATION field, you could use row.POPULATION to get the population.
  • The names "rows" and "row" are just variable names that represent the SearchCursor and Row objects, respectively. We could name these anything. The Esri examples tend to name them rows and row, and we'll do the same. However, if you needed to use two search cursors at the same time, you'd have to come up with some additional names.

Here's another example where something more complex is done with the row values. This script finds the average population for counties in a dataset. To find the average, you need to divide the total population by the number of counties. The code below loops through each record and keeps a running total of the population and the number of records counted. Once all the records have been read, only one line of division is necessary to find the average. You can get the sample data for this script here.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
    totalPopulation += row.POP1990
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print ("Average population for a county is " + str(average))

Although the above script is longer than the first one, it's still following the general pattern of creating a search cursor, advancing to the first row, doing something with the row, and repeating the process until there are no records left.

Reading values when the field name is a variable

In the previous script, the population of a record was referenced as row.POP1990 where the population field name is POP1990. This is a pretty easy way to get a field value, but what happens if you get data for 2009 in a field named POP2009 and you want to run the script again? What if you have many, long scripts that always reference the population field this way? You would have to carefully search each script for row.POP1990 and replace it with row.POP2009. This could be tedious and error-prone.

You can make your scripts more versatile by using variables to represent field names. You could declare a variable, such as populationField to reference the population field name, whether it were POP1990, POP2009, or simply POPULATION. The Python interpreter isn't going to recognize row.populationField, so you need to use Row.getValue() instead and pass in the variable as a parameter.

The script below uses a variable name to get the population for each record. Lines changed from the script above have a comment above them "### This row below is new". Notice how a variable named populationField is created and the method call row.getValue(populationField) that retrieves the population of each record.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
### This row below is new
populationField = "POP1990"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
### This row below is new
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print ("Average population for a county is " + str(average))

To update the above script, you would just have to set populationField = "POP2009" near the top of the script. This is certainly easier than searching through the body of the script for row.POP1990; however, you can go one step further and allow the user to enter any field name that he or she wants as an argument when running the script.

Remember in Lesson 1 how you learned that arcpy.GetParameterAsText() allows the user of the script to supply a value for the variable? Using that technique for both the feature class path and the population field name makes your script very flexible. Notice that the code below contains no hard-coded path names, field names, or numbers besides 0 and 1. This means you could run the script with any feature class containing any name for its population field without modifying the code. In fact, you could use code similar to this to find the average of any numeric field, such as square mileage, or number of homeowners.

# Finds the average population in a counties dataset

import arcpy

featureClass = arcpy.GetParameterAsText(0)
populationField = arcpy.GetParameterAsText(1)

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted
print ("Average population for a county is " + str(average))

Here's how you could run the above script in PythonWin by supplying the path name (if we had data for Iowa) and population field (if it were POP2008) as the arguments.

 Screen capture to show the Run Script dialog box with arguments
Figure 3.3 Running the above script in PythonWin.

Using a for loop with a cursor (introduced at ArcGIS 10.0)

Although the above examples use a while loop in conjunction with the next() method to advance the cursor, it's often easier to iterate through each record using a for loop. This became possible with ArcGIS 10.0. Here's how the above sample could be modified to use a for loop. Notice the syntax for row in rows.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
populationField = "POP1990"

rows = arcpy.SearchCursor(featureClass)

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
#  and records counted.

for row in rows:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1

average = totalPopulation / recordsCounted
print ("Average population for a county is " + str(average))

In this example, the next() method is not even required because it is implied by the for loop that the script will iterate through every record. The object named row is declared when the for loop is declared.

While this syntax is more compact than using a while loop, there is some benefit to seeing how the next() method works, especially if you ever work with ArcGIS 9.3.x Python scripts or if you use cursors in ArcObjects (which has conceptually similar methods for advancing a cursor row by row). However, once you get accustomed to using a for loop to traverse a table, it's unlikely you'll want to go back to using while loops.

The arcpy data access module (introduced at ArcGIS 10.1)

If you're using ArcGIS 10.1 or higher, you can use the above code for search cursors, or you can use a new data access module that was introduced into arcpy. These data access functions are prefixed with arcpy.da and give you faster performance along with more robust behavior when crashes or errors are encountered with the cursor.

The data access module arcpy.da allows you to create cursor objects, just like arcpy, but you create them a little differently. Take a close look at the following example code, which repeats the scenario above to calculate the average population of a county.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
populationField = "POP1990"

average = 0
totalPopulation = 0
recordsCounted = 0

with arcpy.da.SearchCursor(featureClass, (populationField,)) as cursor:
    for row in cursor:
        totalPopulation += row[0]
        recordsCounted += 1

average = totalPopulation / recordsCounted
print ("Average population for a county is " + str(average))

This example uses the same basic structure as the previous one, with a few important changes. One thing you probably noticed is that the cursor is created using a "with" statement. Although the explanation of "with" is somewhat technical, the key thing to understand is that it allows your cursor to exit the dataset gracefully, whether it crashes or completes its work successfully. This is a big issue with cursors, which can sometimes maintain locks on data if they are not exited properly.

The "with" statement requires that you indent all the code beneath it. After you create the cursor in your "with" statement, you'll initiate a for loop to run through all the rows in the table. This requires additional indentation.

Notice that this "with" statement creates a SearchCursor object, and declares that it will be named "cursor" in any subsequent code. The search cursors you create with arcpy.da have some different initialization parameters from the search cursors you create with arcpy. The biggest difference is that when you create a cursor with arcpy.da, you have to supply a tuple of field names that will be returned by the cursor. Remember that a tuple is a Python data structure much like a list, except it is enclosed in parentheses and its contents cannot be modified.

Supplying this tuple speeds up the work of the cursor because it does not have to deal with the potentially dozens of fields included in your dataset. In the example above, the tuple contains just one field, populationField. A tuple with just one item in it contains a comma after the item, therefore our tuple above looks like this: (populationField,). If the tuple were to have multiple items in it, we might see something like: (populationField, nameField).

Notice that with arcpy.da, you use row objects like with arcpy; however, you do not use the getValue method to retrieve values out of the rows. Instead, you use the index position of the field name in the tuple you submitted when you created the object. Since the above example submits only one item in the tuple, then the index position of populationField within that tuple is 0 (remember that we start counting from 0 in Python). Therefore, you can use row[0] to get the value of populationField for a particular row.
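To see the indexing with more than one field, here's a hedged variation of the counties example that requests two fields (assuming the shapefile also has a NAME field); index 0 then corresponds to NAME and index 1 to the population field:

```python
import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"

# The tuple order determines the index positions in each row
fields = ("NAME", "POP1990")

with arcpy.da.SearchCursor(featureClass, fields) as cursor:
    for row in cursor:
        # row[0] is NAME, row[1] is POP1990
        print(row[0] + ": " + str(row[1]))
```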

Since most students these days use ArcGIS versions 10.1 or higher, the examples from this point forward use the arcpy data access module. It's worth the effort to focus on arcpy.da functions because it will make your code faster, more compact, and more robust. You are required to use arcpy.da in your Lesson 3 project code unless you are running on a version prior to ArcGIS 10.1 and you have previously cleared this arrangement with the instructor.

3.2.3 Retrieving records using an attribute query

The previous examples used the SearchCursor object to read through each record in a dataset. You can get more specific with the search cursor by instructing it to retrieve just the subset of records whose attributes comply with some criteria, for example, "only records with a population greater than 10000" or "all records beginning with the letters P – Z."

For review, this is how you construct a search cursor to operate on every record in a dataset using the arcpy.da module (version 10.1 or higher):

with arcpy.da.SearchCursor(featureClass,(populationField,)) as cursor:

If you want the search cursor to retrieve only a subset of the records based on some criteria, you can supply a SQL expression as the third argument in the constructor (the constructor is the method that creates the SearchCursor). For example:

with arcpy.da.SearchCursor(featureClass, (populationField,), '"POP2008" > 100000') as cursor:

The above example uses the SQL expression "POP2008" > 100000 to retrieve only the records whose population is greater than 100000. SQL stands for "Structured Query Language" and is a special syntax used for querying datasets. If you've ever used a Definition Query to filter data in ArcMap, then you've had some exposure to these sorts of SQL queries. If SQL is new to you, please take a few minutes right now to read Building a query expression in the ArcGIS Desktop Help. This topic is a simple introduction to SQL in the context of ArcGIS.

SQL expressions can contain a combination of criteria allowing you to pinpoint a very focused subset of records. The complexity of your query is limited only by your available data. For example, you could use a SQL expression to find only states with a population density over 100 people per square mile that begin with the letter M and were settled after 1850.
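
As a sketch of how such a compound expression might be assembled in Python, consider the following. The field names below are hypothetical, invented for illustration, not taken from a real dataset:

```python
# Hypothetical field names for a states dataset
densityField = "POP_DENSITY"
nameField = "NAME"
settledField = "SETTLED"

# Combine three criteria with AND into a single SQL expression string
whereClause = ('"' + densityField + '" > 100 AND '
               '"' + nameField + "\" LIKE 'M%' AND "
               '"' + settledField + '" > 1850')

print(whereClause)
# "POP_DENSITY" > 100 AND "NAME" LIKE 'M%' AND "SETTLED" > 1850
```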

Note that the SQL expression you supply for a search cursor is for attribute queries, not spatial queries. You could not use a SQL expression to select records that fall "west of the Mississippi River," or "inside the boundary of Canada" unless you had previously added and populated some attribute stating whether that condition were true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson we'll talk about how to make spatial queries using the Select By Location geoprocessing tool.

Once you retrieve the subset of records, you can follow the same pattern of iterating through them using a for loop.

with arcpy.da.SearchCursor(featureClass, (populationField,), '"POP2008" > 100000') as cursor:
    for row in cursor:
        print (str(row[0]))

Handling quotation marks

When you include a SQL expression in your SearchCursor constructor, you must supply it as a string. This is where things can get tricky with quotation marks. SQL requires single and double quotes in specific places, but you also need to enclose the entire expression with quotes because it is a string. How do you keep from getting confused?

In Python you can use either single quotes or double quotes to enclose a string. You may have noticed that in the above example, I enclosed the SQL expression in single quotes, not double quotes: '"POP2008" > 100000'.  Because I knew the double quotes were going to be required in the SQL statement (to surround the field name), I used single quotes to surround the entire string. This is not just to keep things easy to read; if I had used two double quotes in a row the Python interpreter would get confused and think I was creating an empty string. Therefore, it was not an option to use ""POP2008" > 100000".

The situation gets more difficult when your SQL expression must use both single and double quotes, for instance, when you query for a string variable. Suppose your script allows the user to enter the ID of a parcel and you need to find it with a search cursor. Some of the parcel IDs include letters and others don't, therefore you need to always treat the parcel ID as a string. Your SQL expression would probably look like this: "PARCEL" = 'A2003KSW'. This expression starts with double quotes and ends with single quotes, so which style of quotes do you use to enclose the entire expression?

In this case, you cannot simply enclose the entire expression in one style of quotes; you need to break up the expression into separate strings. Take a close look at this example:

ID = arcpy.GetParameterAsText(0)
whereClause = '"Parcel"' + " = '" + str(ID) + "'"
with arcpy.da.SearchCursor(featureClass, (parcelField,), whereClause) as cursor:
   ...

In the code above, the whereClause, or SQL expression, is created in manageable chunks that don't mix single and double quotes. If a piece of the expression contains double quotes, such as "Parcel", it is enclosed in single quotes. If a piece contains single quotes, such as =' or just ', it is enclosed in double quotes. In situations like this, it may be helpful to temporarily include a print statement in your code, or use the debugging tools, to make sure your whereClause is constructed correctly. It also helps to train your eye to focus on the + sign as the separator between the independent pieces of the string you are constructing.
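
If the concatenation approach feels error prone, Python's str.format() can build the same string with the quotes kept together. This is an equivalent alternative, not a requirement:

```python
parcelID = "A2003KSW"   # in the real script this would come from GetParameterAsText(0)

# Concatenation, as in the example above:
whereClause = '"Parcel"' + " = '" + str(parcelID) + "'"

# The same string built with str.format(); the {0} placeholder is replaced
# by the parcel ID, and the escaped \' characters become single quotes:
whereClause2 = '"Parcel" = \'{0}\''.format(parcelID)

print(whereClause)                   # "Parcel" = 'A2003KSW'
print(whereClause == whereClause2)   # True
```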

Field delimiters

In the examples above, field names are surrounded with double quotes (for example, "STATE_NAME"). This is correct syntax for shapefiles and file geodatabases, which are the only data types we'll use in this course. If you use personal geodatabases in your daily work, there are different ways to delimit the field name. If you're interested in the correct syntax for different data types, or ways to make your script flexible for any data type, take a look at the topic SQL reference for query expressions used in ArcGIS in the ArcGIS Desktop Help.

3.2.4 Retrieving records using a spatial query

Applying a SQL expression to the search cursor is only useful for attribute queries, not spatial queries. For example, you can easily open a search cursor on all counties named "Lincoln" using a SQL expression, but finding all counties that touch or include the Mississippi River requires a different approach. To get a subset of records based on a spatial criterion, you need to use the geoprocessing tool Select Layer By Location.

Note:

A few relational databases such as SQL Server 2008 expose spatial data types that can be queried spatially with SQL. Support for these spatial types in ArcGIS is still maturing, and in this course we will assume that the way to make a spatial query is through Select Layer By Location. Since we are not using an enterprise geodatabase (ArcSDE), this assumption holds.

Here's where you need to know a little bit about how ArcGIS works with layers and selections. Suppose you want to select all states whose boundaries touch Wyoming. In most cases, you won't need to create an entirely new feature class to hold those particular states; you probably only need to maintain those particular state records in the computer's memory for a short time while you update some attribute. ArcGIS uses the concept of feature layers to represent in-memory sets of records from a feature class.

The Make Feature Layer tool creates a feature layer from some or all of the records in a feature class. You can apply a SQL expression when you run Make Feature Layer to narrow down the records included in the feature layer based on attributes. You can subsequently use Select Layer By Location to narrow down the records in the feature layer based on some spatial criteria.

Opening a search cursor on Wyoming and all states bordering it would take four steps:

  1. Use Make Feature Layer to make a feature layer of all US States. Let's call this the All States layer.
  2. Use Make Feature Layer to create a second feature layer of just Wyoming. (To get Wyoming alone, you would apply an SQL expression when making the feature layer.) Let's call this the Selection State layer.
  3. Use Select Layer By Location to narrow down the All States layer (the layer you created in Step 1) to just those states that touch the Selection State layer.
  4. Open a search cursor on the All States layer. The cursor will include only Wyoming and the states that touch it, because there is a selection applied to the All States layer. Remember that the feature layer is just a set of records held in memory. Even if you called it the All States layer, it no longer includes all states once you apply a selection.

Below is some code that applies the above steps.

# Selects all states whose boundaries touch
#  a user-supplied state

import arcpy

# Get the US States layer, state, and state name field
usaLayer = r"D:\Data\USA\USA.gdb\Boundaries"  # raw string so backslashes aren't treated as escape characters
state = "Wyoming"
nameField = "NAME"

try:
    # Make a feature layer with all the US States
    arcpy.MakeFeatureLayer_management(usaLayer, "AllStatesLayer")

    # Make a feature layer containing only the state of interest
    arcpy.MakeFeatureLayer_management(usaLayer,
                        "SelectionStateLayer",
                        '"' + str(nameField) + '" = ' + "'" + str(state) + "'")

    # Apply a selection to the US States layer
    arcpy.SelectLayerByLocation_management("AllStatesLayer","BOUNDARY_TOUCHES","SelectionStateLayer")

    # Open a search cursor on the US States layer
    with arcpy.da.SearchCursor("AllStatesLayer", (nameField,)) as cursor:
        for row in cursor:
            # Print the name of all the states in the selection
            print (row[0])
     
except:
    print (arcpy.GetMessages())

finally:
    # Clean up feature layers and cursor
    arcpy.Delete_management("AllStatesLayer")
    arcpy.Delete_management("SelectionStateLayer")
    # Guard the del in case a crash occurred before the cursor was created
    try:
        del cursor
    except NameError:
        pass

You can choose from many spatial operators when running SelectLayerByLocation. The code above uses "BOUNDARY_TOUCHES". Other available relationships are "INTERSECT", "WITHIN_A_DISTANCE" (which may save you a buffering step), "CONTAINS", "CONTAINED_BY", and others.

Note that the Row object "row" returns only one field ("NAME"), which is accessed using its index position in the list of fields. Since there's only one field, that index is 0, and the syntax looks like this: row[0]. Once you open the search cursor on your selected records, you can perform whatever action you want on them. The code above just prints the state name, but more likely you'll want to summarize or update attribute values. You'll learn how to write attribute values later in this lesson.

Cleaning up feature layers and cursors

Notice that the feature layers are deleted using the Delete tool. This is because feature layers can maintain locks on your data, preventing other applications from using the data until your script is done. Arcpy is supposed to clean up feature layers at the end of the script, but it's a good idea to delete them yourself in case this doesn't happen or in case there is a crash. In the example above, if a crash occurs, the except block will catch it, and the finally block will still run the Delete tool.

Cursors can also maintain locks on data.  As mentioned earlier, the "with" statement should clean up the cursor for you automatically.  However, we've found that it doesn't always, an observation that appears to be backed up by this blurb from Esri's documentation of the arcpy.da.SearchCursor class:

Search cursors also support with statements to reset iteration and aid in removal of locks. However, using a del statement to delete the object or wrapping the cursor in a function to have the cursor object go out of scope should be considered to guard against all locking cases.

One last point to note about this code that cleans up the feature layers and cursor is that it is embedded within a finally block.  This is a construct that is used occasionally with try and except to define code that should be executed regardless of whether the statements in the try block run successfully.  To understand the usefulness of finally, imagine if you had instead placed these cleanup statements at the end of the try block.  If an error were to occur somewhere above that point in the try block -- not hard to imagine, right? -- the remainder of the try block would not be executed, leaving the feature layers and cursor in memory.  A subsequent run of the script, after fixing the error, would encounter a new problem: the script would be unable to create the feature layer called "AllStatesLayer" because it already exists.  In other words, the cleanup statements would only run if the rest of the script ran successfully.  

This situation is where the finally statement is especially helpful.  Code in a finally block will be run regardless of whether something in the try block triggers a crash.  (In the event that an error is encountered, the finally code will be executed after the except code.)  Thus, as you develop your own code utilizing feature layers and/or cursors, it's a good idea to include these cleanup statements in a finally block.
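
The control flow described above can be verified with a small plain-Python sketch that stands in for the geoprocessing calls (the function name and log messages here are invented for illustration):

```python
def run_script(fail):
    log = []
    try:
        log.append("make feature layers")     # stands in for the MakeFeatureLayer calls
        if fail:
            raise RuntimeError("simulated geoprocessing error")
        log.append("cursor work")             # stands in for the search cursor loop
    except RuntimeError:
        log.append("print messages")          # stands in for print(arcpy.GetMessages())
    finally:
        log.append("cleanup")                 # stands in for the Delete_management calls
    return log

print(run_script(False))   # ['make feature layers', 'cursor work', 'cleanup']
print(run_script(True))    # ['make feature layers', 'print messages', 'cleanup']
```

Note that "cleanup" appears in both runs: the finally block executes whether or not the simulated error occurs.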

Required reading

Before you move on, examine the following tool reference pages. You can ignore the Command Line Syntax section, but pay particular attention to the Usage Tips and the Script Examples.

3.3 Writing vector attribute data

In the same way that you use cursors to read vector attribute data, you use cursors to write data as well. Two types of cursors are supplied for writing data:

  • Update cursor - This cursor edits values in existing records or deletes records
  • Insert cursor - This cursor inserts new records

In the following sections, you'll learn about both of these cursors and get some tips for using them.

Required reading

The ArcGIS Desktop Help has some explanation of cursors. Get familiar with this help now, as it will prepare you for the next sections of the lesson. You'll also find it helpful to return to the code examples while working on Project 3:

Accessing data using cursors

Also follow the three links in the table at the beginning of the above topic. These briefly explain the InsertCursor, SearchCursor, and UpdateCursor and provide a code example for each. You've already worked with SearchCursor, but closely examine the code examples for all three cursor types and see if you can determine what is happening in each.

3.3.1 Updating existing records

Use the update cursor to modify existing records in a dataset. Here are the general steps for using the update cursor:

  1. Create the update cursor by calling arcpy.da.UpdateCursor(). You can optionally pass in an SQL expression as an argument to this method. This is a good way to narrow down the rows you want to edit if you are not interested in modifying every row in the table.
  2. Use a for loop to iterate through the rows and for each row ...
  3. Modify the field values in the row that need updating (see below).
  4. Call UpdateCursor.updateRow() to finalize the edit.

Modifying field values

When you create an UpdateCursor and iterate through the rows using a variable called row, you can modify the field values by making assignments using the syntax row[<index of the field you want to change>] = <the new value>. For example:

row[0] = "Trisha Stevens"

It is important to note that the index occurring in the [...] to determine which field will be changed is given with respect to the tuple of fields provided when the UpdateCursor is created. For instance, if we create the cursor using the following command

with arcpy.da.UpdateCursor(featureClass, ("Company name", "Owner")) as cursor:

row[0] would refer to the field called "Company name" and row[1] refers to the field that has the name "Owner".
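
When the field tuple gets long, it can help to build a small lookup from field name to index position so you don't have to count by hand. This is an optional convenience in plain Python, not part of the arcpy API, and the field names here are the hypothetical ones from above:

```python
fields = ("Company name", "Owner")   # hypothetical field tuple

# Map each field name to its index position in the tuple
idx = {name: i for i, name in enumerate(fields)}

print(idx["Company name"])   # 0
print(idx["Owner"])          # 1
# Inside a cursor loop you could then write row[idx["Owner"]] = "New owner"
```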

Example

The script below performs a "search and replace" operation on an attribute table. For example, suppose you have a dataset representing local businesses, including banks. One of the banks was recently bought out by another bank. You need to find every instance of the old bank name and replace it with the new name. This script could perform that task automatically.

#Simple search and replace script
import arcpy

# Retrieve input parameters: the feature class, the field affected by
#  the search and replace, the search term, and the replace term.
fc = arcpy.GetParameterAsText(0)
affectedField = arcpy.GetParameterAsText(1)
oldValue = arcpy.GetParameterAsText(2)
newValue = arcpy.GetParameterAsText(3)

# Create the SQL expression for the update cursor. Here this is
#  done on a separate line for readability.
queryString = '"' + affectedField + '" = ' + "'" + oldValue + "'"

# Create the update cursor and update each row returned by the SQL expression
with arcpy.da.UpdateCursor(fc, (affectedField,), queryString) as cursor:
    for row in cursor:
        row[0] = newValue
        cursor.updateRow(row)

del row, cursor

Notice that this script is relatively flexible because it gets all the parameters as text. However, this script can only be run on string variables because of the way the query string is set up. Notice that the old value is put in quotes, like this: "'" + oldValue + "'". Handling other types of variables, such as integers, would have made the example longer.
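
To see the difference, here is a plain-Python sketch of both forms of the query string (the field name and values are hypothetical):

```python
affectedField = "POP2008"
oldValue = "100000"

# For a text field, the value must be wrapped in single quotes:
stringQuery = '"' + affectedField + '" = ' + "'" + oldValue + "'"
print(stringQuery)    # "POP2008" = '100000'

# For a numeric field, the quotes around the value are dropped:
numericQuery = '"' + affectedField + '" = ' + oldValue
print(numericQuery)   # "POP2008" = 100000
```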

Again, it is critical to understand the tuple of affected fields that you pass in when you create the update cursor. In this example, there is only one affected field (which we named affectedField), so its index position in the tuple is 0. Therefore, you set that field value using row[0] = newValue. Strictly speaking, cursor cleanup is handled by the "with" statement; the del statement at the end of the script is an extra safeguard against lingering locks.

The call to cursor.updateRow(row) inside the loop is needed to make sure that each modified row is actually written back to the attribute table. Please note that the variable row needs to be passed as a parameter to updateRow().

Dataset locking again

As we mentioned, ArcGIS sometimes places locks on datasets to avoid the possibility of editing conflicts between two users. If you think for any reason that a lock from your script is affecting your dataset (by preventing you from viewing it, making it look like all rows have been deleted, and so on), you must close PythonWin to remove the lock. If you think that ArcGIS has a lock on your data, check to see if ArcMap or ArcCatalog are using the data in any way. This could possibly occur through having an open edit session on the data, having the data open in the Preview tab in ArcCatalog, or having the layer in the table of contents in an open map document (MXD).

As we stated, cursor cleanup should happen through the creation of the cursor inside a "with" statement, but adding lines to delete the row and cursor objects will make extra sure locks are released.

For the Esri explanation of how locking works, you can review the section "Cursors and locking" in the topic Accessing data using cursors in the ArcGIS Desktop Help.

3.3.2 Inserting new records

When adding a new record to a table, you must use the insert cursor. Here's the workflow for insert cursors:

  • Create the insert cursor using arcpy.da.InsertCursor().
  • Call InsertCursor.insertRow() to add a new row to the dataset.

As with the search and update cursors, you can use an insert cursor together with the "with" statement to avoid locking problems.

Insert cursors differ from search and update cursors in that you cannot provide an SQL expression when you create the insert cursor. This makes sense because an insert cursor is only concerned with adding records to the table. It does not need to "know" about the existing records or any subset thereof.

When you insert a row using InsertCursor.insertRow(), you provide a comma-delimited tuple of values for the fields of the new row. The order of these values must match the order of values of the tuple of affected fields you provided when you created the cursor. For example, if you create the cursor using

with arcpy.da.InsertCursor(featureClass, ("First name","Last name")) as cursor:

you would add a new row with values "Sam" for "First name" and "Fisher" for "Last name" by the following command:

cursor.insertRow(("Sam","Fisher"))

Please note that the inner parentheses are needed to turn the values into a tuple that is passed to insertRow(). Writing cursor.insertRow("Sam","Fisher") would have resulted in an error.

Example

The example below uses an insert cursor to create one new point in the dataset and assign it one attribute: a string description. This script could potentially be used behind a public-facing 311 application, in which members of the public can click a point on a Web map and type a description of an incident that needs to be resolved by the municipality, such as a broken streetlight.

# Adds a point and an accompanying description
import arcpy 
# Retrieve input parameters
inX = arcpy.GetParameterAsText(0)
inY = arcpy.GetParameterAsText(1)
inDescription = arcpy.GetParameterAsText(2)

# These parameters are hard-coded. User can't change them.
incidentsFC = "C:/Data/Yakima/Incidents.shp"
descriptionField = "DESCR"

# Make a tuple of fields to update
fieldsToUpdate = ("SHAPE@XY", descriptionField)
# Create the insert cursor
with arcpy.da.InsertCursor(incidentsFC, fieldsToUpdate) as cursor:
    # Insert the row providing a tuple of affected attributes
    cursor.insertRow(((float(inX),float(inY)), inDescription))

del cursor

Take a moment to ensure that you know exactly how the following is done in the code:

  • The creation of the insert cursor with a tuple of affected fields stored in variable fieldsToUpdate
  • The insertion of the row through the insertRow() method using variables that contain the field values of the new row

If this script really were powering an interactive 311 application, the X and Y values could be derived from a point the user clicked on the Web map, rather than retrieved as script parameters with GetParameterAsText().

One thing you might have noticed is that the string "SHAPE@XY" is used to specify the Shape field. You might expect this would just be "Shape," but arcpy.da provides a list of "tokens" that act as shortcuts for accessing certain fields in particular formats. In our case, it is very convenient to provide the X and Y values of the point as a tuple of coordinates, and the token "SHAPE@XY" allows you to do just that. See the help topic for InsertCursor to learn about other tokens you can use.

Putting this all together, the example creates a tuple of affected fields: ("SHAPE@XY", "DESCR"). Notice that for the second element of the tuple, we actually use the variable descriptionField, which contains the name of the second column, "DESCR". Using a variable to store the name of the column we are interested in allows us to easily adapt the script later, for instance, to a dataset where the column has a different name. When the row is inserted, the values for these items are provided in the same order: cursor.insertRow(((float(inX), float(inY)), inDescription)). The argument passed to insertRow() is a tuple whose first element is another tuple, (inX, inY), the coordinates of the point cast to floating point numbers, and whose second element is the text for the "DESCR" field.

Readings

Take a few minutes to read Zandbergen 7.1 - 7.3 to reinforce your learning about cursors.

3.4 Working with rasters

So far in this lesson, your scripts have only read and edited vector datasets. This work largely consists of cycling through tables of records and reading and writing values to certain fields. Raster data is very different, and consists only of a series of cells, each with its own value. So how do you access and manipulate raster data using Python?

It's unlikely that you will ever need to cycle through a raster cell by cell on your own using Python, and that technique is outside the scope of this course. Instead, you'll most often use predefined tools to read and manipulate rasters. These tools have been designed to operate on various types of rasters and perform the cell-by-cell computations so that you don't have to.

In ArcGIS, most of the tools you'll use when working with rasters are in either the Data Management > Raster toolset or the Spatial Analyst toolbox. These tools can reproject, clip, mosaic, and reclassify rasters. They can calculate slope, hillshade, and aspect rasters from DEMs.

The Spatial Analyst toolbox also contains tools for performing map algebra on rasters. Multiplying or adding many rasters together using map algebra is important for GIS site selection scenarios. For example, you may be trying to find the best location for a new restaurant and you have seven criteria that must be met. If you can create a boolean raster (containing 1 for suitable, 0 for unsuitable) for each criterion, you can use map algebra to multiply the rasters and determine which cells receive a score of 1, meeting all the criteria. (Alternatively you could add the rasters together and determine which areas received a value of 7.) Other courses in the Penn State GIS certificate program walk through these types of scenarios in more detail.
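
The cell-by-cell logic of that multiplication can be sketched in plain Python, with nested lists standing in for two tiny boolean rasters (in practice the map algebra tools do this for you across millions of cells):

```python
# Two 2 x 3 "rasters" of suitability scores (1 = suitable, 0 = unsuitable).
# The criteria names are hypothetical.
slopeOK = [[1, 0, 1],
           [1, 1, 0]]
zoningOK = [[1, 1, 0],
            [1, 0, 0]]

# Multiply cell by cell: a cell keeps a 1 only if BOTH criteria are met
suitable = [[a * b for a, b in zip(rowA, rowB)]
            for rowA, rowB in zip(slopeOK, zoningOK)]

print(suitable)   # [[1, 0, 0], [1, 0, 0]]
```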

The tricky part of map algebra is constructing the expression, which is a string stating what the map algebra operation is supposed to do. ArcGIS Desktop contains interfaces for constructing an expression for one-time runs of the tool. But what if you want to run the analysis several times, or with different datasets? It's challenging even in ModelBuilder to build a flexible expression into the map algebra tools. With Python, you can manipulate the expression as much as you need.

Example

Examine the following example, which takes in a minimum and maximum elevation value as parameters, then does some map algebra with those values. The expression isolates areas where the elevation is greater than the minimum parameter and less than the maximum parameter. Cells that satisfy the expression are given a value of 1 by the software, and cells that do not satisfy the expression are given a value of 0.

But what if you don't want those 0 values cluttering your raster? This script gets rid of the 0s by running the Reclassify tool with a very simple remap table stating that input raster values of 1 should remain 1. Because 0 is left out of the remap table, it gets reclassified as NoData:

# This script takes a DEM, a minimum elevation,
#  and a maximum elevation. It outputs a new
#  raster showing only areas that fall between
#  the min and the max

import arcpy
from arcpy.sa import *
arcpy.env.overwriteOutput = True
arcpy.env.workspace = "C:/Data/Elevation"

# Get parameters of min and max elevations
inMin = arcpy.GetParameterAsText(0)
inMax = arcpy.GetParameterAsText(1)

arcpy.CheckOutExtension("Spatial")

# Perform the map algebra and make a temporary raster
inDem = Raster("foxlake")
tempRaster = (inDem > int(inMin)) & (inDem < int(inMax))

# Set up remap table and call Reclassify, leaving all values not 1 as NODATA
remap = RemapValue([[1,1]])
remappedRaster = Reclassify(tempRaster, "Value", remap, "NODATA")

# Save the reclassified raster to disk
remappedRaster.save("foxlake_recl")

arcpy.CheckInExtension("Spatial")

Read the example above carefully, as many times as necessary for you to understand what is occurring in each line. Notice the following things:

  • There is one intermediate raster (in other words, not the final output) that you don't want to have cluttering the output directory. This is referred to as tempRaster in the script. You'll see this temporary raster appear in ArcCatalog after you run the script, but it goes away after you close PythonWin.
  • Notice the expression contains > and < signs, as well as the & operator. You have to enclose each comparison in parentheses because the & operator has higher precedence than the > and < operators; without the parentheses, the expression would not be evaluated the way you intend.
  • Because you used arcpy.GetParameterAsText() to get the input parameters, you have to cast the input to an integer before you can do map algebra with it. If we just used inMin, the software would see "3000," for example, and would try to interpret that as a string. To do the numerical comparison, we have to use int(inMin). Then the software sees the number 3000 instead of the string "3000."
  • Map algebra can perform many types of math and operations on rasters, not limited to "greater than" or "less than." For example, you can use map algebra to find the cosine of a raster.
  • If you're working at a site where a license manager restricts the Spatial Analyst extension to a certain number of users, you must check out the extension in your script, then check it back in. Notice the calls to arcpy.CheckOutExtension() and arcpy.CheckInExtension(). You can pass in other extensions besides "Spatial" as arguments to these methods.
  • Notice that the script doesn't call these spatial analyst functions using arcpy. Instead, it imports functions from the spatial analyst module (from arcpy.sa import *) and calls the functions directly. For example, we don't see arcpy.Reclassify(); instead, we just call Reclassify() directly. This can be confusing for beginners (and old pros) so be sure to check the Esri samples closely for each tool you plan to run. Follow the syntax in the samples and you'll usually be safe.
  • See the Remap classes topic in the help to understand how the remap table in this example was created. Whenever you run Reclassify, you have to create a remap table stating how the old values should be reclassified to new values. This example has about the simplest remap table possible, but if you want a more complex remap table, you'll need to study the documentation.
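
The precedence point about the & operator is easy to verify with plain Python integers, since & binds more tightly than > and < there as well:

```python
inDem = 4000                  # stand-in for one cell's elevation value
inMin, inMax = 1000, 3000

# With parentheses, each comparison is evaluated first; 4000 is out of range:
print((inDem > inMin) & (inDem < inMax))   # False

# Without parentheses, & binds tighter than > and <, so Python actually
# evaluates inDem > (inMin & inDem) < inMax, which happens to be True here:
print(inDem > inMin & inDem < inMax)       # True
```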

Rasters and file extensions

The above example script doesn't use any file extensions for the rasters. This is because the rasters use the Esri GRID format, which doesn't use extensions. If you have rasters in another format, such as .jpg, you will need to add the correct file extension. If you're unsure of the syntax to use when providing a raster file name, highlight the raster in ArcCatalog and note how the path appears in the Location bar.

If you look at rasters such as an Esri GRID in Windows Explorer, you may see that they actually consist of several supporting files with different extensions, sometimes even contained in a series of folders. Don't try to guess one of the files to reference; instead, use ArcCatalog to get the path to the raster. When you use this path, the supporting files and folders will work together automatically.

[Screen capture: getting the path to a raster in ArcCatalog 10. The Location bar shows C:\Data\Elevation\foxlake]
Figure 3.4 When using ArcCatalog for the path to a raster, the supporting files and folders work together automatically.

Readings

Zandbergen chapter 9 covers a lot of additional functions you can perform with rasters and has some good code examples. You don't have to understand everything in this chapter, but it might give you some good ideas for your final project.

Lesson 3 Practice Exercises

Lessons 3 and 4 contain practice exercises that are longer than the previous practice exercises and are designed to prepare you specifically for the projects. You should make your best attempt at each practice exercise before looking at the solution. If you get stuck, study the solution until you understand it.

Don't spend so much time on the practice exercises that you neglect Project 3. However, successfully completing the practice exercises will make Project 3 much easier and quicker for you.

Data for Lesson 3 practice exercises

The data for the Lesson 3 practice exercises is very simple and, like some of the Project 2 practice exercise data, was derived from Washington State Department of Transportation datasets. Download the data here.

Using the discussion forums

Using the discussion forums is a great way to work towards figuring out the practice exercises. You are welcome to post blocks of code on the forums relating to these exercises.

When completing the actual Project 3, avoid posting blocks of code longer than a few lines. If you have a question about your Project 3 code, please email the instructor, or you can post general questions to the forums that don't contain more than a few lines of code.

Getting ready

If the practice exercises look daunting to you, you might start by practicing with your cursors a little bit using the sample data:

  • Try to loop through the CityBoundaries and print the name of each city.
  • Try using an SQL expression with a search cursor to print the OBJECTIDs of all the park and rides in Chelan county (notice there is a County field that you could put in your SQL expression).
  • Use an update cursor to find the park and ride with OBJECTID 336. Assign it a ZIP code of 98512.

You can post thoughts on the above challenges on the forums.

Lesson 3 Practice Exercise A

In this practice exercise, you will programmatically select features by location and update a field for the selected features. You'll also use your selection to perform a calculation.

In your Lesson3PracticeExerciseA folder, you have a Washington geodatabase with two feature classes:

  • CityBoundaries - Contains polygon boundaries of cities in Washington over 10 square kilometers in size.
  • ParkAndRide - Contains point features representing park and ride facilities where commuters can leave their cars.

The objective

You want to find out which cities contain park and ride facilities and what percentage of cities have at least one facility.

  • The CityBoundaries feature class has a field "HasParkAndRide," which is set to "False" by default. Your job is to mark this field "True" for every city containing at least one park and ride facility within its boundaries.
  • Your script should also calculate the percentage of cities that have a park and ride facility and print this figure for the user.

You do not have to make a script tool for this assignment. You can hard-code the variable values. Try to group the hard-coded string variables at the beginning of the script.

For the purposes of these practice exercises, assume that each point in the ParkAndRide dataset represents one valid park and ride (ignore the value in the TYPE field).

Tips

You can jump into the assignment at this point, or read the following tips to give you some guidance.

  • Make two feature layers: "CitiesLayer" and "ParkAndRideLayer."
  • Use SelectLayerByLocation with a relationship type of "CONTAINS" to narrow down your cities feature layer to only the cities that contain park and rides.
  • Create an update cursor for your now narrowed-down "CitiesLayer" and loop through each record, setting the HasParkAndRide field to "True."
  • To calculate the percentage of cities with park and rides, you'll need to know the total number of cities. You can use the GetCount tool to get a total without writing a loop. Beware that you may have to monkey around with the output a bit to get it in a format you can use. See the example in the Esri documentation that converts the GetCount result to an integer.
  • Similarly, you may have to play around with your Python math a little to get a nice percentage figure. Don't get too hung up on this part.
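Both of those tips come down to plain Python rather than anything arcpy-specific. Here is a minimal sketch using made-up counts (the real numbers would come from GetCount and your cursor tally):

```python
# GetCount returns a Result object, and getOutput(0) returns the count
# as a *string*, so it must be converted to an integer before doing math.
# In a real script this would be: int(arcpy.GetCount_management(fc).getOutput(0))
numCities = int("473")

citiesWithParkAndRide = 102   # hypothetical tally kept while looping with a cursor

# Multiplying by 1.0 forces decimal division; in Python 2, dividing two
# integers truncates the result (102 / 473 would give 0, not 0.215...)
percent = ((1.0 * citiesWithParkAndRide) / numCities) * 100
print(round(percent, 1))
```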

Lesson 3 Practice Exercise A Solution

Below is one possible solution to Practice Exercise A with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums. Please note that in order to make the changes to "CitiesLayer" permanent, you have to write the layer back to disk using the arcpy.CopyFeatures_management(...) function. This is not shown in the solution here.

# This script determines the percentage of cities in the
#  state with park and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasParkAndRide"   # Name of column with Park & Ride information
citiesWithParkAndRide = 0             # Used for counting cities with Park & Ride

try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

    # Make a feature layer of all the cities polygons   
    arcpy.MakeFeatureLayer_management(cityBoundaries, "CitiesLayer")

except:
    print ("Could not create feature layers")

try:      
    # Narrow down the cities layer to only the cities that contain a park and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS", "ParkAndRideLayer")

    # Create an update cursor and loop through the selected records
    with arcpy.da.UpdateCursor("CitiesLayer", (parkAndRideField,)) as cursor:
        for row in cursor:
            # Set the park and ride field to TRUE and keep a tally
            row[0] = "True"
            cursor.updateRow(row)
            citiesWithParkAndRide += 1
   
# Delete the feature layers even if there is an exception (error) raised
finally: 
    arcpy.Delete_management("ParkAndRideLayer")
    arcpy.Delete_management("CitiesLayer")
    
# Count the total number of cities (this tool saves you a loop)
numCitiesCount = arcpy.GetCount_management(cityBoundaries)
numCities = int(numCitiesCount.getOutput(0))

# Calculate the percentage and print it for the user
percentCitiesWithParkAndRide = ((1.0 * citiesWithParkAndRide) / numCities) * 100

print (str(percentCitiesWithParkAndRide) + " percent of cities have a park and ride.")

Below is a video offering some line-by-line commentary on the structure of this solution:

Click for a transcript of "3A" video.

PRESENTER: This video shows one solution to Lesson 3, Practice Exercise A, which determines the percentage of cities in the State of Washington that have park and ride facilities. The script begins by importing the arcpy site package.

And then I put this line 5 to overwrite Output equals True, which is useful when you're writing scripts like these where you'll be doing some testing and you'll often want to overwrite the output files. Otherwise ArcGIS will return an error message saying that it cannot overwrite an output that already exists, so that's a tip to take from this code.

In lines 7 and 8, I have string variables that I'm creating that have the paths to the feature classes I'm going to work with, the cityBoundaries feature class and the ParkAndRide feature class, both from the Washington file geodatabase. I'm also going to be updating a field in the cityBoundaries feature class called HasParkAndRide. I'll be setting that to true or false. And so I make a variable for that in line 9. And you'll see that used later.

And then in line 10, I'm starting a counter that I'm going to use in my code to count the number of cities with a park and ride. And I start that off at zero.

In line 12, I move into a try-except block of code. In this solution, I show you some examples of how you might do error handling and return messages. Usually if you leave these out, your script will return nice error messages to you anyways, that you can use to figure out what line has crashed. However, if you distribute your scripts to an end user, they will want to see nicer messages.

So the try and except blocks can help that. You can put your own custom messages in like I did in line 20. But for now we have a try block.

And we're going to try to make two feature layers. We'll make a feature layer that has all the park and rides, and a feature layer that has all of the city boundaries. And for both of these, in line 14 and 17, the MakeFeatureLayer tool has two parameters that we supply, the name of the feature class that's going to act as the feature layer and then the second parameter is a name that we will use to refer to this feature layer throughout this script and only this script, so this is a name that is in the computer's memory. For example, in line 14 for the park and ride layer, we just call it ParkAndRideLayer. We can name that part of it anything we want.

So we've got these two feature layers that we can now perform selections on. And in line 24, we're going to do a location selection to select all cities that contain features from the park and ride layer. If we were to do this in ArcMap, which sometimes it's helpful to do to see what's going on, here I have the park and rides, symbolized by black dots, and the city boundaries that are grey lines with orange fill.

And in ArcMap, I can do the same thing where I select by location. And essentially, we're selecting features from the city boundaries that completely contain the layer of park and rides. You can see all those elements there. And then, this is the selection of cities that do have a park and ride facility. So line 24 is showing how you would do that with code.

And once you've got that selection, then you can then do things with it. So for example in ArcMap, if I wanted to work with these cities, I could open the attribute table. And I see that some are selected. And then I could just work on those selected rows to do something.

And so we're going to do that in Python by opening an update cursor. And the cursor is just going to work on that narrowed-down layer of cities. So it's important to note that in line 24 you started with the cities layer that had all the cities. And after you run line 24, that layer is only going to contain a selected subset of those cities. And that's what is going to be applied in your update cursor.

So this update cursor will not update everything. It will just update the cities that are currently selected. And it will only be able to update the fields that you specify in the second parameter. So where I've put ParkAndRideField and then a comma, that is a tuple, a Python tuple, with one item in it that represents the field I'm going to edit. And that is the HasParkAndRide field. So cast your minds back up here with line number 9. That's where I created that variable that I'm passing in down in line 27.

And so what I have is a cursor that can then go row by row. And I do that using a for loop, which is in line 28. And in line 30, I take that HasParkAndRide field and I set it to true.

Now, why do I say 0 there in square brackets? 0 is the index position of the field that I passed in in that tuple in line 27. So in line 27, I passed in a tuple with just one field, ParkAndRideField.

And because it has one thing in it, that's at index position 0. And that's why I say row 0. If I was updating three or four fields, I would have row and then the field index position in a bracket. It could be 1, 2, 3, and so on.

After I've made those changes, I need to call updateRow in order for them to be applied. And this is a mistake that a lot of beginners make. So in line 31, I'm calling updateRow.

And in line 32, I'm incrementing the counter that I'm keeping of the number of cities with a park and ride facility found. So I just add 1 to the counter. The plus equals 1 is a shortcut for saying take this number and add 1.

After looping through all those things, in line 35 I start a finally block, which would run whether or not the above code failed. And it cleans up those feature layers so they're not stuck in the computer's memory. I could put those in an except block perhaps, but I wanted to put them in finally because those need to be cleaned up whether or not the script crashes. So that's what's happening in line 36 and 37.

Now that I've done all those things, it's time to do some math to figure out the percentage of cities that have a park and ride facility. So in line 40, I'm running the GetCount tool in ArcGIS. You can run this tool directly on a feature class. So here, I don't pass in a feature layer. I just pass in the string path to a feature class.

Notice in 40 I referenced that cityBoundaries variable. That's a variable that I created up in line 7. And it just has the path to the data.

When you use the GetCount tool, you have to also call a getOutput method to get the actual result and convert it to an integer. Now this is a little complex. Don't worry if you don't understand it all. But just follow the examples that are in this code or also the ArcGIS Help code for the GetCount tool.

So what we have in the end, in line 41, is a variable called numCities that is the total number of cities in the entire data set. And then, we've already been keeping this counter of cities that have a park and ride. So to find the percentage, all we need to do is divide one by the other. And that's what's going on in line 44.

Now, there are a couple of extra things in the equation here. We take 1.0 and multiply it by the number of cities with a park and ride, because we want to do some decimal division. If we didn't do that, it would perform integer division by default. And we wouldn't get a nice percentage figure. So we take the cities with park and ride, divide it by the number of cities, and then we multiply it by 100 to get a nice interpretable percentage figure. And then, we can print out the result at the end of the script in line 46.

Lesson 3 Practice Exercise B

If you look in your Lesson3PracticeExerciseB folder, you'll notice the data is exactly the same as for Practice Exercise A...except, this time the field is "HasTwoParkAndRides."

The objective

In Practice Exercise B, your assignment is to find which cities have at least two park and rides within their boundaries.

  • Mark the "HasTwoParkAndRides" field as "True" for all cities that have at least two park and rides within their boundaries.
  • Calculate the percentage of cities that have at least two park and rides within their boundaries and print this for the user.

Tips

This simple modification in requirements is a game changer. Following is one way you can approach the task. Notice that it is very different from what you did in Practice Exercise A:

  • Create an update cursor for the cities and start a loop that will examine each city.
  • Make a feature layer with all the park and ride facilities.
  • Make a feature layer for just the current city. You'll have to make an SQL query expression in order to do this. Remember that an UpdateCursor can get values, so you can use it to get the name of the current city.
  • Use SelectLayerByLocation to find all the park and rides CONTAINED_BY the current city. Your result will be a narrowed-down park and ride feature layer. This is different from Practice Exercise A where you narrowed down the cities feature layer.
  • One approach to determine whether there are at least two park and rides is to run the GetCount tool to find out how many features were selected, then check if the result is 2 or greater. Another approach is to create a search cursor on the selected park and rides and see if you can count at least two cursor advancements.
  • Be sure to delete your feature layers before you loop on to the next city. For example: arcpy.Delete_management("ParkAndRideLayer")
  • Keep a tally for every row you mark "True" and find the percentage as you did in Practice Exercise A.
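The fiddliest part of the SQL expression tip is the quoting: for file geodatabase data, the field name goes in double quotes and a string value goes in single quotes. That quoting logic is plain Python string concatenation, sketched here with a made-up FIPS value:

```python
cityIDStringField = "CI_FIPS"
cityIDString = "5323515"   # hypothetical value read from the cursor row

# Double quotes around the field name, single quotes around the value
queryString = '"' + cityIDStringField + '" = ' + "'" + cityIDString + "'"
print(queryString)   # "CI_FIPS" = '5323515'
```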

Lesson 3 Practice Exercise B Solution

Below is one possible solution to Practice Exercise B with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# This script determines the percentage of cities with two park
#  and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"   # Name of column for storing the Park & Ride information
cityIDStringField = "CI_FIPS"             # Name of column with city IDs
citiesWithTwoParkAndRides = 0             # Used for counting cities with at least two P & R facilities
numCities = 0                             # Used for counting cities in total

# Make a feature layer of all the park and ride facilities
arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundaries, (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city    
        cityIDString = city[0]
        queryString = '"' + cityIDStringField + '" = ' + "'" + cityIDString + "'"

        # Make a feature layer of just the current city polygon    
        arcpy.MakeFeatureLayer_management(cityBoundaries, "CurrentCityLayer", queryString)

        try:
            # Narrow down the park and ride layer by selecting only the park and rides
            #  in the current city
            arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CurrentCityLayer")

            # Count the number of park and ride facilities selected
            selectedParkAndRideCount = arcpy.GetCount_management("ParkAndRideLayer")
            numSelectedParkAndRide = int(selectedParkAndRideCount.getOutput(0))

            # If at least two park and ride facilities found, update the row to TRUE
            if numSelectedParkAndRide >= 2:
                city[1] = "TRUE"

                # Don't forget to call updateRow
                cityRows.updateRow(city)

                # Add 1 to your tally of cities with two park and rides                
                citiesWithTwoParkAndRides += 1

        finally:
            # Delete current cities layer to prepare for next run of loop
            arcpy.Delete_management("CurrentCityLayer")
            numCities += 1

# Clean up park and ride feature layer
arcpy.Delete_management("ParkAndRideLayer")

# Calculate and report the number of cities with two park and rides
if numCities != 0:
    percentCitiesWithParkAndRide = ((1.0 * citiesWithTwoParkAndRides) / numCities) * 100
else:
    print ("Error with input dataset. No cities found.")

print (str(percentCitiesWithParkAndRide) + " percent of cities have two park and rides.")

The video below offers some line-by-line commentary on the structure of the above solution:

Click for a transcript of "3B" video.

PRESENTER: This video shows one possible solution to Lesson 3, Practice Exercise B, which calculates the number of cities in the data set with at least two park and ride facilities using selections in ArcGIS. It also uses an update cursor to update a field indicating whether the city has at least two park and rides. The beginning of this script is very similar to Practice Exercise A, so I won't spend a lot of time on that. Lines 4 through 8 are pretty familiar from that exercise.

In lines 9 and 10, I'm setting up variables to reference some fields that I'm going to use. Part of the exercise is to update a field called HasTwoParkAndRides to be either true or false. And so I'm setting a variable for that.

And in line 10, I set up a variable for the city FIPS code. We're going to be querying each city one by one, and we need some field with unique values for each city. We might use the city name, but that could be dangerous in the case that you have a data set where two cities happen to have the same name. So we'll use the FIPS code instead.

In lines 11 and 12, I'm setting up some counters that will be used to count the number of cities we run into that have at least two park and rides, and then the number of cities total respectively. In line 15, I'm starting out by making a feature layer of just the park and ride facilities. So this is familiar from the previous exercise, where the first parameter I pass in is the path to the park and ride data set and then the second parameter is a name that I give this feature layer. Just for the purposes of this script, I'll call it ParkAndRideLayer.

And in line 18, I'm opening an update cursor on all of the city boundaries. So I'm going to loop through each city here and select the number of facilities that occur in each city and count those up. And so I'm going to be working with two fields using this cursor. And I pass those in in a tuple. That's cityIDStringField and parkAndRideField. And those are the variables that I set up earlier in the script in lines 9 and 10.

It might be helpful at this point to take a look at what we're doing conceptually in ArcMap. So the first thing we're going to do is for each city make an attribute query for that city FIPS. And so for example if I were to do this for the city of Seattle, I've set up this. The city FIPS code is this long number here.

And if I were to query that, I have one city. So I'm going to narrow down my cities feature layer to just have one thing in it. And then I'm going to do the selection by location where I select park and ride facilities that are completely within the layer of city boundary. So doing that, I would get these points back. And then I could count those or I could start looping through them and see if I hit at least two in order to determine if the city has at least two park and ride facilities.

Then I would go on to the next city and the next city and so on until I had worked my way through the entire data set. So there is some processing that will occur as you run the script. It might take a few seconds to run.

All right. So we are back on line 18 where we set up the update cursor. We call it cityRows. And then we can loop through each city in this. So that's what is done in line 19 with a for loop.

In line 21, we get the FIPS code for the city that's currently being looked at by the cursor. We use 0 there because that's the index position of the CityIDStringField that we put in the tuple in line 18. It was the first thing in the tuple, so it has index position 0.

And in line 22, we're setting up a string to query for that city FIPS. And so we need the field name equals the city name. And the field name has to be in double quotes. The city name has to be in single quotes. So that involves a lot of string concatenation to get that just right as we set this up.

But once we have that string, we can make a feature layer that just has that one city within it. The way we do that is in line 25. And we call it MakeFeatureLayer. But instead of two parameters, we put in three.

The first two parameters are the same as when we made the feature layer for the park and rides. So we put in-- it's the same pattern. We put in the path to the data set. We put in a name for that feature layer. We're going to call this CurrentCityLayer.

But then the third parameter is that query string that we just created in line 22. And that causes our feature layer to just have the one city in it that's returned by that query.

And so now, we're going to do the locational selection in line 30 to select just the park and rides that are contained by that city. The parameters here should be somewhat intuitive if you have that conceptual understanding of how the selection is working, where just one city boundary is selecting a subset of park and rides.

At that point, you can call a GetCount tool and figure out how many park and rides were indeed selected. So that's what's going on in line 33. And remember with line 34 that you've got to call getOutput in order to get the result of the GetCount tool. You got exposed to that in the solution to Practice Exercise A.

So in the end, what you have in line 34 is this variable called numSelectedParkAndRide. That's the number of park and ride facilities that fell within that city.

And in line 37, you can do a check to see if that number is greater than or equal to two. If it is, then we're going to set the value to true. So that's where we, using our cursor, go to the first item in the tuple up here, actually it's the second item, but it has index position 1, and we set that equal to true. And we don't want to forget to call updateRow so that that gets applied to the data set.

And then in the end, we can update our counter of cities that have at least two park and rides.

Finally, after we're done doing all this looping, we delete the feature layer. We clean it up. And we increment the number of cities we've counted by one here. And so we do that for every single city in the data set.

And then we clean up the park and ride layer in line 52. So notice, there are multiple levels of indentation here showing the if-thens, the try-finallys, and the loop that runs the cursor. And so we need to manage that carefully. If we go back and stop indenting, that means we're going out of the block of code, either out of the if statement or out of the try-finally block and so on.

So after doing the cleanup, now we're going to calculate the percentage of cities that have at least two park and rides. Now, you can't divide by zero, so we're doing a check in line 55 for that. But if everything checks out OK, which it would unless we had a major problem with our data set, we go ahead in line 56 and we run some division to divide the number of cities with two park and rides by the total number of cities. In the video for Exercise A, I explain why we multiply by 1.0, so we can do some decimal division. And then, we multiply by 100 to get a nice percentage figure.

In line 57 and 58, we have a little bit of error handling which would occur if the input data set was invalid and there were no cities in it. Otherwise, we're going to go down to line 60 and print the percentage of cities with at least two park and rides. So give it a try.

Lesson 3 Practice Exercise C

This practice exercise uses the same starting data as Lesson 3 Practice Exercise A. It is designed to give you practice with extracting data based on an attribute query.

The objective

Select all park and ride facilities with a capacity of more than 500 parking spaces and put them into their own feature class. The capacity of each park and ride is stored in the "Approx_Par" field.

Tips

Use the MakeFeatureLayer tool to perform the selection. You will set up a SQL expression and pass it in as the third parameter for this tool.

Once you make the selection, use the Copy Features tool to create the new feature class. You can pass a feature layer directly into the Copy Features tool and it will make a new feature class from all features in the selection.

Remember to delete your feature layers when you are done.
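Note that the quoting rules in the SQL expression differ by field type: a numeric value such as the parking capacity gets no quotes around it, while a text value does. A quick plain-Python sketch of both cases (the Seattle value is just an illustration):

```python
spaces = 500

# Numeric field: convert the number to a string, but add no quotes around it
numericQuery = '"Approx_Par" > ' + str(spaces)
print(numericQuery)   # "Approx_Par" > 500

# Text field, for comparison: the value needs single quotes
nameQuery = '"NAME" = ' + "'Seattle'"
print(nameQuery)      # "NAME" = 'Seattle'
```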

Lesson 3 Practice Exercise C Solution

Below is one approach to Lesson 3 Practice Exercise C. The number of spaces to query is stored in a variable at the top of the script, allowing for easy testing with other values.

# Selects park and ride facilities with over a certain number of parking spots
#  and exports them to a new feature class using CopyFeatures

import arcpy
parkingSpaces = 500
arcpy.env.workspace = "C:\\data\\Geog485\\Lesson3PracticeExercises\\Lesson3PracticeExerciseC\\Washington.gdb"

# Set up the SQL expression to query the parking capacity
parkingQuery = '"Approx_Par" > ' + str(parkingSpaces)

# Make a feature layer of park and rides that applies the SQL expression
arcpy.MakeFeatureLayer_management("ParkAndRide", "ParkAndRideLayer", parkingQuery)

# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management("ParkAndRideLayer", "BigParkAndRideFacilities")
arcpy.Delete_management("ParkAndRideLayer")

The video below offers some line-by-line commentary on the structure of the above solution:

Click for a transcript of "3C" video.

PRESENTER: This video describes a solution to Lesson 3 Practice Exercise C, wherein we want to select park and ride facilities that meet a certain threshold of a minimum number of parking spaces, and then, copy them to their own new feature class.

After importing the arcpy site package in line 4, we set up the variable representing that threshold. So in this case, we set that equal to 500 parking spaces.

Putting the variable at the top of the script is helpful so that number is not buried down later in our code. And so, if we want to adjust this value and test with other values, it's easy to find.

In line 6, I'm setting up the arcpy workspace to be equal to my file geodatabase location. This is a little different from some of the things shown in the other practice exercise videos. But it's a nice way to work with file geodatabases, in the sense that, from now on, if you refer to a feature class, you only have to use its name. You don't have to put the entire path. And you'll see that occur later in this script.

Line 9 is probably the most critical line of this script. It's setting up the SQL query expression to get park and ride facilities that have greater than that number of parking spaces. So if you were to look at this in ArcMap, where the park and ride facilities are the black dots, we would do a Select By Attributes. And we query on that attribute Approx_Par is greater than 500. And that would give us those large park and rides that are primarily found in the Seattle and Tacoma areas.

Notice that you need to convert that integer of 500 into a string, just for the purpose of putting it into this query expression. However, you don't need to enclose it in any type of special quotes. The field name, you do have to put into double quotes, and that's why I've surrounded this part of the expression with single quotes, as you learned in your lesson materials.

Once you have that query string all set up, you can run MakeFeatureLayer to get a Park and Rides Feature Layer with just those selected park and rides that meet the criteria.

So there are three parameters to pass in here. The first one is the name of the feature class that we're operating on. And again, because we set up the workspace, we can just put that we're working with the Park and Ride Feature Class.

The second parameter is the name of the feature layer. And we could name this anything here. It's just going to reside in the computer's memory, as long as we're consistent.

And then, the third parameter is that SQL query expression. This is optional. If we omitted this, we would just get everything. But we want to just have those park and ride facilities that meet the criteria.

Once we have that feature layer, we can run the copy features tool to make a brand new feature class with just these selected elements. So the two parameters here-- the first one is the name of the feature layer that we're working with. And, in line 12, remember that we called this Park and Ride Layer.

The second parameter, Big Park and Ride Facilities, is the name of the new feature class that we want to create. And so, once we've done that, we have a new feature class, and we can run the Delete tool to clean up our feature layer. And that's all we need to do in this script.

Lesson 3 Practice Exercise D

This practice exercise requires applying both an attribute selection and a spatial selection. It is directly applicable to Project 3 in many ways. The data is the same that you used in exercises A and C.

The objective

Write a script that selects all the park and ride facilities in a given city and saves them out to a new feature class. You can test with the city of 'Federal Way'.

Tips

Start by making a feature layer from the CityBoundaries feature class that contains just the city in question. You'll then need to make a feature layer from the park and ride feature class and perform a spatial selection on it using the "CONTAINED_BY" operation. Then use the Copy Features tool as in the previous exercise to move the selected park and ride facilities into their own feature class.
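Because the attribute query here mixes double quotes (around the field name) and single quotes (around the city name), the concatenation can be hard to read. If you prefer, Python's str.format can build the identical expression; this is just an alternative style, not something the exercise requires:

```python
targetCity = "Federal Way"

# Concatenation style: double quotes around the field, single quotes around the value
cityQuery = '"NAME" = ' + "'" + targetCity + "'"

# Equivalent using str.format; {0} is replaced by the city name
cityQuery2 = "\"NAME\" = '{0}'".format(targetCity)

print(cityQuery)               # "NAME" = 'Federal Way'
print(cityQuery2 == cityQuery) # True
```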

Lesson 3 Practice Exercise D Solution

Below is one possible approach to Lesson 3 Practice Exercise D. Notice that the city name is stored near the top of the script in a variable so that it can be tested with other values. The SQL expression takes a little bit of work to create due to the string query using two different types of quotes.

# Selects park and ride facilities in a given target city and
#  exports them to a new feature class

import arcpy
targetCity = "Federal Way"     # Name of target city
arcpy.env.workspace = "C:\\data\\Geog485\\Lesson3PracticeExercises\\Lesson3PracticeExerciseD\\Washington.gdb"
parkAndRide = "ParkAndRide"    # Name of P & R feature class
cities = "CityBoundaries"      # Name of city feature class

# Set up the SQL expression of the query for the target city
cityQuery = '"NAME" = ' + "'" + targetCity + "'"

# Make feature layers for the cities and park and rides
arcpy.MakeFeatureLayer_management(cities, "CitiesLayer", cityQuery)
arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

# Select all park and rides in the target city
arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CitiesLayer")

# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management("ParkAndRideLayer", "TargetParkAndRideFacilities")

arcpy.Delete_management("ParkAndRideLayer")
arcpy.Delete_management("CitiesLayer")

See the video below for some line-by-line commentary on the above solution:

Transcript of the "3D" video:

PRESENTER: In this video, I'll be explaining one solution to Lesson 3 Practice Exercise D, in which we need to select park and ride facilities in a given target city and export them to a new feature class.

And this solution will have a lot of elements that were used in Exercises B and C. These patterns should start to look familiar to you.

In line 4, I import the Arcpy Site Package, and then in line 5, I set up a variable to represent the city on which I want to query for park and rides. I'm going to be doing an attribute query for this city, and then I'm going to follow it up with a spatial query to just get the park and rides that fall within that selected city. And putting line 5 at the top like this allows me to change the city so I can test with different values without hunting through my code and finding where that is.

In line 6, I set up the workspace, which is going to be my file geodatabase. This was a technique that I used also in Exercise C's solution, which is convenient, because I'll be working with several feature classes here. In lines 7 and 8, I decided to go ahead here and set up some variables that represent the names of those feature classes. This is not necessary. I could just plug in these strings into the functions in lines 14 and 15.

But sometimes, when I'm working with a lot of data sets, I like to set up everything at the beginning just so I can see exactly what I'm working with. And so, this is optional.

Some of the rest of the code could go within try, except, and finally blocks. For simplicity, I've left those out here, although for code quality, you're asked to include those in appropriate places in your project submissions.

What I'll do first here is go ahead in line 11 and set up the SQL query string to get just the city of Federal Way on its own. So I'm doing this attribute query here, similar to what was done in Exercise B. I'm querying on the name field for the city. The field name needs to go in double quotes in the string. So I'm enclosing the first part of the string in single quotes.

The name of the city has to go within single quotes in the query. So I have to do a little bit of string concatenation here and enclose each of those single quotes in double quotes, and stick everything together so I have that full query string. And once I have it, in line 14, I can make it a feature layer of just that target city. This will get just the city of Federal Way.

And by now, this pattern of making the feature layer and passing in the parameters should be familiar to you. The first parameter is the name of the feature class that I'm using. And remember, in line 8, I set up a variable for that. The second parameter is the name I want to give this feature layer throughout this script. I'm going to call it cities layer.

And then the third parameter is the optional SQL query expression, and it's what will narrow down the cities to just be one city, in this case. In line 15, I'm going to make a feature layer of all the park and ride facilities. And in line 15, I only pass in two parameters because I want all of the park and rides. So I'm not going to pass in any type of query string here.

So the two parameters are the variable I set up in line 7 that gets the park and ride feature class, and then the name I want to give to this feature layer in the script, which I just called it park and ride layer.

With all of those elements, I can now do a Select Layer by Location so that I can get just the park and rides that fall within the selected city. So I pass in the park and ride layer and the type of spatial relationship which I'm using, which is contained by, which you've seen in previous practice exercises.

And then I'm working with the cities layer to do the selection. So that's the third parameter. Once I have that selection made, then I can copy those selected features into a new feature class. And just like you saw in Practice Exercise C, I'm using the Copy Features tool, and I give it my park and ride feature layer.

And then the second parameter here is the name of the new feature class that will be created. It's going to go into that workspace, which is the Washington file geodatabase. And I'm going to call that feature class Target Park and Ride Facilities.

Now at the end of your code, probably within a Finally block or somewhere at the end, you're going to delete the feature layers to clean them up. And in this case, we have two feature layers to clean up.

And that's all it takes to complete this exercise.

Project 3: Extracting data for a pro hockey team's scouting department

In this project you'll use your new skills working with selections and cursors to process data from a "raw" format into a more specialized dataset for a specific mapping purpose. The data for this project was retrieved from the National Hockey League's undocumented statistics API.

Download the data for this project

Background

In this exercise, suppose you are a data analyst for an NHL franchise.  In preparation for the league draft, the team general manager has asked you to make it possible for him to retrieve all current players born in a particular country (say, Sweden), broken down by position.  Your predecessor passed along the player shapefile, but unfortunately the players' birth country is not included as one of the attributes.  You do have a second shapefile of world country boundaries, though...

Task

Write a script that makes a separate shapefile for each of the three forward positions (center, right wing, and left wing) within the boundary of Sweden. Write this script so that the user can change the country or list of positions simply by editing a couple of lines of code at the top of the script.

In browsing the attribute table, you'll note that player heights and weights are stored in imperial units (feet & inches and pounds).  As part of this extraction task, to simplify comparisons against scouting reports written using metric units, you should also add two new numeric fields to the attribute table -- to the new shapefiles only, not the original nhlrosters.shp -- to hold height and weight values in centimeters and kilograms.  For every record, populate these fields with values based on the following formulas:

height_cm = height (in inches) * 2.54

weight_kg = weight (in pounds) * 0.453592
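The two formulas above are plain arithmetic and can be wrapped in small helper functions, which you might then call from inside your UpdateCursor loop. A minimal sketch (the function names are illustrative, not required by the assignment):

```python
def height_cm(height_inches):
    """Convert a height in inches to centimeters."""
    return height_inches * 2.54

def weight_kg(weight_pounds):
    """Convert a weight in pounds to kilograms."""
    return weight_pounds * 0.453592

# For example, a 6' 2" (74-inch), 200-pound player:
print(round(height_cm(74), 2))    # 187.96
print(round(weight_kg(200), 2))   # 90.72
```

These are the values you would write into the new height_cm and weight_kg fields for each row.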

Your result should look something like this if viewed in ArcMap:


Figure 3.5  Example output from Project 3, viewed in ArcMap.

The above requirements are sufficient for receiving 90% of the credit on this assignment. The remaining 10% is reserved for "Over and above" efforts, such as making an ArcToolbox script tool, or extending the script to handle other positions, multiple countries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.  

Deliverables

Deliverables for this project are as follows:

  • The source .py file containing your script
  • A short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out so the grader can look for them.

You do not have to create an ArcToolbox script tool for this assignment; you can hard-code the initial parameters. Nevertheless, put all the parameters at the top so they can be easily manipulated by whoever tests the script.

Once you get everything working, creating an ArcToolbox script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.

Notes about the data

Take a look at the provided datasets in ArcMap, particularly the attribute tables.  The nhlrosters shapefile contains a field called "position" that provides the values you need to examine (RW = Right Wing, LW = Left Wing, C = Center, D = Defenseman, G = Goaltender).  As mentioned, you've been asked to extract the forward positions (RW, LW, and C) to new shapefiles, but you want to write a script that's capable of handling some other combination of positions, too.  There are several other fields that offer the potential for interesting queries as well, if you're looking for over and above ideas.
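The requirement that the script handle "some other combination of positions" suggests putting the position codes in a Python list at the top of the script and looping over it, building one query string per position. A minimal sketch of that pattern (pure Python; the arcpy call it would feed is indicated only as a comment, and the variable names are illustrative):

```python
# Position codes to extract -- editing this one list is all a user
# should need to do to handle a different combination of positions.
positions = ["C", "RW", "LW"]

for position in positions:
    # Build the attribute query for this position, e.g. "position" = 'C'
    positionQuery = '"position" = \'{0}\''.format(position)
    # In the real script, positionQuery would be passed to a
    # Make Feature Layer / selection call here.
    print(positionQuery)
```

Each pass through the loop would then do the selection and Copy Features work for one position's shapefile.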

The Countries_WGS84 shapefile has a field called "CNTRY_NAME". You can make an attribute selection on this field to select Sweden, then follow that up with a spatial selection to grab all the players that fall within this country. Finally, narrow down those players to just the ones that play the desired position.

Tips

Once you've selected Swedish players at the desired position, use the Copy Features tool to save the selected features into a new shapefile, in the same fashion as in the practice exercises.

Take this project one step at a time. It's probably easiest to tackle the extraction-into-shapefile portion first.  Once you have all the new position shapefiles created, go through them one by one, use the "Add Field" tool to add the "height_cm" and "weight_kg" fields, followed by an UpdateCursor to loop through all rows and populate these new fields with appropriate values.

It might be easiest to get the whole process working with a single position, then add the loop for all the positions later after you have finalized all the other script logic.

The height field is of type Text to allow for the ' and " characters.  The string slicing notation covered earlier in the course can be used to obtain the feet and inches components of the player height.  Use the inches to centimeters formula shown above to compute the height in metric units.
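As a sketch of that slicing, suppose a height value looks like 6' 2" (a single-digit feet value, an apostrophe, a space, the inches, and a double-quote mark). The positions used below assume that exact layout, so check your own attribute table before relying on fixed index values:

```python
# Hypothetical height string as it might appear in the Text field
heightText = '6\' 2"'

feet = int(heightText[0])        # the character before the apostrophe
inches = int(heightText[3:-1])   # between the space and the " mark
totalInches = feet * 12 + inches
heightCm = totalInches * 2.54

print(round(heightCm, 2))   # 187.96
```

If the data turns out to contain double-digit inch values or inconsistent spacing, splitting on the apostrophe (heightText.split("'")) is a more forgiving alternative to fixed slice positions.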

For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.

You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.