GEOG 485:
GIS Programming and Software Development

Lesson 3 Practice Exercise B Solution

PrintPrint

Below is one possible solution to Practice Exercise B with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# This script determines the percentage of cities with two park
#  and ride facilities

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb"

cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"   # Name of column for storing the Park & Ride information
cityIDStringField = "CI_FIPS"             # Name of column with city IDs
citiesWithTwoParkAndRides = 0             # Used for counting cities with at least two P & R facilities
numCities = 0                             # Used for counting cities in total

# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city    
        cityIDString = city[0]
        whereClause = cityIDStringField + " = '" + cityIDString + "'"
        print("Processing city " + cityIDString)
 
        # Make a feature layer of just the current city polygon    
        currentCityLayer = arcpy.SelectLayerByAttribute_management(cityBoundariesFC, "NEW_SELECTION", whereClause)
 
        try:
            # Narrow down the park and ride layer by selecting only the park and rides in the current city
            selectedParkAndRideLayer = arcpy.SelectLayerByLocation_management(parkAndRideFC, "CONTAINED_BY", currentCityLayer)
 
            # Count the number of park and ride facilities selected
            numSelectedParkAndRide = int(selectedParkAndRideLayer[2])
 
            # If more than two park and ride facilities found, update the row to TRUE
            if numSelectedParkAndRide >= 2:
                city[1] = "TRUE"
 
                # Don't forget to call updateRow
                cityRows.updateRow(city)
 
                # Add 1 to your tally of cities with two park and rides                
                citiesWithTwoParkAndRides += 1
            
            numCities += 1
        except:
            print("Problem determining number of ParkAndRides in " + cityIDString)
        finally:
            # Clean up feature layers 
            arcpy.Delete_management(selectedParkAndRideLayer) 
            arcpy.Delete_management(currentCityLayer)

del city, cityRows
 
# Calculate and report the number of cities with two park and rides
if numCities != 0:
    percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100
    print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.")
else:
    print ("Error with input dataset. No cities found.")
 

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Solution to Lesson 3, Practice Exercise B (8:57)

Click here for transcript of Solution to Lesson 3, Practice Exercise B

This video shows one possible solution to Lesson 3, Practice Exercise B, which calculates the number of cities in the data set with at least two park and ride facilities using selections in ArcGIS. It also uses an update cursor to update a field indicating whether the city has at least two park and rides. The beginning of this script is very similar to Practice Exercise A, so I won't spend a lot of time on the beginning. Lines 4 through 10 are pretty familiar from that exercise.

In lines 11 and 12, I'm setting up variables to reference some fields that I'm going to use. Part of the exercise is to update a field called HasTwoParkAndRides to be either True or False. And so, I'm setting a variable to refer to that field.

And in line 12, I set up a variable for the city FIPS code. We're going to be querying each city one by one, and we need some field with unique values for each city. We might use the city name, but that could be dangerous in the event that you have a dataset where two cities happen to have the same name. So we'll use the FIPS code instead.

In lines 13 and 14, I'm setting up some counters that will be used to count the number of cities we run into that have at least two park and rides, and then the number of cities total respectively.

On line 17, I open an update cursor on the cities feature class. In specifying the fields tuple, I’m going to need the FIPS code, so I can create a Feature Layer for the city in each iteration of the loop through the cursor and I need the HasTwoParkAndRides field since that’s the field I want to set to True or False.

So I'm going to loop through each city here and select the number of facilities that occur in each city and count those up. And I'm going to be working with two fields using this cursor. And I pass those in a tuple. That's cityIDStringField and parkAndRideField. Those are the variables that I set up earlier in the script in lines 11 and 12.

It might be helpful at this point to take a look at what we're doing conceptually in ArcGIS Pro. So, the first thing we're going to do is for each city is make an attribute query for that city FIPS. And so, for example, if I were to do this for the city of Seattle, its FIPS code is this long number here.

If I were to run a query for the cities with that FIPS code, I’d end up with a selection containing just one city. The next step then is to select the ParkRide features that intersect the selected features in the cities layer. If that step selects 2 or more features, then I want to record True in my True/False field. If it’s 0 or 1, then I want to record False.

And I want to repeat this for each of the 108 cities. So, this is definitely a good candidate for automation.

So getting back to the script, I’ve called my cursor cityRows and set up a loop in which I called the iterator variable, or loop variable, city.

In line 20, we get the FIPS code for the city that's currently being looked at on the current iteration of the loop. We use 0 in square brackets there because that's the index position of the CityIDStringField that we put in the tuple in line 17. It was the first thing in the tuple, so it has index position 0.

And in line 22, we're setting up a string to store a query for that city FIPS. And so, we need the field name equals the city ID. The city ID has to be in single quotes so we use string concatenation to plug the field name and city ID into a string expression. Note that because this script takes a while to run, I also have a print statement here that includes the city ID to help track the script’s progress as it runs.

On line 25, I use SelectLayerByAttribute to create a Feature Layer representing just the current city. I store a reference to that layer in a variable called currentCityLayer.

I then use that layer in a call to the SelectLayerByLocation tool in which I select the ParkAndRide features that are contained by the feature in currentCityLayer. The result of that selection is stored in the selectedParkAndRideLayer variable.

As discussed in Exercise A, the third item in the Result object is the count of features in the Feature Layer. So on line 32, I get that count using the expression selectedParkAndRideLayer[2].

And then in line 35, I do a check to see if that number is greater than or equal to 2. If it is, then I’m going to assign a value of True to the ParkAndRide field. So that's where I look back at how I defined the cursor and see that the parkAndRideField was the second of the two fields in the fields tuple, so I need to use the expression city[1] to refer to that field. I set that field equal to True and then call updateRow() so that my change gets applied to the data set.

As in exercise A, I want to report the percentage of cities meeting the selection criteria, so I increment my citiesWithTwoParkAndRides variable by 1 when numSelectedParkAndRide is >= 2.

On line 44, I increment the numCities variable by 1. I want to increment this variable on each pass through the loop, not just for cities with 2 or more ParkAndRides, so note that line 44 is indented one less level than line 42 (in other words, so it’s not part of the if block).

Next, note the finally block on lines 47-50.  Here we are disposing of the objects held in the currentCityLayer and selectedParkAndRideLayer variables on each iteration of the loop since we no longer need those objects and would like to free up the memory that they take up.  

After the loop, we use the Python del statement to kill the objects held in the city and cityRows variables so we avoid retaining unnecessary locks on the data after the script completes.  

So, after doing the cleanup, now we're going to calculate the percentage of cities that have at least two park and rides. Now, you can't divide by zero, so we're doing a check in line 55 for that. But if everything checks out OK, which it would unless we had a major problem with our data set, we go ahead in line 56 and divide the number of cities with two park and rides by the total number of cities. And then, we multiply by 100 to get a nice percentage figure, which we then report in the print statement on line 57.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Here is a different solution to Practice Exercise B, which uses the alternate syntax discussed in the lesson.

# This script determines the percentage of cities with two park
#  and ride facilities

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb"

cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"   # Name of column for storing the Park & Ride information
cityIDStringField = "CI_FIPS"             # Name of column with city IDs
citiesWithTwoParkAndRides = 0             # Used for counting cities with at least two P & R facilities
numCities = 0                             # Used for counting cities in total
  
# Make a feature layer of all the park and ride facilities
arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer")
  
# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city    
        cityIDString = city[0]
        whereClause = cityIDStringField + " = '" + cityIDString + "'"
        print("Processing city " + cityIDString)
  
        # Make a feature layer of just the current city polygon    
        arcpy.MakeFeatureLayer_management(cityBoundariesFC, "CurrentCityLayer", whereClause)
  
        try:
            # Narrow down the park and ride layer by selecting only the park and rides
            #  in the current city
            arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CurrentCityLayer")
  
            # Count the number of park and ride facilities selected
            selectedParkAndRideCount = arcpy.GetCount_management("ParkAndRideLayer")
            numSelectedParkAndRide = int(selectedParkAndRideCount[0])
  
            # If more than two park and ride facilities found, update the row to TRUE
            if numSelectedParkAndRide >= 2:
                city[1] = "TRUE"
  
                # Don't forget to call updateRow
                cityRows.updateRow(city)
  
                # Add 1 to your tally of cities with two park and rides                
                citiesWithTwoParkAndRides += 1
            
            numCities += 1
        except:
            print("Problem determining number of ParkAndRides in " + cityIDString)           
        finally:
            # Clean up feature layer
            arcpy.Delete_management("CurrentCityLayer") 
 
# Clean up feature layer 
arcpy.Delete_management("ParkAndRideLayer")
del city, cityRows
  
# Calculate and report the number of cities with two park and rides
if numCities != 0:
    percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100
    print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.")
else:
    print ("Error with input dataset. No cities found.")

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Alternate Solution to Lesson 3, Practice Exercise B (2:10)

Click here for transcript of Alternate Solution to Lesson 3, Practice Exercise B.

This video shows an alternate solution to Lesson 3, Practice Exercise B. The general outline of this script is much like the first solution. As we saw with Exercise A, this version of the script uses the MakeFeatureLayer tool to create feature layers used in the call to the SelectLayerByLocation tool rather than feature classes.

Line 17 creates a feature layer called ParkAndRideLayer which references all of the ParkAndRide points in the state.

Then I create an update cursor on the city boundaries feature class, just like in the earlier solution. In this solution, though, note that line 28 creates a feature layer (“CurrentCityLayer”) by passing whereClause as an argument to the MakeFeatureLayer tool. This means the feature layer will refer to a different city on each pass through the loop.

The two feature layers are then passed as inputs to the SelectLayerByLocation tool on line 33, finding the ParkAndRides within the current city and putting that selection into the ParkAndRideLayer.

Then unlike the first solution, which got the number of selected ParkAndRides out of the Result object returned by the SelectLayerbyLocation tool, in this solution, I use the Get Count tool to get the number of ParkAndRides.

Also, note that in the finally block, I use the Delete tool to clean up just "CurrentCityLayer", not "ParkAndRideLayer".  That's because the object held in "ParkAndRideLayer" is the same and is needed throughout all iterations of the loop, whereas the object in "CurrentCityLayer" is different on each iteration and is only needed in that iteration.  Thus, it's a good idea to dispose of it as part of the loop.

The rest of the script is then pretty much the same.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Testing these scripts on a ~2-year-old Dell laptop running Windows 10 with 16GB of RAM yielded some very different results.  In 5 trials of both scripts, the first one needed an average of 240 seconds to complete.  The second one (using the older syntax) needed only 83 seconds.  This may seem counterintuitive, though it's interesting to note that the newer syntax was the faster performing version of the scripts for the other 3 exercises (times shown in seconds):

Exercise Old New
A 0.99 0.45
B 83.67 240.33
C 1.78 0.85
D 1.46 0.92

You can check the timing of the scripts on your machine by adding the following lines just after the import arcpy statement:

import time 
process_start_time = time.time() 

and this line at the end of the script:

print ("--- %s seconds ---" % (time.time() - process_start_time))

If you're interested in learning more about testing your script's performance along with methods for determining how long it takes to execute various parts of your script (e.g., to explore why there is such a difference in performance between the two Exercise B solutions), you should consider enrolling in our GEOG 489 class