The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.
An essential part of a GIS is the data that represents both the geometry (locations) of geographic features and the attributes of those features. This combination of features and attributes is what makes GIS go beyond just "mapping." Much of your work as a GIS analyst involves adding, modifying, and deleting features and their attributes from the GIS.
Beyond maintaining the data, you also need to know how to query and select the data that is most important to your projects. Sometimes you'll want to query a dataset to find only the records that match a certain criteria (for example, single-family homes constructed before 1980) and calculate some statistics based on only the selected records (for example, percentage of those homes that experienced termite infestation).
All of the above tasks of maintaining, querying, and summarizing data can become tedious and error prone if performed manually. Python scripting is often a faster and more accurate way to read and write large amounts of data. There are already many pre-existing tools for data selection and management in ArcGIS Pro. Any of these can be used in a Python script. For more customized scenarios where you want to read through a table yourself and modify records one-by-one, arcpy contains special objects, called cursors, that you can use to examine each record in a table. You'll quickly see how the looping logic that you learned in Lesson 2 becomes useful when you are cycling through tables using cursors.
Using a script to work with your data introduces some other subtle advantages over manual data entry. For example, in a script, you can add checks to ensure that the data entered conforms to a certain format. You can also chain together multiple steps of selection logic that would be time-consuming to perform in ArcGIS Pro.
This lesson explains ways to read and write GIS data using Python. We'll start off by looking at how you can create and open datasets within a script. Then, we'll practice reading and writing data using both geoprocessing tools and cursor objects. Although this is most applicable to vector datasets, we'll also look at some ways you can manipulate rasters with Python. Once you're familiar with these concepts, Project 3 will give you a chance to practice what you've learned.
Lesson 3 explains how to read and manipulate both vector and raster data with Python. To complete Lesson 3, you are required to do the following:
Do items 1 - 2 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need the second week of the lesson to concentrate on the project, the quiz, and the proposal assignment.
By the end of this lesson, you should:
Before getting into the details of how to read and modify these attributes, it's helpful to review how geographic datasets are stored in ArcGIS. You need to know this so you can open datasets in your scripts, and on occasion, create new datasets.
Over the years, Esri has developed various ways of storing spatial data. They encourage you to put your data in geodatabases, which are organizational structures for storing datasets and defining relationships between those datasets. Different flavors of geodatabase are offered for storing different magnitudes of data.
A single vector dataset within a geodatabase is called a feature class. Feature classes can be optionally organized in feature datasets. Raster datasets can also be stored in geodatabases.
Although geodatabases are essential for long-term data storage and organization, it's sometimes convenient to access datasets in a "standalone" format on the local file system. Esri's shapefile is probably the most ubiquitous standalone vector data format (it even has its own Wikipedia article [2]). A shapefile actually consists of several files that work together to store vector geometries and attributes. The files all have the same root name, but use different extensions. You can zip the participating files together and easily email them or post them in a folder for download. In the Esri file browsers in ArcGIS Pro, the shapefiles just appear as one file.
Another type of standalone dataset dating back to the early days of ArcGIS is the ArcInfo coverage. Like the shapefile, the coverage consists of several files that work together. Coverages are definitely an endangered species, but you might encounter them if your organization used ArcInfo Workstation in the past.
Raster datasets are also often stored in standalone format instead of being loaded into a geodatabase. A raster dataset can be a single file, such as a JPEG or a TIFF, or, like a shapefile, it can consist of multiple files that work together.
Often in a script, you'll need to provide the path to a dataset. Knowing the syntax for specifying the path is sometimes a challenge because of the many different ways of storing data listed above. For example, below is an example of what a file geodatabase looks like if you just browse the file system of Windows Explorer. How do you specify the path to the dataset you need? This same challenge could occur with a shapefile, which, although more intuitively named, actually has three or more participating files.
The safest way to get the paths you need is to open Pro's Catalog View (which displays in the middle of the application window, unlike the Catalog Pane, which displays on the right side of the application window) and browse to the dataset. The location box along the top indicates the folder or geodatabase whose contents are being viewed. Clicking on the dropdown arrow within that box displays the location as a network path. That's the path you want. Here's what the same file geodatabase would look like in Pro's Catalog view. The circled path shows how you would refer to a feature class's geodatabase then add the feature class name. (Alternatively, you could right-click any feature class from either the Catalog View or Catalog Pane, go to Properties, then click the Source tab to access its path.)
Below is an example of how you could access the feature class in a Python script using this path. This is similar to one of the examples in Lesson 1.
import arcpy featureClass = "C:\\Data\\USA\\USA.gdb\\Cities" desc = arcpy.Describe(featureClass) spatialRef = desc.SpatialReference print (spatialRef.Name)
Remember that the backslash (\) is a reserved character in Python, so you'll need to use either the double backslash (\\) or forward slash (/) in the path. Another technique you can use for paths is the raw string, which allows you to put backslashes and other reserved characters in your string as long as you put "r" before your quotation marks.
featureClass = r"C:\Data\USA\USA.gdb\Cities" . . .
The Esri geoprocessing framework often uses the notion of a workspace to denote the folder or geodatabase where you're currently working. When you specify a workspace in your script, you don't have to list the full path to every dataset. When you run a tool, the geoprocessor sees the feature class name and assumes that it resides in the workspace you specified.
Workspaces are especially useful for batch processing, when you perform the same action on many datasets in the workspace. For example, you may want to clip all the feature classes in a folder to the boundary of your county. The workflow for this is:
Here's some code that clips each feature class in a file geodatabase to the Alabama state boundary, then places the output in a different file geodatabase. Note how the five lines of code after import arcpy correspond to the five steps listed above.
import arcpy arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb" featureClassList = arcpy.ListFeatureClasses() clipFeature = "C:\\Data\\Alabama\\Alabama.gdb\\StateBoundary" for featureClass in featureClassList: arcpy.Clip_analysis(featureClass, clipFeature, "C:\\Data\\Alabama\\Alabama.gdb\\" + featureClass)
In the above example, the method arcpy.ListFeatureClasses() was the key to making the list. This method looks through a workspace and makes a Python list of each feature class in that workspace. Once you have this list, you can easily configure a for loop to act on each item.
Notice that you designated the path to the workspace using the location of the file geodatabase "C:\\Data\\USA\\USA.gdb". If you were working with shapefiles, you would just use the path to the containing folder as the workspace. You can download the USA.gdb here [3] and the Alabama.gdb here [4].
If you were working with ArcSDE, you would use the path to the .sde connection file when creating your workspace. This is a file that is created when you connect to ArcSDE in Catalog View, and is placed in your local profile directory. We won't be accessing ArcSDE data in this course, but if you do this at work, remember that you can use the location box as outlined above to help you understand the paths to datasets in ArcSDE.
Now that you know how to open a dataset, let's go a little bit deeper and start examining some individual data records. This section of the lesson discusses how to read and search data tables. These tables often provide the attributes for vector features, but they can also stand alone in some cases. The next section will cover how to write data to tables. At the end of the lesson, we'll look at rasters.
As we work with the data, it will be helpful for you to follow along, copying and pasting the example code into practice scripts. Throughout the lesson, you'll encounter exercises that you can do to practice what you just learned. You're not required to turn in these exercises; but if you complete them, you will have a greater familiarity with the code that will be helpful when you begin working on this lesson's project. It's impossible to read a book or a lesson, then sit down and write perfect code. Much of what you learn comes through trial and error and learning from mistakes. Thus, it's wise to write code often as you complete the lesson.
Before we get too deep into vector data access, it's going to be helpful to quickly review how the vector data is stored in the software. Vector features in ArcGIS feature classes (remember, including shapefiles) are stored in a table. The table has rows (records) and columns (fields).
Fields in the table store the geometry and attribute information for the features.
There are two fields in the table that you cannot delete. One of the fields (usually called Shape) contains the geometry information for the features. This includes the coordinates of each vertex in the feature and allows the feature to be drawn on the screen. The geometry is stored in binary format; if you were to see it printed on the screen, it wouldn't make any sense to you. However, you can read and work with geometries using objects that are provided with arcpy.
The other field included in every feature class is an object ID field (OBJECTID or FID). This contains a unique number, or identifier for each record that is used by ArcGIS to keep track of features. The object ID helps avoid confusion when working with data. Sometimes records have the same attributes. For example, both Los Angeles and San Francisco could have a STATE attribute of 'California,' or a USA cities dataset could contain multiple cities with the NAME attribute of 'Portland;' however, the OBJECTID field can never have the same value for two records.
The rest of the fields contain attribute information that describe the feature. These attributes are usually stored as numbers or text.
When you write a script, you'll need to provide the names of the particular fields you want to read and write. You can get a Python list of field names using arcpy.ListFields().
# Reads the fields in a feature class import arcpy featureClass = "C:\\Data\\USA\\USA.gdb\\Cities" fieldList = arcpy.ListFields(featureClass) # Loop through each field in the list and print the name for field in fieldList: print (field.name)
The above would yield a list of the fields in the Cities feature class in a file geodatabase named USA.gdb. If you ran this script in PyScripter (try it with one of your own feature classes!) you would see something like the following in the IPython Console.
OBJECTID Shape UIDENT POPCLASS NAME CAPITAL STATEABB COUNTRY
Notice the two special fields we already talked about: OBJECTID, which holds the unique identifying number for each record, and Shape, which holds the geometry for the record. Additionally, this feature class has fields that hold the name (NAME), the state (STATEABB), whether or not the city is a capital (CAPITAL), and so on.
arcpy treats the field as an object. Therefore the field has properties that describe it. That's why you can print field.name. The help reference topic Using fields and indexes [5] lists all the properties that you can read from a field. These include aliasName, length, type, scale, precision, and others.
Properties of a field are read-only, meaning that you can find out what the field properties are, but you cannot change those properties in a script using the Field object. If you wanted to change the scale and precision of a field, for instance, you would have to programmatically add a new field.
Now that you know how to traverse the table horizontally, reading the fields that are available, let's examine how to read up and down through the table records.
The arcpy module contains some objects called cursors that allow you to move through records in a table. There have been quite a few changes made to how cursors can be used over the different versions of ArcGIS, though the current best practice has been in place since version 10.1 of ArcGIS Desktop (aka ArcMap) and has carried over to ArcGIS Pro. If you find yourself working with code that was written for a pre-10.1 version of ArcMap, we suggest you check out this same page in our old ArcMap version of this course (see the content navigation links on the right side of the Drupal site). We will focus our discussion below on the current best practice in cursor usage, which you should be utilizing for scripts written in this course and in your workplace.
At version 10.1 of ArcMap, Esri released a new data access module, which offered faster performance along with more robust behavior when crashes or errors were encountered with the cursor. The module contains different cursor classes for the different operations a developer might want to perform on table records -- one for selecting, or reading, existing records; one for making changes to existing records; and one for adding new records. We'll discuss the editing cursors later, focusing our discussion now on the cursor used for reading tabular data, the search cursor.
As with all geoprocessing classes, the SearchCursor class is documented in the Help system [6]. (Be sure when searching the Help system that you choose the SearchCursor class found in the Data Access module. An older, pre-10.1 class is available through the arcpy module and appears in the Help system as an "ArcPy Function.")
The common workflow for reading data with a search cursor is as follows:
Here's a very simple example of a search cursor that reads through a point dataset of cities and prints the name of each.
# Prints the name of each city in a feature class import arcpy featureClass = "C:\\Data\\USA\\USA.gdb\\Cities" with arcpy.da.SearchCursor(featureClass,("NAME")) as cursor: for row in cursor: print (row[0])
Important points to note in this example:
Here's another example where something more complex is done with the row values. This script finds the average population for counties in a dataset. To find the average, you need to divide the total population by the number of counties. The code below loops through each record and keeps a running total of the population and the number of records counted. Once all the records have been read, only one line of division is necessary to find the average. You can get the sample data for this script here. [7]
# Finds the average population in a counties dataset import arcpy featureClass = "C:\\Data\\Pennsylvania\\Counties.shp" populationField = "POP1990" nameField = "NAME" average = 0 totalPopulation = 0 recordsCounted = 0 print("County populations:") with arcpy.da.SearchCursor(featureClass, (nameField, populationField)) as countiesCursor: for row in countiesCursor: print (row[0] + ": " + str(row[1])) totalPopulation += row[1] recordsCounted += 1 average = totalPopulation / recordsCounted print ("Average population for a county is " + str(average))
Differences between this example and the previous one:
Before moving on, you should note that cursor objects have a couple of methods that you may find helpful in traversing their associated records. To understand what these methods do, and to better understand cursors in general, it may help to visualize the attribute table with an arrow pointing at the "current row." When a cursor is first created, that arrow is pointing just above the first row in the table. When a cursor is included in a for loop, as in the above examples, each execution of the for statement moves the arrow down one row and assigns that row's values (a tuple) to the row variable. If the for statement is executed when the arrow is pointing at the last row, there is not another row to advance to and the loop will terminate. (The row variable will be left holding the last row's values.)
Imagine that you wanted to iterate through the rows of the cursor a second time. If you were to modify the Cities example above, adding a second loop immediately after the first, you'd see that the second loop would never "get off the ground" because the cursor's internal pointer is still left pointing at the last row. To deal with this problem, you could just re-create the cursor object. However, a simpler solution would be to call on the cursor's reset() method. For example:
cursor.reset()
This will cause the internal pointer (the arrow) to move just above the first row again, enabling you to loop through its rows again.
The other method supported by cursor objects is the next() method, which allows you to retrieve rows without using a for loop. Returning to the internal pointer concept, a call to the next() method moves the pointer down one row and returns the row's values (again, as a tuple). For example:
row = cursor.next()
An alternative means of iterating through all rows in a cursor is to use the next() method together with a while loop. Here is the original Cities example modified to iterate using next() and while:
# Prints the name of each city in a feature class (using next() and while) import arcpy featureClass = "C:\\Data\\USA\\USA.gdb\\Cities" with arcpy.da.SearchCursor(featureClass,("NAME")) as cursor: try: row = cursor.next() while row: print (row[0]) row = cursor.next() except StopIteration: pass
Points to note in this script:
You should find that using a for loop is usually the better approach, and in fact, you won't see the next() method even listed in the documentation of arcpy.da.SearchCursor. We're pointing out the existence of this method because a) older ArcGIS versions have cursors that can/must be traversed in this way, so you may encounter this coding pattern if looking at older scripts, and b) you may want to use the next() method if you're in a situation where you know the cursor will contain exactly one row (or you're interested in only the first row).
In each of the cursor examples discussed thus far, the cursors retrieved all rows from the specified feature class. Continue reading to see how to retrieve just a subset of rows meeting certain criteria.
(Note: In this section, we are only using short code sections to read through, rather than full code examples that can be run directly. However, as an exercise, you can adapt the code to, for instance, work with the Pennsylvania data set from the previous section by using Counties.shp and the POP1990 field.)
The previous examples used the SearchCursor object to read through each record in a dataset. You can get more specific with the search cursor by instructing it to retrieve just the subset of records whose attributes comply with some criteria, for example, "only records with a population greater than 10000" or "all records beginning with the letters P – Z."
For review, this is how you construct a search cursor to operate on every record in a dataset using the arcpy.da module:
with arcpy.da.SearchCursor(featureClass,(populationField)) as cursor:
If you want the search cursor to retrieve only a subset of the records based on some criteria, you can supply a SQL expression (a where clause) as the third argument in the constructor (the constructor is the method that creates the SearchCursor). For example:
with arcpy.da.SearchCursor(featureClass, (populationField), "POP2018 > 100000") as cursor:
The above example uses the SQL expression POP2018 > 100000 to retrieve only the records whose population is greater than 100000. SQL stands for "Structured Query Language" and is a special syntax used for querying datasets. If you've ever used a Definition Query to filter a layer's data in Pro, then you've had some exposure to these sorts of SQL queries. If SQL is new to you, please take a few minutes right now to read Write a query in the Query Builder [8] in the ArcGIS Pro Help. This topic is a simple introduction to SQL in the context of ArcGIS.
SQL expressions can contain a combination of criteria, allowing you to pinpoint a very focused subset of records. The complexity of your query is limited only by your available data. For example, you could use a SQL expression to find only states with a population density over 100 people per square mile that begin with the letter M and were settled after 1850.
Note that the SQL expression you supply for a search cursor is for attribute queries, not spatial queries. You could not use a SQL expression to select records that fall "west of the Mississippi River," or "inside the boundary of Canada" unless you had previously added and populated some attribute stating whether that condition were true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson, we'll talk about how to make spatial queries using the Select By Location geoprocessing tool.
Once you retrieve the subset of records, you can follow the same pattern of iterating through them using a for loop.
with arcpy.da.SearchCursor(featureClass, (populationField), "POP2018 > 100000") as cursor: for row in cursor: print (str(row[0]))
When you include a SQL expression in your SearchCursor constructor, you must supply it as a string. This is where things can get tricky with quotation marks since parts of the expression may also need to be quoted (specifically string values, such as a state abbreviation). The rule for writing queries in ArcGIS Pro is that string values must be enclosed in single quotes. Given that, you should enclose the overall expression in double quotes.
For example, suppose your script allows the user to enter the ID of a parcel, and you need to find it with a search cursor. Your SQL expression might look like this: " PARCEL = 'A2003KSW' ".
Handling quotation marks is simplified greatly in Pro as compared to ArcMap. In ArcMap, certain data formats require field names to be enclosed in double quotes, which when combined with string values being present in the expression, can make constructing the expression correctly quite a headache. If you find yourself needing to write ArcMap scripts that query data, check out the parallel page in the ArcMap version of this lesson (see course navigation links to the right).
As an ArcGIS Pro user, you've probably clicked the Select By Attributes button, located under the Map tab, to perform attribute queries. What we've been talking about in this part of the lesson probably reminded you of doing that sort of query, but it's important to note that opening a search cursor with a SQL expression (or where clause) as described above is not quite the same sort of operation. A search cursor is used in situations where you want to do some sort of processing of the records one by one. For example, as we saw, printing the names of cities or calculating their average population.
Some situations instead call for creating a selection on the feature class (i.e., treating the features identified by the query as a single unit). This can be done in Python scripts by invoking the Select Layer By Attribute tool (which is actually the tool that opens when you click the Select By Attributes button in Pro). The output from this tool -- referred to as a Feature Layer -- can then be used as the input to many other tools (Calculate Field, Copy Features, Delete Features, to name a few). This second tool's processing will be limited to the selection held in the input.
For an example, let's pick up on the population query used above. Let's say we wanted to do an analysis in which high-population cities were excluded. We might select those cities and then delete them:
popQuery = 'POP2018 > 100000' bigCities = arcpy.SelectLayerByAttribute_management('Cities', 'NEW_SELECTION', popQuery) arcpy.DeleteFeatures_management(bigCities)
In this snippet of code, note that the same where clause is implemented, this time to create a selection on the Cities feature class as opposed to producing a cursor of records to iterate through. The SelectLayerByAttribute tool returns a Feature Layer, which is an in-memory object that references the selected features. The object is stored in a variable, bigCities, which is then used as input to the DeleteFeatures tool. This will delete the high-population cities from the underlying Cities feature class.
Now, let's have a look at how to handle queries with spatial constraints.
Applying a SQL expression to the search cursor is only useful for attribute queries, not spatial queries. For example, you can easily open a search cursor on all counties named "Lincoln" using a SQL expression, but finding all counties that touch or include the Mississippi River requires a different approach. To get a subset of records based on a spatial criterion, you need to use the geoprocessing tool Select Layer By Location.
A few relational databases such as SQL Server expose spatial data types that can be spatially queried with SQL. Support for these spatial types in ArcGIS is still maturing, and in this course, we will assume that the way to make a spatial query is through Select Layer By Location. Since we are not using ArcSDE, this is actually true.
Suppose you want to generate a list of all states whose boundaries touch Wyoming. As we saw in the previous section with the Select Layer By Attribute tool, the Select Layer By Location tool will return a Feature Layer containing the features that meet the query criteria. One thing we didn't mention in the previous section is that a search cursor can be opened not only on feature classes, but also on feature layers. With that in mind, here is a set of steps one might take to produce a list of Wyoming's neighbors:
Below is some code that applies the above steps.
# Selects all states whose boundaries touch # a user-supplied state import arcpy # Get the US States layer, state, and state name field usaFC = r"C:\Data\USA\USA.gdb\Boundaries" state = "Wyoming" nameField = "NAME" try: whereClause = nameField + " = '" + state + "'" selectionStateLayer = arcpy.SelectLayerByAttribute_management(usaFC, 'NEW_SELECTION', whereClause) # Apply a selection to the US States layer neighborsLayer = arcpy.SelectLayerByLocation_management(usaFC, 'BOUNDARY_TOUCHES', selectionStateLayer) # Open a search cursor on the US States layer with arcpy.da.SearchCursor(neighborsLayer, (nameField)) as cursor: for row in cursor: # Print the name of all the states in the selection print (row[0]) except: print (arcpy.GetMessages()) finally: # Clean up feature layers and cursor arcpy.Delete_management(neighborsLayer) arcpy.Delete_management(selectionStateLayer) del cursor
You can choose from many spatial operators when running SelectLayerByLocation. The code above uses "BOUNDARY_TOUCHES". Other available relationships are "INTERSECT", "WITHIN A DISTANCE" (may save you a buffering step), "CONTAINS", "CONTAINED_BY", and others.
Note that the Row object "row" returns only one field ("NAME"), which is accessed using its index position in the list of fields. Since there's only one field, that index is 0, and the syntax looks like this: row[0]. Once you open the search cursor on your selected records, you can perform whatever action you want on them. The code above just prints the state name, but more likely you'll want to summarize or update attribute values. You'll learn how to write attribute values later in this lesson.
Notice that the feature layers are deleted using the Delete tool. This is because feature layers can maintain locks on your data, preventing other applications from using the data until your script is done. arcpy is supposed to clean up feature layers at the end of the script, but it's a good idea to delete them yourself in case this doesn't happen or in case there is a crash. In the examples above, the except block will catch a crash, then the script will continue and Delete the two feature layers.
Cursors can also maintain locks on data. As mentioned earlier, the "with" statement should clean up the cursor for you automatically. However, we've found that it doesn't always, an observation that appears to be backed up by this blurb from Esri's documentation of the arcpy.da.SearchCursor class [6]:
Search cursors also support with statements to reset iteration and aid in removal of locks. However, using a del statement to delete the object or wrapping the cursor in a function to have the cursor object go out of scope should be considered to guard against all locking cases.
One last point to note about this code that cleans up the feature layers and cursor is that it is embedded within a finally block. This is a construct that is used occasionally with try and except to define code that should be executed regardless of whether the statements in the try block run successfully. To understand the usefulness of finally, imagine if you had instead placed these cleanup statements at the end of the try block. If an error were to occur somewhere above that point in the try block -- not hard to imagine, right? -- the remainder of the try block would not be executed, leaving the feature layers and cursor in memory. A subsequent run of the script, after fixing the error, would encounter a new problem: the script would be unable to create the feature layer stored in selectionStatesLayer because it already exists. In other words, the cleanup statements would only run if the rest of the script ran successfully.
This situation is where the finally statement is especially helpful. Code in a finally block will be run regardless of whether something in the try block triggers a crash. (In the event that an error is encountered, the finally code will be executed after the except code.) Thus, as you develop your own code utilizing feature layers and/or cursors, it's a good idea to include these cleanup statements in a finally block.
Implementation of the Select Layer By Attribute/Location tools required a different syntax in earlier versions of ArcGIS. The tools didn't return a Feature Layer, and instead they required you to first create the Feature Layer yourself using the Make Feature Layer tool. Let's have a look at this same example completed using that syntax:
# Selects all states whose boundaries touch # a user-supplied state import arcpy # Get the US States layer, state, and state name field usaFC = r"C:\Data\USA\USA.gdb\Boundaries" state = "Wyoming" nameField = "NAME" try: # Make a feature layer with all the US States arcpy.MakeFeatureLayer_management(usaFC, "AllStatesLayer") whereClause = nameField + " = '" + state + "'" # Make a feature layer containing only the state of interest arcpy.MakeFeatureLayer_management(usaFC, "SelectionStateLayer", whereClause) # Apply a selection to the US States layer arcpy.SelectLayerByLocation_management("AllStatesLayer","BOUNDARY_TOUCHES","SelectionStateLayer") # Open a search cursor on the US States layer with arcpy.da.SearchCursor("AllStatesLayer", (nameField)) as cursor: for row in cursor: # Print the name of all the states in the selection print (row[0]) except: print (arcpy.GetMessages()) finally: # Clean up feature layers and cursor arcpy.Delete_management("AllStatesLayer") arcpy.Delete_management("SelectionStateLayer") del cursor
Note that the first MakeFeatureLayer statement takes the Boundaries feature class as an input and produces as an output a feature layer that throughout the rest of the script can be referred to using the name 'AllStatesLayer.' Similarly, the next statement creates another feature layer from the Boundaries feature class, this one applying a where clause to limit the included features to just Wyoming. This feature layer will go by the name 'SelectionStateLayer.' Creation of these feature layers was necessary in this older syntax because the Select Layer By Attribute/Location tools would only recognize feature layers, not feature classes, as valid inputs.
While we find this syntax to be a bit odd and less intuitive than the one shown at the beginning of this section, it still works, and you may encounter it when viewing others' scripts.
Before you move on, examine the following tool reference pages. Pay particular attention to the Usage and the Code Sample sections.
In the same way that you use cursors to read vector attribute data, you use cursors to write data as well. Two types of cursors are supplied for writing data:
In the following sections, you'll learn about both of these cursors and get some tips for using them.
The ArcGIS Pro Help has some explanation of cursors. Get familiar with this help now, as it will prepare you for the next sections of the lesson. You'll also find it helpful to return to the code examples while working on Project 3:
Accessing data using cursors [12]
Also follow the three links in the table at the beginning of the above topic. These briefly explain the InsertCursor [13], SearchCursor [6], and UpdateCursor [14] and provide a code example for each. You've already worked with SearchCursor, but closely examine the code examples for all three cursor types and see if you can determine what is happening in each.
Use the update cursor to modify existing records in a dataset. Here are the general steps for using the update cursor:
When you create an UpdateCursor and iterate through the rows using a variable called row, you can modify the field values by making assignments using the syntax row[<index of the field you want to change>] = <the new value>. For example:
row[0] = "Trisha Stevens"
It is important to note that the index occurring in the [...] to determine which field will be changed is given with respect to the tuple of fields provided when the UpdateCursor is created. For instance, if we create the cursor using the following command
with arcpy.da.UpdateCursor(featureClass, ("CompanyName", "Owner")) as cursor:
row[0] would refer to the field called "CompanyName" and row[1] refers to the field that has the name "Owner".
The script below performs a "search and replace" operation on an attribute table. For example, suppose you have a dataset representing local businesses, including banks. One of the banks was recently bought out by another bank. You need to find every instance of the old bank name and replace it with the new name. This script could perform that task automatically.
#Simple search and replace script import arcpy # Retrieve input parameters: the feature class, the field affected by # the search and replace, the search term, and the replace term. fc = arcpy.GetParameterAsText(0) affectedField = arcpy.GetParameterAsText(1) oldValue = arcpy.GetParameterAsText(2) newValue = arcpy.GetParameterAsText(3) # Create the SQL expression for the update cursor. Here this is # done on a separate line for readability. queryString = affectedField + " = '" + oldValue + "'" # Create the update cursor and update each row returned by the SQL expression with arcpy.da.UpdateCursor(fc, (affectedField), queryString) as cursor: for row in cursor: row[0] = newValue cursor.updateRow(row) del row, cursor
Notice that this script is relatively flexible because it gets all the parameters as text. However, this script can only be run on string variables because of the way the query string is set up. Notice that the old value is put in quotes, like this: "'" + oldValue + "'". Handling other types of variables, such as integers, would have made the example longer.
Again, it is critical to understand the tuple of affected fields that you pass in when you create the update cursor. In this example, there is only one affected field (which we named affectedField), so its index position is 0 in the tuple. Therefore, you set that field value using row[0] = newValue.
The last line with updateRow(...) is needed to make sure that the modified row is actually written back to the attribute table. Please note that the variable row needs to be passed as a parameter to updateRow(...).
As we mentioned, ArcGIS sometimes places locks on datasets to avoid the possibility of editing conflicts between two users. If you think for any reason that a lock from your script is affecting your dataset (by preventing you from viewing it, making it look like all rows have been deleted, and so on), you must close PyScripter to remove the lock. If you think that Pro has a lock on your data, check to see if there is an open edit session on the data, the data is being displayed in the Catalog View/Pane, or if a layer based on the data is part of an open map.
As we stated, cursor cleanup should happen through the creation of the cursor inside a "with" statement, but adding lines to delete the row and cursor objects will make extra sure locks are released.
For the Esri explanation of how locking works, you can review the section "Cursors and locking" in the topic Accessing data using cursors [15] in the ArcGIS Pro Help.
When adding a new record to a table, you must use the insert cursor. Here's the workflow for insert cursors:
As with the search and update cursor, you can use an insert cursor together with the "with" statement to avoid locking problems.
Insert cursors differ from search and update cursors in that you cannot provide an SQL expression when you create the insert cursor. This makes sense because an insert cursor is only concerned with adding records to the table. It does not need to "know" about the existing records or any subset thereof.
When you insert a row using InsertCursor.insertRow(), you provide a comma-delimited tuple of values for the fields of the new row. The order of these values must match the order of values of the tuple of affected fields you provided when you created the cursor. For example, if you create the cursor using
with arcpy.da.InsertCursor(featureClass, ("FirstName","LastName")) as cursor:
you would add a new row with values "Sam" for "FirstName" and "Fisher" for "LastName" by the following command:
cursor.insertRow(("Sam","Fisher"))
Please note that the inner parentheses are needed to turn the values into a tuple that is passed to insertRow(). Writing cursor.insertRow("Sam","Fisher") would have resulted in an error.
The example below uses an insert cursor to create one new point in the dataset and assign it one attribute: a string description. This script could potentially be used behind a public-facing 311 [16] application, in which members of the public can click a point on a Web map and type a description of an incident that needs to be resolved by the municipality, such as a broken streetlight.
# Adds a point and an accompanying description import arcpy # Retrieve input parameters inX = arcpy.GetParameterAsText(0) inY = arcpy.GetParameterAsText(1) inDescription = arcpy.GetParameterAsText(2) # These parameters are hard-coded. User can't change them. incidentsFC = "C:/Data/Yakima/Incidents.shp" descriptionField = "DESCR" # Make a tuple of fields to update fieldsToUpdate = ("SHAPE@XY", descriptionField) # Create the insert cursor with arcpy.da.InsertCursor(incidentsFC, fieldsToUpdate) as cursor: # Insert the row providing a tuple of affected attributes cursor.insertRow(((float(inX),float(inY)), inDescription)) del cursor
Take a moment to ensure that you know exactly how the following is done in the code:
If this script really were powering an interactive 311 application, the X and Y values could be derived from a point a user clicked on the Web map rather than as parameters as done in lines 4 and 5.
One thing you might have noticed is that the string "SHAPE@XY" is used to specify the Shape field. You might expect that this would just be "Shape," but arcpy.da provides a list of "tokens" that you can use if the field will be specified in a certain way. In our case, it would be very convenient just to provide the X and Y values of the points using a tuple of coordinates. It turns out that the token "SHAPE@XY" allows you to do just that. See the documentation of the InsertCursor's field_names parameter [13] to learn about other tokens you can use.
Putting this all together, the example creates a tuple of affected fields: ("SHAPE@XY", "DESCR"). Notice that in line 13, we actually use the variable descriptionField which contains the name of the second column "DESCR" for the second element of the tuple. Using a variable to store the name of the column we are interested in allows us to easily adapt the script later, for instance to a data set where the column has a different name. When the row is inserted, the values for these items are provided in the same order: cursor.insertRow(((float(inX), float(inY)), inDescription)). The argument passed to insertRow() is a tuple that contains another tuple (inX,inY), namely the coordinates of the point cast to floating point numbers, as the first element and the text for the "DESCR" field as the second element.
Take a few minutes to read Zandbergen 8.1 - 8.3 to reinforce your learning about cursors.
So far in this lesson, your scripts have only read and edited vector datasets. This work largely consists of cycling through tables of records and reading and writing values to certain fields. Raster data is very different, and consists only of a series of cells, each with its own value. So, how do you access and manipulate raster data using Python?
It's unlikely that you will ever need to cycle through a raster cell by cell on your own using Python, and that technique is outside the scope of this course. Instead, you'll most often use predefined tools to read and manipulate rasters. These tools have been designed to operate on various types of rasters and perform the cell-by-cell computations so that you don't have to.
In ArcGIS, most of the tools you'll use when working with rasters are in either the Data Management > Raster toolset or the Spatial Analyst toolbox. These tools can reproject, clip, mosaic, and reclassify rasters. They can calculate slope, hillshade, and aspect rasters from DEMs.
The Spatial Analyst toolbox also contains tools for performing map algebra on rasters. Multiplying or adding many rasters together using map algebra is important for GIS site selection scenarios. For example, you may be trying to find the best location for a new restaurant and you have seven criteria that must be met. If you can create a boolean raster (containing 1 for suitable, 0 for unsuitable) for each criterion, you can use map algebra to multiply the rasters and determine which cells receive a score of 1, meeting all the criteria. (Alternatively, you could add the rasters together and determine which areas received a value of 7.) Other courses in the Penn State GIS certificate program walk through these types of scenarios in more detail.
The tricky part of map algebra is constructing the expression, which is a string stating what the map algebra operation is supposed to do. ArcGIS Pro contains interfaces for constructing an expression for one-time runs of the tool. But what if you want to run the analysis several times, or with different datasets? It's challenging even in ModelBuilder to build a flexible expression into the map algebra tools. With Python, you can manipulate the expression as much as you need.
Examine the following example, which takes in a minimum and maximum elevation value as parameters, then does some map algebra with those values. The expression isolates areas where the elevation is greater than the minimum parameter and less than the maximum parameter. Cells that satisfy the expression are given a value of 1 by the software, and cells that do not satisfy the expression are given a value of 0.
But what if you don't want those 0 values cluttering your raster? This script gets rid of the 0's by running the Reclassify tool with a real simple remap table stating that input raster values of 1 should remain 1. Because 0 is left out of the remap table, it gets reclassified as NoData:
# This script takes a DEM, a minimum elevation, # and a maximum elevation. It outputs a new # raster showing only areas that fall between # the min and the max import arcpy from arcpy.sa import * arcpy.env.overwriteOutput = True arcpy.env.workspace = "C:/Data/Elevation" # Get parameters of min and max elevations inMin = arcpy.GetParameterAsText(0) inMax = arcpy.GetParameterAsText(1) arcpy.CheckOutExtension("Spatial") # Perform the map algebra and make a temporary raster inDem = Raster("foxlake") tempRaster = (inDem > int(inMin)) & (inDem < int(inMax)) # Set up remap table and call Reclassify, leaving all values not 1 as NODATA remap = RemapValue([[1,1]]) remappedRaster = Reclassify(tempRaster, "Value", remap, "NODATA") # Save the reclassified raster to disk remappedRaster.save("foxlake_recl") arcpy.CheckInExtension("Spatial")
Read the example above carefully, as many times as necessary, for you to understand what is occurring in each line. Notice the following things:
The above example script doesn't use any file extensions for the rasters. This is because the rasters use the Esri GRID format, which doesn't use extensions. If you have rasters in another format, such as .jpg, you will need to add the correct file extension. If you're unsure of the syntax to use when providing a raster file name, double click the raster in Catalog View and note its Location in the pop-out box.
If you look at rasters such as an Esri GRID in Windows Explorer, you may see that they actually consist of several supporting files with different extensions, sometimes even contained in a series of folders. Don't try to guess one of the files to reference; instead, use Catalog View to get the path to the raster. When you use this path, the supporting files and folders will work together automatically.
Zandbergen chapter 10 covers a lot of additional functions you can perform with rasters and has some good code examples. You don't have to understand everything in this chapter, but it might give you some good ideas for your final project.
Lessons 3 and 4 contain practice exercises that are longer than the previous practice exercises and are designed to prepare you specifically for the projects. You should make your best attempt at each practice exercise before looking at the solution. If you get stuck, study the solution until you understand it.
Don't spend so much time on the practice exercises that you neglect Project 3. However, successfully completing the practice exercises will make Project 3 much easier and quicker for you.
The data for the Lesson 3 practice exercises is very simple and, like some of the Project 2 practice exercise data, was derived from Washington State Department of Transportation [18] datasets. Download the data here [19].
Using the discussion forums is a great way to work towards figuring out the practice exercises. You are welcome to post blocks of code on the forums relating to these exercises.
When completing the actual Project 3, avoid posting blocks of code longer than a few lines. If you have a question about your Project 3 code, please email the instructor, or you can post general questions to the forums that don't contain more than a few lines of code.
If the practice exercises look daunting to you, you might start by practicing with your cursors a little bit using the sample data:
You can post thoughts on the above challenges on the forums.
In this practice exercise, you will programmatically select features by location and update a field for the selected features. You'll also use your selection to perform a calculation.
In your Lesson3PracticeExerciseA folder, you have a Washington geodatabase with two feature classes:
You want to find out which cities contain park and ride facilities and what percentage of cities have at least one facility.
You do not have to make a script tool for this assignment. You can hard-code the variable values. Try to group the hard-coded string variables at the beginning of the script.
For the purposes of these practice exercises, assume that each point in the ParkAndRide dataset represents one valid park and ride (ignore the value in the TYPE field).
You can jump into the assignment at this point, or read the following tips to give you some guidance.
Below is one possible solution to Practice Exercise A with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums. Please note that in order to make the changes to citiesLayer permanent, you have to write the layer back to disk using the arcpy.CopyFeatures_management(...) function. This is not shown in the solution here.
# This script determines the percentage of cities in the # state with park and ride facilities import arcpy arcpy.env.overwriteOutput = True arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseA\Washington.gdb" cityBoundariesFC = "CityBoundaries" parkAndRideFC = "ParkAndRide" parkAndRideField = "HasParkAndRide" # Name of column with Park & Ride information citiesWithParkAndRide = 0 # Used for counting cities with Park & Ride try: # Narrow down the cities layer to only the cities that contain a park and ride citiesLayer = arcpy.SelectLayerByLocation_management(cityBoundariesFC, "CONTAINS", parkAndRideFC) # Create an update cursor and loop through the selected records with arcpy.da.UpdateCursor(citiesLayer, (parkAndRideField)) as cursor: for row in cursor: # Set the park and ride field to TRUE and keep a tally row[0] = "True" cursor.updateRow(row) citiesWithParkAndRide += 1 except: print ("There was a problem performing the spatial selection or updating the cities feature class") # Delete the feature layers even if there is an exception (error) raised finally: arcpy.Delete_management(citiesLayer) del row, cursor # Count the total number of cities (this tool saves you a loop) numCitiesCount = arcpy.GetCount_management(cityBoundariesFC) numCities = int(numCitiesCount[0]) # Get the number of cities in the feature layer #citiesWithParkAndRide = int(citiesLayer[2]) # Calculate the percentage and print it for the user percentCitiesWithParkAndRide = (citiesWithParkAndRide / numCities) * 100 print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have a park and ride.")
Below is a video offering some line-by-line commentary on the structure of this solution:
Here is a different solution to Practice Exercise A, which uses the alternate syntax discussed in the lesson:
# This script determines the percentage of cities in the # state with park and ride facilities import arcpy arcpy.env.overwriteOutput = True arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseA\Washington.gdb" cityBoundariesFC = "CityBoundaries" parkAndRideFC = "ParkAndRide" parkAndRideField = "HasParkAndRide" # Name of column with Park & Ride information citiesWithParkAndRide = 0 # Used for counting cities with Park & Ride try: # Make a feature layer of all the park and ride facilities arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer") # Make a feature layer of all the cities polygons arcpy.MakeFeatureLayer_management(cityBoundariesFC, "CitiesLayer") except: print ("Could not create feature layers") try: # Narrow down the cities layer to only the cities that contain a park and ride arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS", "ParkAndRideLayer") # Create an update cursor and loop through the selected records with arcpy.da.UpdateCursor("CitiesLayer", (parkAndRideField)) as cursor: for row in cursor: # Set the park and ride field to TRUE and keep a tally row[0] = "True" cursor.updateRow(row) citiesWithParkAndRide +=1 except: print ("There was a problem performing the spatial selection or updating the cities feature class") # Delete the feature layers even if there is an exception (error) raised finally: arcpy.Delete_management("ParkAndRideLayer") arcpy.Delete_management("CitiesLayer") del row, cursor # Count the total number of cities (this tool saves you a loop) numCitiesCount = arcpy.GetCount_management(cityBoundariesFC) numCities = int(numCitiesCount[0]) # Calculate the percentage and print it for the user percentCitiesWithParkAndRide = (citiesWithParkAndRide / numCities) * 100 print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have a park and ride.")
Below is a video offering some line-by-line commentary on the structure of this solution:
If you look in your Lesson3PracticeExerciseB folder, you'll notice the data is exactly the same as for Practice Exercise A...except, this time the field is "HasTwoParkAndRides."
In Practice Exercise B, your assignment is to find which cities have at least two park and rides within their boundaries.
This simple modification in requirements is a game changer. The following is one way you can approach the task. Notice that it is very different from what you did in Practice Exercise A:
Below is one possible solution to Practice Exercise B with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.
# This script determines the percentage of cities with two park # and ride facilities import arcpy arcpy.env.overwriteOutput = True arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb" cityBoundariesFC = "CityBoundaries" parkAndRideFC = "ParkAndRide" parkAndRideField = "HasTwoParkAndRides" # Name of column for storing the Park & Ride information cityIDStringField = "CI_FIPS" # Name of column with city IDs citiesWithTwoParkAndRides = 0 # Used for counting cities with at least two P & R facilities numCities = 0 # Used for counting cities in total # Make an update cursor and loop through each city with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows: for city in cityRows: # Create a query string for the current city cityIDString = city[0] whereClause = cityIDStringField + " = '" + cityIDString + "'" print("Processing city " + cityIDString) # Make a feature layer of just the current city polygon currentCityLayer = arcpy.SelectLayerByAttribute_management(cityBoundariesFC, "NEW_SELECTION", whereClause) try: # Narrow down the park and ride layer by selecting only the park and rides in the current city selectedParkAndRideLayer = arcpy.SelectLayerByLocation_management(parkAndRideFC, "CONTAINED_BY", currentCityLayer) # Count the number of park and ride facilities selected numSelectedParkAndRide = int(selectedParkAndRideLayer[2]) # If more than two park and ride facilities found, update the row to TRUE if numSelectedParkAndRide >= 2: city[1] = "TRUE" # Don't forget to call updateRow cityRows.updateRow(city) # Add 1 to your tally of cities with two park and rides citiesWithTwoParkAndRides += 1 numCities += 1 except: print("Problem determining number of ParkAndRides in " + cityIDString) finally: # Clean up feature layers arcpy.Delete_management(selectedParkAndRideLayer) arcpy.Delete_management(currentCityLayer) del city, cityRows # Calculate and report the number of cities with two park and rides if numCities != 0: percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100 print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.") else: print ("Error with input dataset. No cities found.")
The video below offers some line-by-line commentary on the structure of the above solution:
Here is a different solution to Practice Exercise B, which uses the alternate syntax discussed in the lesson.
# This script determines the percentage of cities with two park # and ride facilities import arcpy arcpy.env.overwriteOutput = True arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb" cityBoundariesFC = "CityBoundaries" parkAndRideFC = "ParkAndRide" parkAndRideField = "HasTwoParkAndRides" # Name of column for storing the Park & Ride information cityIDStringField = "CI_FIPS" # Name of column with city IDs citiesWithTwoParkAndRides = 0 # Used for counting cities with at least two P & R facilities numCities = 0 # Used for counting cities in total # Make a feature layer of all the park and ride facilities arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer") # Make an update cursor and loop through each city with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows: for city in cityRows: # Create a query string for the current city cityIDString = city[0] whereClause = cityIDStringField + " = '" + cityIDString + "'" print("Processing city " + cityIDString) # Make a feature layer of just the current city polygon arcpy.MakeFeatureLayer_management(cityBoundariesFC, "CurrentCityLayer", whereClause) try: # Narrow down the park and ride layer by selecting only the park and rides # in the current city arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CurrentCityLayer") # Count the number of park and ride facilities selected selectedParkAndRideCount = arcpy.GetCount_management("ParkAndRideLayer") numSelectedParkAndRide = int(selectedParkAndRideCount[0]) # If more than two park and ride facilities found, update the row to TRUE if numSelectedParkAndRide >= 2: city[1] = "TRUE" # Don't forget to call updateRow cityRows.updateRow(city) # Add 1 to your tally of cities with two park and rides citiesWithTwoParkAndRides += 1 numCities += 1 except: print("Problem determining number of ParkAndRides in " + cityIDString) finally: # Clean up feature layer arcpy.Delete_management("CurrentCityLayer") # Clean up feature layer arcpy.Delete_management("ParkAndRideLayer") del city, cityRows # Calculate and report the number of cities with two park and rides if numCities != 0: percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100 print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.") else: print ("Error with input dataset. No cities found.")
The video below offers some line-by-line commentary on the structure of the above solution:
Testing these scripts on a ~2-year-old Dell laptop running Windows 10 with 16GB of RAM yielded some very different results. In 5 trials of both scripts, the first one needed an average of 240 seconds to complete. The second one (using the older syntax) needed only 83 seconds. This may seem counterintuitive, though it's interesting to note that the newer syntax was the faster performing version of the scripts for the other 3 exercises (times shown in seconds):
Exercise | Old | New |
A | 0.99 | 0.45 |
B | 83.67 | 240.33 |
C | 1.78 | 0.85 |
D | 1.46 | 0.92 |
You can check the timing of the scripts on your machine by adding the following lines just after the import arcpy statement:
import time process_start_time = time.time()
and this line at the end of the script:
print ("--- %s seconds ---" % (time.time() - process_start_time))
If you're interested in learning more about testing your script's performance along with methods for determining how long it takes to execute various parts of your script (e.g., to explore why there is such a difference in performance between the two Exercise B solutions), you should consider enrolling in our GEOG 489 class [22].
This practice exercise uses the same starting data as Lesson 3 Practice Exercise A. It is designed to give you practice with extracting data based on an attribute query.
Select all park and ride facilities with a capacity of more than 500 parking spaces and put them into their own feature class. The capacity of each park and ride is stored in the "Approx_Par" field.
Use the SelectLayerByAttribute_management tool to perform the selection. You will set up a SQL expression and pass it in as the third parameter for this tool.
Once you make the selection, use the Copy Features [23] tool to create the new feature class. You can pass a feature layer directly into the Copy Features tool, and it will make a new feature class from all features in the selection.
Remember to delete your feature layers when you are done.
Below is one approach to Lesson 3 Practice Exercise C. The number of spaces to query is stored in a variable at the top of the script, allowing for easy testing with other values.
# Selects park and ride facilities with over a certain number of parking spots # and exports them to a new feature class using CopyFeatures import arcpy parkingSpaces = 500 arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseC\Washington.gdb" arcpy.env.overwriteOutput = True # Set up the SQL expression to query the parking capacity parkingQuery = "Approx_Par > " + str(parkingSpaces) # Select the park and rides that applies the SQL expression parkAndRideLayer = arcpy.SelectLayerByAttribute_management("ParkAndRide", "NEW_SELECTION", parkingQuery) # Copy the features to a new feature class and clean up arcpy.CopyFeatures_management(parkAndRideLayer, "BigParkAndRideFacilities") arcpy.Delete_management(parkAndRideLayer)
The video below offers some line-by-line commentary on the structure of the above solution:
Below is an alternate approach to the exercise.
# Selects park and ride facilities with over a certain number of parking spots # and exports them to a new feature class using CopyFeatures import arcpy parkingSpaces = 500 arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseC\Washington.gdb" arcpy.env.overwriteOutput = True # Set up the SQL expression to query the parking capacity parkingQuery = "Approx_Par > " + str(parkingSpaces) # Make a feature layer of park and rides that applies the SQL expression arcpy.MakeFeatureLayer_management("ParkAndRide", "ParkAndRideLayer", parkingQuery) # Copy the features to a new feature class and clean up arcpy.CopyFeatures_management("ParkAndRideLayer", "BigParkAndRideFacilities") arcpy.Delete_management("ParkAndRideLayer")
The video below offers some line-by-line commentary on the structure of the above solution:
This practice exercise requires applying both an attribute selection and a spatial selection. It is directly applicable to Project 3 in many ways. The data is the same that you used in exercises A and C.
Write a script that selects all the park and ride facilities in a given city and saves them out to a new feature class. You can test with the city of 'Federal Way'.
Start by making a feature layer from the CityBoundaries feature class that contains just the city in question. You'll then need to make a feature layer from the park and ride feature class and perform a spatial selection on it using the "WITHIN" operation. Then use the Copy Features tool as in the previous lesson to move the selected park and ride facilities into their own feature class.
Below is one possible approach to Lesson 3 Practice Exercise D. Notice that the city name is stored near the top of the script in a variable so that it can be tested with other values.
# Selects park and ride facilities in a given target city and # exports them to a new feature class import arcpy targetCity = "Federal Way" # Name of target city arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseD\Washington.gdb" arcpy.env.overwriteOutput = True parkAndRideFC = "ParkAndRide" # Name of P & R feature class citiesFC = "CityBoundaries" # Name of city feature class # Set up the SQL expression of the query for the target city cityQuery = "NAME = '" + targetCity + "'" # Select just the target city cityLayer = arcpy.SelectLayerByAttribute_management(citiesFC, "NEW_SELECTION", cityQuery) # Select all park and rides in the target city parkAndRideLayer = arcpy.SelectLayerByLocation_management(parkAndRideFC, "CONTAINED_BY", cityLayer) # Copy the features to a new feature class and clean up arcpy.CopyFeatures_management(parkAndRideLayer, "TargetParkAndRideFacilities") arcpy.Delete_management(parkAndRideLayer) arcpy.Delete_management(cityLayer)
See the video below for some line-by-line commentary on the above solution:
Here is an alternate solution for this exercise:
# Selects park and ride facilities in a given target city and # exports them to a new feature class import arcpy targetCity = "Federal Way" # Name of target city arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseD\Washington.gdb" arcpy.env.overwriteOutput = True parkAndRideFC = "ParkAndRide" # Name of P & R feature class citiesFC = "CityBoundaries" # Name of city feature class # Set up the SQL expression of the query for the target city cityQuery = "NAME = '" + targetCity + "'" # Make feature layers for the target city and park and rides arcpy.MakeFeatureLayer_management(citiesFC, "CityLayer", cityQuery) arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer") # Select all park and rides in the target city arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CityLayer") # Copy the features to a new feature class and clean up arcpy.CopyFeatures_management("ParkAndRideLayer", "TargetParkAndRideFacilities") arcpy.Delete_management("ParkAndRideLayer") arcpy.Delete_management("CityLayer")
See the video below for some line-by-line commentary on the above solution:
In this project, you'll use your new skills working with selections and cursors to process some data from a "raw" format into a more specialized dataset for a specific mapping purpose. The data from this exercise was retrieved from the National Hockey League's undocumented statistics API [24].
Download the data for this project [25]
In this exercise, suppose you are a data analyst for an NHL franchise. In preparation for the league draft, the team general manager has asked you to make it possible for him to retrieve all current players born in a particular country (say, Sweden) broken down by position. Your predecessor passed along to you the player shapefile, but unfortunately the player's birth country is not included as one of the attributes. You do have a second shapefile of world country boundaries, though...
Write a script that makes a separate shapefile for each of the three forward positions (center, right wing, and left wing) within the boundary of Sweden. Write this script so that the user can change the country or list of positions simply by editing a couple of lines of code at the top of the script.
In browsing the attribute table, you'll note that player heights and weights are stored in imperial units (feet & inches and pounds). As part of this extraction task, to simplify comparisons against scouting reports written using metric units, you should also add two new numeric fields to the attribute table -- to the new shapefiles only, not the original nhlrosters.shp -- to hold height and weight values in centimeters and kilograms. For every record, populate these fields with values based on the following formulas:
height_cm = height (in inches) * 2.54
weight_kg = weight (in pounds) * 0.453592
Your result should look something like the figure below if viewed in Pro. The custom symbolization and labels are not required to be part of your script.
Figure 3.5 Example output from Project 3, viewed in Pro.
The above requirements are sufficient for receiving 90% of the credit on this assignment. The remaining 10% is reserved for "Over and above" efforts, such as making a script tool, or extending the script to handle multiple target countries, other combinations of fields and queries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.
Deliverables for this project are as follows:
You do not have to create a script tool for this assignment; you can hard-code the initial parameters. Nevertheless, put all the parameters at the top so they can be easily manipulated by whoever tests the script.
Once you get everything working, creating a script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.
Take a look at the provided datasets in Pro, particularly the attribute tables. The nhlrosters shapefile contains a field called "position" that provides the values you need to examine (RW = Right Wing, LW = Left Wing, C = Center, D = Defenseman, G = Goaltender). As mentioned, you've been asked to extract the forward positions (RW, LW, and C) to new shapefiles, but you want to write a script that's capable of handling some other combination of positions, too. There are several other fields that offer the potential for interesting queries as well, if you're looking for over and above ideas.
The Countries_WGS84 shapefile has a field called "CNTRY_NAME". You can make an attribute selection on this field to select Sweden, then follow that up with a spatial selection to grab all the players that fall within this country. Finally, narrow down those players to just the ones that play the desired position.
Once you've selected Swedish players at the desired position, use the Copy Features [23] tool to save the selected features into a new feature class, in the same fashion as in the practice exercises.
Take this project one step at a time. It's probably easiest to tackle the extraction-into-shapefile portion first. Once you have all the new position shapefiles created, go through them one by one, use the "Add Field" tool to add the "height_cm" and "weight_kg" fields, followed by an UpdateCursor to loop through all rows and populate these new fields with appropriate values.
It might be easiest to get the whole process working with a single position, then add the loop for all the positions later after you have finalized all the other script logic.
The height field is of type Text to allow for the ' and " characters. The string slicing notation covered earlier in the course can be used to obtain the feet and inches components of the player height. Use the inches to centimeters formula shown above to compute the height in metric units.
For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.
You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.
Links
[1] https://pro.arcgis.com/en/pro-app/latest/help/mapping/layer-properties/what-is-a-query-layer-.htm
[2] http://en.wikipedia.org/wiki/Shapefile
[3] https://www.e-education.psu.edu/geog485/sites/www.e-education.psu.edu.geog485/files/data/USA.gdb.zip
[4] https://www.e-education.psu.edu/geog485/sites/www.e-education.psu.edu.geog485/files/data/Alabama.gdb.zip
[5] https://pro.arcgis.com/en/pro-app/arcpy/get-started/fields-and-indexes.htm
[6] https://pro.arcgis.com/en/pro-app/arcpy/data-access/searchcursor-class.htm
[7] https://www.e-education.psu.edu/geog485/sites/www.e-education.psu.edu.geog485/files/data/Pennsylvania.zip
[8] https://pro.arcgis.com/en/pro-app/help/mapping/navigation/write-a-query-in-the-query-builder.htm
[9] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/select-layer-by-attribute.htm
[10] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/select-layer-by-location.htm
[11] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/make-feature-layer.htm
[12] https://pro.arcgis.com/en/pro-app/arcpy/get-started/data-access-using-cursors.htm
[13] https://pro.arcgis.com/en/pro-app/arcpy/data-access/insertcursor-class.htm
[14] https://pro.arcgis.com/en/pro-app/arcpy/data-access/updatecursor-class.htm
[15] https://pro.arcgis.com/en/pro-app/arcpy/get-started/data-access-using-cursors.htm#ESRI_SECTION1_0647F31702954B8C9E4A98703AEF0149
[16] http://en.wikipedia.org/wiki/3-1-1
[17] https://pro.arcgis.com/en/pro-app/arcpy/spatial-analyst/an-overview-of-remap-classes.htm
[18] https://gisdata-wsdot.opendata.arcgis.com/
[19] https://www.e-education.psu.edu/geog485/sites/www.e-education.psu.edu.geog485/files/data/Lesson3PracticeExercises.zip
[20] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/get-count.htm
[21] https://creativecommons.org/licenses/by-nc-sa/4.0/
[22] https://www.e-education.psu.edu/geog489/
[23] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/copy-features.htm
[24] https://gitlab.com/dword4/nhlapi/blob/master/stats-api.md
[25] https://www.e-education.psu.edu/geog485/sites/www.e-education.psu.edu.geog485/files/data/project3_2023.zip