Lesson 4: Practical Python for the GIS analyst

Lesson 4 contains a variety of subjects to help you use Python more effectively as a GIS analyst. The sections of this lesson will reinforce what you've learned already, while introducing some new concepts that will help take your automation to the next level.

You'll learn how to modularize a section of code to make it usable in multiple places. You'll learn how to use new Python modules, such as os, to open and read files; then you'll transfer the information in those files into geographic datasets that can be read by ArcGIS. Finally, you'll learn how to use your operating system to automatically run Python scripts at any time of day.

Lesson 4 checklist

Lesson 4 explores some more advanced Python concepts, including reading and parsing text. To complete Lesson 4, do the following:

  1. One week into the lesson, submit your Final Project proposal to the instructors using the ANGEL e-mail system.  For the exact due date, see the Calendar tab in ANGEL.
  2. Work through the course lesson materials.
  3. Read Zandbergen chapters 7.6, 8.1 - 8.6, 10, and 12.1 - 12.5. In the online lesson pages I have inserted instructions about when it is most appropriate to read each of these chapters.
  4. Complete Project 4 and submit your zipped deliverables to the Project 4 drop box.
  5. Complete the Lesson 4 Quiz.

4.1 Functions and modules

One of the fundamentals of programming that we did not previously cover is functions. To start this lesson, we'll talk about functions and how you can use them to your benefit as you begin writing longer scripts.

A function contains one focused piece of functionality in a reusable section of code. The idea is that you write the function once, then use, or call, it throughout your code whenever you need to. You can put a group of related functions in a module so you can use them in many different scripts. When used appropriately, functions eliminate code repetition and make the main body of your script shorter and more readable.

Functions exist in many programming languages, and each has its own way of defining a function. In Python, you define a function using the def statement. Each line in the function that follows the def is indented. Here's a simple function that reads the radius of a circle and reports the circle's approximate area. (Remember that the area is equal to pi [3.14159...] multiplied by the square [** 2] of the radius.)

>>> def findArea(radius):
... 	area = 3.14159 * radius ** 2
... 	return area
... 
>>> findArea(3)
28.27431

Notice from the above example that functions can take parameters, or arguments. When you call the above function, you supply the radius of the circle in parentheses. The function returns the area (notice the return statement, which is new to you).

Thus, to find the area of a circle with a radius of 3 inches, you could make the function call findArea(3) and get the return value 28.27431 (square inches).

It's common to assign the returned value to a variable and use it later in your code. For example, you could add these lines in the Interactive Window:

>>> aLargerCircle = findArea(4)
>>> print aLargerCircle
50.26544
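The value 3.14159 used above is a rounded approximation of pi. As a minimal sketch, the same function could instead use the constant math.pi from Python's standard math module. The name findAreaPrecise is made up for this illustration, and print is written with parentheses so the lines also run in Python 3.

```python
import math

# Hypothetical variant of findArea that uses math.pi instead of 3.14159
def findAreaPrecise(radius):
    area = math.pi * radius ** 2
    return area

# Assign the returned value to a variable and use it later
aLargerCircle = findAreaPrecise(4)
print(aLargerCircle)
```

The result differs from the earlier output only in the later decimal places, which is usually irrelevant for a quick estimate but can matter when results are chained through many calculations.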

A function is not required to return any value. For example, you may have a function that takes the path of a text file as a parameter, reads the first line of the file, and prints that line to the Interactive Window. Since all the printing logic is performed inside the function, there is really no return value.
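Here is a rough sketch of the kind of function just described. The function name printFirstLine and the throwaway demo file are assumptions made for this illustration; file opening and reading are covered in more detail later in the lesson.

```python
import os
import tempfile

# Hypothetical function: prints the first line of a text file.
# It has no return statement, so calling it gives back None.
def printFirstLine(path):
    textFile = open(path, "r")
    firstLine = textFile.readline()
    textFile.close()
    print(firstLine.rstrip("\n"))

# Create a small throwaway file so the sketch can run anywhere
tempPath = os.path.join(tempfile.gettempdir(), "first_line_demo.txt")
demoFile = open(tempPath, "w")
demoFile.write("type,ident,lat,long\nTRACK,ACTIVE LOG,40.78966141,-77.85948515\n")
demoFile.close()

# The call does its work (printing), but there is nothing useful to assign
result = printFirstLine(tempPath)
```

If you did assign the result of such a call to a variable, as in the last line, the variable would simply hold Python's special value None.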

Neither is a function required to take a parameter. For example, you might write a function that retrieves or calculates some static value. Try this in the Interactive Window:

>>> def getCurrentPresident():
... 	return "Barack Obama"
... 
>>> president = getCurrentPresident()
>>> print president
Barack Obama

The function getCurrentPresident() doesn't take any user-supplied parameters. Its only "purpose in life" is to return the name of the current president. It cannot be asked to do anything else.

Modules

You may be wondering what advantage you gain by putting the above getCurrentPresident() logic in a function. Why couldn't you just define a string currentPresident and set it equal to "Barack Obama"? The big reason is reusability.

Suppose you maintain 20 different scripts, each of which works with the name of the current President in some way. You know that the name of the current President will eventually change. Therefore, you could put this function in what's known as a module file and reference that file inside your 20 different scripts. When the name of the President changes, you don't have to open 20 scripts and change them. Instead, you just open the module file and make the change once.

You may remember that you've already worked with some of Python's built-in modules. The Hi Ho! Cherry O example in Lesson 2 imported the random module so that the script could generate a random number for the spinner result. This spared you the effort of writing or pasting any random number generating code into your script.

You've also probably gotten used to the pattern of importing the arcpy site package at the beginning of your scripts. A site package can contain numerous modules. In the case of arcpy, these modules include Esri functions for geoprocessing.

As you use Python in your GIS work, you'll probably write functions that are useful in many types of scripts. These functions might convert a coordinate from one projection to another, or create a polygon from a list of coordinates. These functions are perfect candidates for modules. If you ever want to improve on your code, you can make the change once in your module instead of finding each script where you duplicated the code.

Creating a module

To create a module, create a new script in PythonWin and save it with the standard .py extension; but instead of writing start-to-finish scripting logic, just write some functions. Here's what a simple module file might look like. This module only contains one function, which adds a set of points to a feature class given a Python list of coordinates.

# This module is saved as practiceModule1.py

# The function below creates points from a list of coordinates
#  Example list: [[-113,23],[-120,36],[-116,-2]]

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor    
    import arcpy
    rowInserter = arcpy.InsertCursor(featureClass)

    # Loop through each coordinate in the list    
    for coordinate in coordinateList:
    
        # Grab a set of coordinates from the list and
        #  assign them to a point object        
        x = float(coordinate[0])
        y = float(coordinate[1])
        pointGeometry = arcpy.Point(x,y)
        

        # Use the insert cursor to put the point object
        #  in the feature class        
        newPoint = rowInserter.newRow()
        newPoint.Shape = pointGeometry
        rowInserter.insertRow(newPoint)

    # Delete the insert cursor    
    del rowInserter

(Note that if you're using ArcGIS 10.1 or later, which includes the data access module arcpy.da, you could write it like the following:)

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor  
    import arcpy

    with arcpy.da.InsertCursor(featureClass, ("SHAPE@",)) as rowInserter:

        # Loop through each coordinate in the list and make a point    
        for coordinate in coordinateList:
            point = arcpy.Point(coordinate[0],coordinate[1])
            rowInserter.insertRow((point,))

The above function createPoints could be useful in various scripts, so it's very appropriate for putting in a module. Notice that this script has to work with insert cursors and point objects, so it requires arcpy. It's legal to import a site package or module within a module.

Also notice that arcpy is imported within the function, not at the very top of the module like you are accustomed to seeing. This is done for performance reasons. You may add more functions to this module later that do not require arcpy. You should only do the work of importing arcpy when necessary, that is, if a function is called that requires it.

The arcpy site package is only available inside the scope of this function. If other functions in your practice module were called, arcpy would not be available to those functions unless they imported it themselves. Scope also applies to variables that you create in this function, such as rowInserter and pointGeometry: they are local to createPoints, and if you tried to use them elsewhere, such as in another function or in the script that calls the module, they would be out of scope and unavailable. Note that in Python a loop does not create its own scope, so a variable assigned inside the for loop, such as pointGeometry, remains available in the rest of the function after the loop ends.
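The following is a small sketch of function-level scope that does not require arcpy. The function countVertices and its variable names are invented for this illustration only.

```python
# Hypothetical function used only to illustrate scope
def countVertices(coordinateList):
    vertexCount = len(coordinateList)   # vertexCount is local to the function
    return vertexCount

total = countVertices([[-113, 23], [-120, 36], [-116, -2]])

# Outside the function, the local name vertexCount does not exist,
# so referring to it raises a NameError
try:
    vertexCount
    scopeMessage = "vertexCount is visible here"
except NameError:
    scopeMessage = "vertexCount is out of scope here"

print(scopeMessage)
```

The returned value (stored in total) is the intended way to get information out of the function; the function's internal variable names stay private to it.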

Using a module

So how could you use the above module in a script? Imagine that the module above is saved on its own as practiceModule1.py. Below is an example of a separate script that imports practiceModule1.

# This script is saved as add_my_points.py

# Import the module containing a function we want to call
import practiceModule1

# Define point list and shapefile to edit
myWorldLocations = [[-123.9,47.0],[-118.2,34.1],[-112.7,40.2],[-63.2,-38.7]]
myWorldFeatureClass = "c:\\Data\\WorldPoints.shp"

# Call the createPoints function from practiceModule1
practiceModule1.createPoints(myWorldLocations, myWorldFeatureClass)

The above script is simple and easy to read because you didn't have to include all the logic for creating the points. That is taken care of by the createPoints function in the module you imported, practiceModule1. Notice that to call a function from a module, you need to use the syntax module.function().

Readings

To reinforce the material in this section, read Zandbergen 12.1 - 12.5, which talks about creating Python functions and modules.

Practice

Before moving ahead, get some practice in PythonWin by trying to write the following functions. These functions are not graded, but the experience of writing them will help you in Project 4. Use the course forums to help each other.

  • A function that returns the perimeter of a square given the length of one side.
  • A function that takes a path to a feature class as a parameter and returns a Python list of the fields in that feature class. Practice calling the function and printing the list. However, do not print the list within the function.
  • A function that returns the Euclidean distance between any two coordinates. The coordinates can be supplied as parameters in the form (x1, y1, x2, y2). For example, if your coordinates were (312088, 60271) and (312606, 59468), your function call might look like this: findDistance(312088, 60271, 312606, 59468). Use the Pythagorean formula A ** 2 + B ** 2 = C ** 2. For an extra challenge, see if you can handle negative coordinates.

The best practice is to put your functions inside a module and see if you can successfully call them from a separate script. If you try to step through your code using the debugger, you'll notice that the debugger helpfully moves back and forth between the script and the module whenever you call a function in the module.

4.2 Reading and parsing text

One of the best ways to increase your effectiveness as a GIS programmer is to learn how to manipulate text-based information. In Lesson 3, we talked about how to read data in ArcGIS's native formats, such as feature classes. But often GIS data is collected and shared in more "raw" formats such as an Excel spreadsheet in CSV (comma-separated value) format, a list of coordinates in a text file, or an XML response received through a Web service.

When faced with these files, you should first understand if your GIS software already comes with a tool or script that can read or convert the data to a format it can use. If no tool or script exists, you'll need to do some programmatic work to read the file and separate out the pieces of text that you really need. This is called parsing the text.

For example, a Web service may return you many lines of XML describing all the readings at a weather station, when all you're really interested in are the coordinates of the weather station and the annual average temperature. Parsing the response involves writing some code to read through the lines and tags in the XML and isolating only those three values.

When you parse, you cycle through lines of text, treating them as strings, and pull out the useful information from those strings. In an XML file, for example, you may know that the information you want falls inside a particular tag, such as <AvgTemp>46</AvgTemp>. One approach to getting the value 46 might be to search for the line containing the substring "AvgTemp," then take all the characters that fall between the first > coming from the left of the string and the first < coming from the right.
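As a minimal sketch of that approach, assuming the line holds exactly one opening and one closing tag, the string methods find() and rfind() can locate the two characters and a slice can pull out what lies between them:

```python
line = "<AvgTemp>46</AvgTemp>"

if "AvgTemp" in line:
    start = line.find(">") + 1   # just past the first > from the left
    end = line.rfind("<")        # the first < coming from the right
    avgTemp = line[start:end]    # the characters in between
```

Here avgTemp is still a string; you would wrap it in int() or float() before doing any arithmetic with it.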

In another case, you might know that the values you want fall after the second and third commas in a line of comma-separated values. You can split up the line based on comma locations and put all the segments of the string in a Python list. You can then take the third and fourth items in the list to get the values you want. (Remember the third and fourth items would come after the second and third commas, respectively.)
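A short sketch of this idea, using part of one reading from the GPS file introduced later in this lesson. The positions 2 and 3 are hard-coded here purely for illustration.

```python
reading = "TRACK,ACTIVE LOG,40.78966141,-77.85948515,4627251.76270444,1779451.21349775"

# Split on commas; each segment becomes one item in a Python list
segments = reading.split(",")

# List indexes start at zero, so the third and fourth items are at 2 and 3
thirdItem = segments[2]
fourthItem = segments[3]
```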

In both cases, the keys to effective parsing are to know how to read lines in a file and know your string manipulation methods. It's helpful to know how to read a string, search for values in a string, split up a string based on some delimiter, and extract particular segments of a string.

Sometimes you can import helper modules, or libraries, into your code to make it easier to parse certain types of text. In the XML example, it may be easier to import xml.dom (described here in Chapter 1 of the book Python & XML), which puts all the XML elements in the file into a series of lists. Searching through those lists is easier than repeatedly searching for the < and > characters. If you're preparing for a big parsing project with XML or some other type of well-known format, it may be worth your while to investigate whether there's a third-party library that can help you.
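As one possible sketch, the standard library's xml.dom.minidom module can parse a small XML string and hand back elements by tag name. The XML content, its element names, and the helper getTagText are all assumptions made up for this example, not the format of any particular Web service.

```python
from xml.dom import minidom

# Made-up XML response; the element names are assumptions for this sketch
xmlText = ("<Station>"
           "<Lat>40.79</Lat>"
           "<Long>-77.86</Long>"
           "<AvgTemp>46</AvgTemp>"
           "</Station>")

document = minidom.parseString(xmlText)

# Hypothetical helper: returns the text inside the first element
# that has the given tag name
def getTagText(doc, tagName):
    element = doc.getElementsByTagName(tagName)[0]
    return element.firstChild.data

lat = getTagText(document, "Lat")
lon = getTagText(document, "Long")
avgTemp = getTagText(document, "AvgTemp")
```

Notice that no searching for < and > characters is needed; the library takes care of the tag structure.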

There are an infinite number of parsing scenarios that you can encounter. This lesson will attempt to teach you the general approach by walking through just one example. In your final project for this course, you may choose to explore parsing other types of files.

Introducing the GPS track parsing example

This example reads a text file collected from a GPS unit. The lines in the file represent readings taken from the GPS unit as the user traveled along a path. In this section of the lesson, you'll learn one way to parse out the coordinates from each reading. The next section of the lesson uses a variation of this example to show how you could write the user's track to a polyline feature class.

The file for this example is called gps_track.txt and it looks something like the text string shown below. (Please note, line breaks have been added to the file shown below to ensure that the text fits within the page margins. Click on this link to the gps_track.txt file to see what the text file actually looks like.)

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,model,filename,ltime
TRACK,ACTIVE LOG,40.78966141,-77.85948515,4627251.76270444,1779451.21349775,True,False,
    255,358.228393554688,0,0,2008/06/11-14:08:30,eTrex Venture, ,2008/06/11 09:08:30
TRACK,ACTIVE LOG,40.78963995,-77.85954952,4627248.40489401,1779446.18060893,False,False,
    255,358.228393554688,0,0,2008/06/11-14:09:43,eTrex Venture, ,2008/06/11 09:09:43
TRACK,ACTIVE LOG,40.78961849,-77.85957098,4627245.69008772,1779444.78476531,False,False,
    255,357.747802734375,0,0,2008/06/11-14:09:44,eTrex Venture, ,2008/06/11 09:09:44
TRACK,ACTIVE LOG,40.78953266,-77.85965681,4627234.83213242,1779439.20202706,False,False,
    255,353.421875,0,0,2008/06/11-14:10:18,eTrex Venture, ,2008/06/11 09:10:18
TRACK,ACTIVE LOG,40.78957558,-77.85972118,4627238.65402635,1779432.89982442,False,False,
    255,356.786376953125,0,0,2008/06/11-14:11:57,eTrex Venture, ,2008/06/11 09:11:57
TRACK,ACTIVE LOG,40.78968287,-77.85976410,4627249.97592111,1779427.14663093,False,False,
    255,354.383178710938,0,0,2008/06/11-14:12:18,eTrex Venture, ,2008/06/11 09:12:18
TRACK,ACTIVE LOG,40.78979015,-77.85961390,4627264.19055204,1779437.76243578,False,False,
    255,351.499145507813,0,0,2008/06/11-14:12:50,eTrex Venture, ,2008/06/11 09:12:50
etc. ...

Notice that the file starts with a header line, explaining the meaning of the values contained in the readings from the GPS unit. Each subsequent line contains one reading. The goal for this example is to create a Python list containing the X,Y coordinates from each reading. Specifically, the script should be able to read the above file and print a text string like the one shown below.

[['-77.85948515', '40.78966141'], ['-77.85954952', '40.78963995'], ['-77.85957098', '40.78961849'], etc.]

Approach for parsing the GPS track

Before you start parsing a file, it's helpful to outline what you're going to do and break up the task into manageable chunks. Here's some pseudocode for the approach we'll take in this example:

  1. Open the file.
  2. Read the header line.
  3. Loop through the header line to find the index positions of the "lat" and "long" values.
  4. Read the rest of the lines.
  5. Split each line into a list of values, using the comma as a delimiter.
  6. Find the values in the list that correspond to the lat and long coordinates and write them to a new list.

Opening the file

The first thing the script needs to do is open the file. Python contains a built-in open() function for doing this. The parameters for this function are the path to the file and the mode in which you want to open the file (read, write, etc.). In this example, "r" stands for read-only mode.

gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
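The doubled backslashes in the path are there because a single backslash is the escape character in Python strings. As a brief sketch, here are three common ways to write the same kind of Windows path; the variable names are made up for this illustration.

```python
# Doubled backslashes: each \\ stands for one literal backslash
escapedPath = "C:\\Data\\GPS\\gps_track.txt"

# Raw string: the r prefix tells Python to leave backslashes alone
rawPath = r"C:\Data\GPS\gps_track.txt"

# Forward slashes are also generally accepted for file paths on Windows
forwardPath = "C:/Data/GPS/gps_track.txt"

# The first two spellings produce exactly the same string
samePath = (escapedPath == rawPath)
```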

Reading the header line

Opening the file with the open() function gets you a file object (called gpsTrack in our case). You can read the first line by calling the file.readline() method, like this:

headerLine = gpsTrack.readline()

This returns the string "type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,model,filename,ltime", followed by the newline character that ends the line.

Looping through the header line to find the index positions of the "lat" and "long" values

You need to search through this string and find the position of "lat" and "long." If you start counting comma-separated values in this string beginning from zero, it's easy to see that "lat" is at index position 2 and "long" is at index position 3. However, it's a good practice not to hard-code numbers like 2 and 3 into your script. Hard-coded numbers other than 0 or 1 are sometimes derided as magic numbers, suggesting that if you're not the programmer, you might have to use magic to know where the numbers came from!

Avoiding magic numbers gives you greater flexibility. If you wanted to re-use this script with a file in which "lat" and "long" were in different positions, you wouldn't have to modify your code. Even if "lat" and "long" went by some other name, it would be easier to find and change a string in your script instead of finding and changing a "magic number".

So how can you programmatically determine that "lat" is at index 2 and "long" is at index 3? Below is one way that uses the string.split() method. This method puts each "item" in the line into a list. The parameter you pass to the split method determines the delimiter, or the character that determines a new list item. In our case, it's the comma:

valueList = headerLine.split(",")

The above method call returns: ['type', 'ident', 'lat', 'long', 'y_proj', 'x_proj', 'new_seg', 'display', 'color', 'altitude', 'depth', 'temp', 'time', 'model', 'filename', 'ltime\n']. (The last item keeps the newline character that ended the header line.) The key is that now you can cycle through this list and discover the position of "lat" and "long." To do this, you could write a loop that searched through the list for "lat" and "long," but a quicker way is to use the helper method index() that gets you the index position of any item in the list:

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

After running the above lines, latValueIndex is equal to 2 and lonValueIndex is equal to 3. With those variables set, you're now ready to start reading the rest of the lines in the file.
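One detail to be aware of: readline() keeps the newline character at the end of the line, so after a split the last item carries a trailing "\n". That does not affect "lat" or "long", but it would matter if you searched for the last column. A minimal sketch of one way to guard against it, using the string strip() method on a copy of the header typed in by hand:

```python
headerLine = ("type,ident,lat,long,y_proj,x_proj,new_seg,display,color,"
              "altitude,depth,temp,time,model,filename,ltime\n")

# Without strip(), the last item still has the newline attached
rawList = headerLine.split(",")
lastRaw = rawList[-1]

# strip() removes leading and trailing whitespace, including the newline
cleanList = headerLine.strip().split(",")
lastClean = cleanList[-1]

ltimeIndex = cleanList.index("ltime")
```

Calling rawList.index("ltime") instead would raise a ValueError, because the item stored in that list is "ltime\n".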

Processing the rest of the lines in the file

When you have an open text file, you can always call file.readline() to go to the next line. In our case, we know we're going to use all the rest of the lines in the file, so it's more efficient to call file.readlines() to read them all at once. (This might not be efficient with an extremely long file.) The readlines() method returns a list of all the remaining lines in the file.

Now you can cycle through each GPS reading and split it up based on commas the same way you split up the header. You specifically need to pull out the values in index positions 2 and 3 of your list (represented by latValueIndex and lonValueIndex, respectively) and write those to a new list (coordList).

# Read lines in the file and append to coordinate list
coordList = []

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    coordList.append([segmentedLine[lonValueIndex], segmentedLine[latValueIndex]])
   
print coordList

Note a few important things about the above code:

  • coordList actually contains a bunch of small lists within a big list. Each small list is a coordinate pair representing the x (longitude) and y (latitude) location of one GPS reading.
  • The list.append() method is used to add items to coordList. Notice again that you can append a list itself (representing the coordinate pair) using this method.
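Because the values come from a text file, each coordinate in coordList is a string, not a number. If you need numbers (for example, to do arithmetic), one option is to convert each value with float(). The sketch below works on a two-item sample of the list; the name numericCoordList is made up for this illustration.

```python
coordList = [['-77.85948515', '40.78966141'], ['-77.85954952', '40.78963995']]

numericCoordList = []
for coordinate in coordList:
    # Convert the text values to floating-point numbers
    x = float(coordinate[0])
    y = float(coordinate[1])
    numericCoordList.append([x, y])
```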

Full code for the example

Here's the full code for the example. Feel free to download the text file and try it out on your computer.

# Reads a GPS-produced text file and writes the lat and long values
#  to a list of coordinates
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")

# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

# Read lines in the file and append to coordinate list
coordList = []

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    coordList.append([segmentedLine[lonValueIndex], segmentedLine[latValueIndex]])
   
print coordList

# Close the file now that all lines have been read
gpsTrack.close()
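As a variation, here is a sketch that wraps the same steps in a function and opens the file using Python's with statement, which closes the file automatically when the block ends. It also loops over the file object directly, which reads one line at a time and can be gentler on memory with very long files. The function name readTrackCoordinates and the small sample file it reads are assumptions made so the example can run on its own.

```python
import os
import tempfile

# Hypothetical helper that wraps the parsing steps in a function
def readTrackCoordinates(path):
    coordList = []
    with open(path, "r") as gpsTrack:
        # Strip the header so the last column name has no newline attached
        valueList = gpsTrack.readline().strip().split(",")
        latValueIndex = valueList.index("lat")
        lonValueIndex = valueList.index("long")

        # Looping over the file object reads one line at a time
        for line in gpsTrack:
            if not line.strip():
                continue    # skip any blank lines
            segmentedLine = line.strip().split(",")
            coordList.append([segmentedLine[lonValueIndex],
                              segmentedLine[latValueIndex]])
    return coordList

# Write a two-reading sample file so the sketch is self-contained
samplePath = os.path.join(tempfile.gettempdir(), "gps_track_sample.txt")
with open(samplePath, "w") as sampleFile:
    sampleFile.write("type,ident,lat,long,y_proj,x_proj\n")
    sampleFile.write("TRACK,ACTIVE LOG,40.78966141,-77.85948515,"
                     "4627251.76270444,1779451.21349775\n")
    sampleFile.write("TRACK,ACTIVE LOG,40.78963995,-77.85954952,"
                     "4627248.40489401,1779446.18060893\n")

coords = readTrackCoordinates(samplePath)
```

Putting the logic in a function also means it could live in a module and be reused by other scripts, as described in section 4.1.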

Applications of this script

You might be asking at this point, "What good does this list of coordinates do for me?" Admittedly, the data is still very "raw." It cannot be read directly in this state by a GIS. However, having the coordinates in a Python list makes them easy to get into other formats that can be visualized. For example, these coordinates could be written to points in a feature class, or vertices in a polyline or polygon feature class. The list of points could also be sent to a Web service for reverse geocoding, or finding the address associated with each point. The points could also be plotted on top of a Web map using programming tools like the Google Maps API. Or, if you were feeling really ambitious, you might use Python to write a new file in KML format, which could be viewed in 3D in Google Earth.

Summary

Parsing any piece of text requires you to be familiar with file opening and reading methods, the structure of the text you're going to parse, and string manipulation methods. In the preceding example, we parsed a simple text file, extracting coordinates collected by a handheld GPS unit. We used the string.split() method to break up each GPS reading and find the latitude and longitude values. In the next section of the lesson, you'll learn how you could do more with this information by writing the coordinates to a polyline dataset.

As you use Python in your GIS work, you could encounter a variety of parsing tasks. As you approach these, don't be afraid to seek help from Internet examples, code reference topics such as the ones linked to in this lesson, and your textbook.

Readings

It's worth your time to read Zandbergen 7.6, which talks about parsing text files. Any examples you can pick up with text parsing will help you when you encounter a new file that you need to read. You'll have this experience in the practice exercises and projects this week.

4.3 Writing geometries

As you parse out geographic information from "raw" sources such as text files, you may want to convert it to a format that is native to your GIS. This section of the lesson discusses how to write vector geometries to ArcGIS feature classes. We'll read through the same GPS-produced text file from the previous section, but this time we'll add the extra step of writing each coordinate to a polyline shapefile.

You've already had some experience writing point geometries when we learned about insert cursors. To review, you use arcpy.Point() to create a Point object, then you use an insert cursor to assign it to the geometry field of the feature class (called "shape" for shapefiles).

# Create point
inPoint = arcpy.Point(-121.34, 47.1)

# newRow originates from an insert cursor
newRow.shape = inPoint

For polylines and polygons, you create multiple Point objects that you add to an Array object. Then you make a Polyline or Polygon object using the array. With polygons it's a good practice to make the end vertex the same as the start vertex if possible.

The code below creates an empty array and adds three points using the Array.add() method. Then the array is used to create a Polyline object.

The first parameter you pass in when creating a polyline is the array containing the points for the polyline. The second parameter is a spatial reference of the coordinates, which you should always pass in to ensure that the precision of your data is maintained.

# Make a new empty array
array = arcpy.Array()

# Make some points
point1 = arcpy.Point(-121.34,47.1)
point2 = arcpy.Point(-121.29,47.32)
point3 = arcpy.Point(-121.31,47.02)

# Put the points in the array
array.add(point1)
array.add(point2)
array.add(point3)

# Make a polyline out of the now-complete array
polyline = arcpy.Polyline(array, spatialRef)

# Put the polyline in the feature class
newRow.shape = polyline

Of course, you usually won't create points manually in your code like this with hard-coded coordinates. It's more likely that you'll parse out the coordinates from a file or capture them from some external source, such as a series of mouse clicks on the screen.
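Returning to the earlier note about polygons, here is a small sketch, in plain Python without arcpy, of one way to make sure a list of coordinates ends on the same vertex it starts with before you turn it into points. The helper name closeRing is hypothetical.

```python
# Hypothetical helper: returns a copy of the coordinate list in which
# the last vertex repeats the first one
def closeRing(coordinateList):
    ring = list(coordinateList)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring

openRing = [[-121.34, 47.1], [-121.29, 47.32], [-121.31, 47.02]]
closedRing = closeRing(openRing)
```

Each coordinate pair in closedRing could then be made into a Point and added to an Array in the same way as the points shown above.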

Creating a polyline from a GPS track

Here's how you could parse out coordinates from a GPS-created text file like the one in the previous section of the lesson. This code reads all the points captured by the GPS and adds them to one long polyline. The polyline is then written to an empty, pre-existing polyline shapefile named tracklines.shp that uses a geographic coordinate system. If you didn't have a shapefile already on disk, you could use the Create Feature Class tool to create one with your script.

# Reads a GPS-produced text file and writes the lat and long values
#  to an already-created polyline shapefile
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
polylineFC = "C:\\Data\\GPS\\tracklines.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference
    
# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

# Create an array to store the points for the polyline
vertexArray = arcpy.Array()

# Read each line in the file
for line in gpsTrack.readlines():
    segmentedLine = line.split(",")

    # Get the lat/lon values of the current GPS reading                    
    latValue = float(segmentedLine[latValueIndex])
    lonValue = float(segmentedLine[lonValueIndex])

    # Create a point and add it to the array
    vertex = arcpy.Point(lonValue, latValue)
    vertexArray.add(vertex)

# Create an insert cursor
cursor = arcpy.InsertCursor(polylineFC)
feature = cursor.newRow()

# Put the array in a polyline and write it to the feature class
polyline = arcpy.Polyline(vertexArray, spatialRef)
feature.shape = polyline

cursor.insertRow(feature)    
 
del cursor

The above script starts out the same as the one in the previous section of the lesson. First, it parses the header line of the file to determine the position of the latitude and longitude coordinates in each reading. But then, notice that an array is created to hold the points for the polyline:

vertexArray = arcpy.Array()

After that, a loop is initiated that reads each line and creates a point object from the latitude and longitude values. At the end of the loop, the point is added to the array.

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")

    # Get the lat/lon values of the current GPS reading                    
    latValue = float(segmentedLine[latValueIndex])
    lonValue = float(segmentedLine[lonValueIndex])

    # Create a point and add it to the array
    vertex = arcpy.Point(lonValue, latValue)
    vertexArray.add(vertex)

Once all the lines have been read, the loop exits and an insert cursor is created. The cursor is used to create a new row. Then a Polyline object is created and assigned to the shape field, thereby giving the row some geometry.

# Create an insert cursor
cursor = arcpy.InsertCursor(polylineFC)
feature = cursor.newRow()

# Put the array in a polyline and write it to the feature class
polyline = arcpy.Polyline(vertexArray, spatialRef)
feature.shape = polyline

cursor.insertRow(feature)    
 
del cursor

Remember that the cursor places a lock on your dataset, so this script doesn't create the cursor until absolutely necessary (in other words, after the loop). After the row is inserted, the cursor is deleted to remove the lock.

Extending the example for multiple polylines

Just for fun, suppose your GPS allows you to mark the start and stop of different tracks. How would you handle this in the code? You can download this modified text file with multiple tracks if you want to try out the following example.

Notice that in the GPS text file, there is an entry new_seg:

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,model,filename,ltime

new_seg is a boolean property that determines whether the reading begins a new track. If new_seg = true, you need to write the existing polyline to the shapefile and start creating a new one. Take a close look at this code example and notice how it differs from the previous one in order to handle multiple polylines:

# Reads a GPS-produced text file and writes the lat and long values
#  to an already-created polyline shapefile. Handles multiple polylines.

# Function to add a completed single part polyline to the feature class
def addPolyline(cursor, array, sr):
   feature = cursor.newRow()
   polyline = arcpy.Polyline(array, sr)
   feature.shape = polyline
   cursor.insertRow(feature)
   array.removeAll()

# Main script body
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("C:\\Data\\GPS\\gps_track_multiple.txt", "r")
polylineFC = "C:\\Data\\GPS\\tracklines_sept25.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")

# Create the insert cursor and an array to hold the vertices of each polyline
cursor = arcpy.InsertCursor(polylineFC)
vertexArray = arcpy.Array()

# Read each line and split it
for line in gpsTrack.readlines():
   segmentedLine = line.split(",")
   isNew = segmentedLine[newTrackIndex].upper()

   # If starting a new line, write the completed
   #  line to the feature class
   if isNew == "TRUE":

       # This check is needed to handle the first GPS entry
       if vertexArray.count > 0:
           addPolyline(cursor, vertexArray, spatialRef)

   # Get the lat/lon values of the current GPS reading
   latValue = segmentedLine[latValueIndex]
   lonValue = segmentedLine[lonValueIndex]

   vertex = arcpy.Point(lonValue, latValue)
   vertexArray.add(vertex)

# Add the final polyline to the shapefile
addPolyline(cursor, vertexArray, spatialRef)

del cursor

The first thing you should notice is that this script uses a function. The addPolyline function adds a polyline to a feature class, given three parameters: (1) an existing insert cursor, (2) an array, and (3) a spatial reference. This function cuts down on repeated code and makes the script more readable.

Here's a look at the addPolyline function:

def addPolyline(cursor, array, sr):
   feature = cursor.newRow()
   polyline = arcpy.Polyline(array, sr)
   feature.shape = polyline
   cursor.insertRow(feature)
   array.removeAll()

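Notice it's okay to use arcpy in the above function, even though the function is defined above the import arcpy line. Python does not look up the name arcpy until the function is actually called, and by that time the main script body has imported it. However, you want to avoid using variables in the function that are not defined within the function or passed in as parameters.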
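To see why the arcpy reference works even though the function appears above the import line, here is a minimal sketch of the same define-first, import-later pattern. It uses the standard math module as a stand-in for arcpy so it can run anywhere, and the function name circleArea is made up for illustration.

```python
# The function is defined before the import, just like addPolyline above
def circleArea(radius):
    # The name "math" is looked up only when this line actually runs
    return math.pi * radius * radius

# Main script body
import math

# By the time the function is called, math has been imported
area = circleArea(2.0)
print(area)
```

If you moved the call to circleArea above the import line, Python would raise a NameError, which is one reason many programmers simply put their imports at the very top of the script.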

The addPolyline function is called twice in the script: once within the loop, which we would expect, and once at the end to make sure the final polyline is added to the shapefile. This is where writing a function cuts down on repeated code.

As you read each line of the text file, how do you determine whether it begins a new track? First of all, notice that we've added one more value to look for in this script:

  newTrackIndex = valueList.index("new_seg")

The variable newTrackIndex shows us which position in the line is held by the boolean new_seg property that tells us whether a new polyline is beginning. If you have sharp eyes, you'll notice we check for this later in the code:

  segmentedLine = line.split(",")
  isNew = segmentedLine[newTrackIndex].upper()
    
  # If starting a new line, write the completed
  #  line to the feature class
  if isNew == "TRUE":

In the above code, the upper() method converts the string into all upper-case, so we don't have to worry about whether the line says "true," "True," or "TRUE." But there's another situation we have to handle: What about the first line of the file? This line should read "true," but we can't add the existing polyline to the file at that time, because there isn't one yet. Notice that a second check is performed to make sure there are more than zero points in the array before the array is written to the shapefile:

        # Need this > 0 check to handle the first track
        if vertexArray.count > 0:
            addPolyline(cursor, vertexArray, spatialRef)

The above code checks to make sure there's at least one point in the array, then it calls the addPolyline function, passing in the cursor, the array, and the spatial reference.
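If you want to experiment with the track-splitting logic without ArcGIS, the sketch below applies the same new_seg check using plain Python lists instead of an arcpy Array and an insert cursor. The function name groupTracks, the shortened header, and the sample readings are all made up for illustration; they are not part of the GPS file you downloaded.

```python
def groupTracks(lines):
    """Return a list of tracks; each track is a list of (lon, lat) tuples."""
    # strip() removes the trailing newline so the last header item is clean
    header = lines[0].strip().split(",")
    latIndex = header.index("lat")
    lonIndex = header.index("long")
    newSegIndex = header.index("new_seg")

    tracks = []
    current = []
    for line in lines[1:]:
        values = line.strip().split(",")
        if values[newSegIndex].upper() == "TRUE":
            # Same idea as the count > 0 check: nothing has been
            #  collected yet when the very first reading arrives
            if len(current) > 0:
                tracks.append(current)
                current = []
        current.append((float(values[lonIndex]), float(values[latIndex])))

    # Add the final track, mirroring the last addPolyline call
    if len(current) > 0:
        tracks.append(current)
    return tracks

# Hypothetical sample readings
sampleLines = [
    "type,ident,lat,long,new_seg\n",
    "TRACK,ACTIVE LOG,40.10,-77.50,true\n",
    "TRACK,ACTIVE LOG,40.20,-77.60,false\n",
    "TRACK,ACTIVE LOG,41.00,-78.00,True\n",
    "TRACK,ACTIVE LOG,41.10,-78.10,false\n",
]

tracks = groupTracks(sampleLines)
print(len(tracks))
```

With this sample, the third reading starts a second track, so the function returns two lists of two vertices each. In the real script, each of those lists corresponds to one call to addPolyline.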

Here's another question to consider: How did we know that the Array object has a count property that tells us how many items are in it? This comes from the ArcGIS Desktop Help topic describing the Array class. In this section of the help there are topics describing each class in arcpy, and you'll come here often if you work with ArcGIS geometries in Python.

In the Array help topic, find the Properties table and notice that Array has a read-only count property. If we were working with a Python list, we could use len(vertexArray), but in our case vertexArray is an Array object that is native to the ArcGIS geoprocessing programming model. This means it is a specialized object designed by Esri, and you can only learn its methods and properties by examining the documentation. Bookmark these pages!

The GPS parsing example using ArcGIS 10.1 and the arcpy data access module

For reference only, below is how you could write the above script using ArcGIS 10.1 and the data access module arcpy.da. This example handles multiple polylines in the file. The syntax ("SHAPE@",) is a tuple with one item. The SHAPE@ token tells the insert cursor that the only value supplied for each new row is the geometry object.

# Reads a GPS-produced text file and writes the lat and long values
#  to an already-created polyline shapefile. Handles multiple polylines.

# Function to add a polyline
def addPolyline(cursor, array, sr):
    polyline = arcpy.Polyline(array, sr)
    cursor.insertRow((polyline,))
    array.removeAll()

# Main script body
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("D://Data//GPS//gps_track_multiple.txt", "r")
polylineFC = "D://Data//GPS//tracklines_sept25.shp"
spatialRef = arcpy.SpatialReference("WGS 1984")

# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")

# Read lines in the file and append to coordinate list
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
    vertexArray = arcpy.Array()

    # Read each line and split it
    for line in gpsTrack.readlines():
       segmentedLine = line.split(",")
       isNew = segmentedLine[newTrackIndex].upper()

       # If starting a new line, write the completed
       #  line to the feature class
       if isNew == "TRUE":

           # This check is needed to handle the first GPS entry
           if vertexArray.count > 0:
               addPolyline(cursor, vertexArray, spatialRef)

       # Get the lat/lon values of the current GPS reading
       latValue = segmentedLine[latValueIndex]
       lonValue = segmentedLine[lonValueIndex]

       vertex = arcpy.Point(lonValue, latValue)
       vertexArray.add(vertex)

    # Add the final polyline to the shapefile
    addPolyline(cursor, vertexArray, spatialRef)
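The trailing comma in ("SHAPE@",) and in (polyline,) is easy to overlook, but it is what makes each of these a tuple. A quick way to convince yourself is the following sketch, which runs in plain Python and uses made-up variable names:

```python
withComma = ("SHAPE@",)    # a tuple containing one string
withoutComma = ("SHAPE@")  # just a string; the parentheses do nothing

print(isinstance(withComma, tuple))
print(isinstance(withoutComma, tuple))
print(len(withComma))
```

Without the comma, Python treats the parentheses as ordinary grouping and you are left with a plain string.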

Summary

You can write geometries to ArcGIS feature classes using a combination of geometry objects included with ArcGIS. The common workflow is to create Point objects, which you add to an Array object. You can use the Array object plus a spatial reference to create Polyline and Polygon objects. You then use an insert cursor to write the resulting geometry to the feature class's geometry field (usually called "shape").

You may be wondering how you might create a multi-part feature (such as the state of Hawaii containing multiple islands), or a polygon with a "hole" in it. There are special rules for ordering and nesting Points and Arrays to create these types of geometries. These are covered in the course textbook, which brings us to...

Readings

Read Zandbergen 8.1 - 8.6, which contains a good summary of how to read and write Esri geometries.

4.4 Automation with batch files and scheduled tasks

In this course, we've talked about the benefits of automating your work through Python scripts. It's nice to be able to run several geoprocessing tools in a row without manually traversing the Esri toolboxes, but what's so automatic about launching PythonWin, opening your script, and clicking the Run button? In this section of the lesson, we'll take automation one step further by discussing how you can make your scripts run automatically.

Scripts and your operating system

Most of the time we've run scripts in this course, it's been through PythonWin. However, your operating system (Windows) can also run scripts directly. Maybe you've tried to double-click a .py file to run a script. As long as Windows understands that .py files represent a Python script and that it should use the Python interpreter to run the script, the script will launch immediately.

When you try to launch a script automatically by double-clicking it, it's possible you'll get a message saying Windows doesn't know which program to use to open your file. If this happens to you, use the Browse button on the error dialog box to browse to the Python executable, most likely located in C:\Python26\ArcGIS10.0\Python.exe. Make sure "Always use the selected program to open this kind of file" is checked and click OK. Windows now understands that .py files should be run using Python.

Double-clicking a .py file gives your operating system the simple command to run that Python script. You can alternatively tell your operating system to run a script using the Windows command line interface. This environment just gives you a blank window with a blinking cursor and allows you to type the path to a script or program, followed by a list of parameters. It's a clean, minimalist way to run a script. In Windows XP, you can open the command line by clicking Start > Run and typing cmd. In Windows Vista or Windows 7, just type cmd in the Search box.

The command line

Advanced use of the command line is outside the scope of this course. For now, it's sufficient to say that you can run a script from the command line by typing the path of the Python executable, followed by the full path to the script, like this:

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py

If the script takes parameters, you must also type each argument separated by a space. Remember that arguments are the values you supply for the script's parameters. Here's an example of a command that runs a script with two arguments, both strings that represent pathnames. Notice that you should use the regular \ in your paths when providing arguments from the command line (not / or \\ as you would use in PythonWin).

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
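Inside the script, arguments typed on the command line arrive in the list sys.argv, with the script's own path in position 0. The following is a minimal sketch of reading two arguments that way; the function name readArguments is hypothetical, and the argument list is simulated so the example runs without a real command line.

```python
import sys

def readArguments(argv):
    """Return (folder, featureClass) from a command-line argument list."""
    # argv[0] is the script path; the user's arguments start at argv[1]
    if len(argv) < 3:
        raise ValueError("Expected two arguments: a folder and a feature class")
    return argv[1], argv[2]

# Simulated argument list for illustration. In a real script you would
#  call readArguments(sys.argv) instead. Note that a raw string cannot end
#  with a single backslash, so the folder path uses doubled backslashes.
exampleArgv = [r"C:\WCGIS\Geog485\Lesson2\Project2.py",
               "C:\\WCGIS\\Geog485\\Lesson2\\",
               r"C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp"]

folder, featureClass = readArguments(exampleArgv)
print(folder)
print(featureClass)
```

Because the command line separates arguments at spaces, a path that contains a space must be wrapped in double quotes when you type it.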

If the script executes successfully, you often won't see anything except a new command prompt (remember, this is minimalist!). If your script is designed to print a message, you should see the message. If your script is designed to modify files or data, you can check those files or data (perhaps using ArcCatalog) to make sure the script ran correctly.

You'll also see messages if your script fails. Sometimes these are the same messages you would see in the Python Interactive Window. At other times, the messages are more helpful than what you would see in PythonWin, making the command line another useful tool for debugging. Unfortunately, the messages can also be less helpful.

Batch files

Why is the command line so important in a discussion about automation? After all, it still takes work to open the command line and type the commands. The beautiful thing about commands is that they, too, can be scripted. You can list multiple commands in a simple text-based file, called a batch file. Running the batch file runs all the commands in it.

Here's an example of a simple batch file that runs the two scripts above. To make this batch file, you could put the text below inside an empty Notepad file and save it with a .bat extension. Remember that this is not Python; it's command syntax:

@ECHO OFF 
REM Runs both my project scripts

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py
ECHO Ran project 1
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
ECHO Ran project 2
PAUSE

Here are some notes about the above batch file, starting from the top:

  • @ECHO OFF prevents all the lines in your batch file from being printed to the command line window, or console, when you run the file. It's standard procedure to use this as the first line of your batch file, unless you really want to see which line of the file is executing (perhaps for debugging purposes).
  • REM is how you put a comment in your batch file, the same way # denotes a comment in Python.
  • You put commands in your batch file using the same syntax you used from the command line.
  • ECHO prints something to the console. This can be useful for debugging, especially when you've used @ECHO OFF because you don't want to see every line of your batch file printed to the console.
  • PAUSE gives a "Press any key to continue..." prompt. If you don't put this at the end of your batch file, the console will immediately close after the file is done executing. When you're writing and debugging the batch file, it's useful to put PAUSE at the end so you can see any error messages that were printed when running the file. Once your batch file is tested and working correctly, you can remove the PAUSE.

Batch files can contain variables, loops, comments, and conditional logic, all of which are beyond the scope of this lesson. However, if you'll be writing and running many scripts for your organization, it's worthwhile to spend some time learning more about batch files. Fortunately, batch files have been around for a long time (they are older than Windows itself), so there's an abundance of good information available on the Internet to help you.
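If you are more comfortable in Python than in batch syntax, one alternative is a small "driver" script that launches other commands in order using the standard subprocess module. The sketch below is only an illustration of that idea: the function name runAll is made up, and the two stand-in commands simply ask the current Python interpreter to print a message. In practice, each command would be a list holding the Python executable path, a script path, and any arguments.

```python
import subprocess
import sys

def runAll(commands):
    """Run each command in order; return the list of exit codes."""
    exitCodes = []
    for command in commands:
        # subprocess.call waits for the command to finish
        exitCode = subprocess.call(command)
        exitCodes.append(exitCode)
        if exitCode != 0:
            # Stop at the first failure so later steps don't run
            break
    return exitCodes

# Stand-in commands for illustration only
commands = [
    [sys.executable, "-c", "print('Ran project 1')"],
    [sys.executable, "-c", "print('Ran project 2')"],
]

exitCodes = runAll(commands)
print(exitCodes)
```

An exit code of 0 conventionally means the command succeeded. Unlike the batch file above, this sketch stops as soon as one command fails, which is a design choice you may or may not want.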

Scheduling tasks

At this point we've come pretty close to reaching true automation, but there's still that need to launch the Python script or the batch file, either by double-clicking it, invoking it from the command line, or otherwise telling the operating system to run it. To truly automate the running of scripts and batch files, you can use an operating system utility such as Windows Task Scheduler.

Task Scheduler is one of those items hidden in Windows System Tools that you may not have paid any attention to before. It's a relatively simple program that allows you to schedule your scripts and batch files to run on a regular basis. This is helpful if the task needs to run often enough that it would be burdensome to launch the batch file manually; but it's even more helpful if the task takes some of your computing resources and you want to run it during the night or weekend to minimize impact on others who may be using the computer.

Here's a real-world scenario where Task Scheduler (or a comparable utility if you're running on a Mac, Linux, or UNIX) is very important: Fast Web maps tend to use a server-side cache of pregenerated map images, or tiles, so that the server doesn't have to draw the map each time someone navigates to an area. A Web map administrator who has ArcGIS Server can run the tool Manage Map Server Cache Tiles to make the tiles before he or she deploys the Web map. After deployment, the server quickly sends the appropriate tiles to people as they navigate the Web map. So far so good.

As the source GIS data for the map changes, however, the cache tiles become out of date. They are just images and do not know how to update themselves automatically. The cache needs to be updated periodically, but cache tile creation is a time-consuming and CPU-intensive operation. For this reason, many server administrators use Task Scheduler to update the cache. This usually involves writing a script or batch file that runs Manage Map Server Cache Tiles and other caching tools, then scheduling that script to run on nights or weekends when it would be least disruptive to users of the Web map.

Inside Windows Task Scheduler

Let's take a quick look inside Windows Task Scheduler. The instructions below are for Windows Vista (and probably Windows 7). Other versions of Windows have a very similar Task Scheduler, and with some adaptation you can also use the instructions below to understand how to schedule a task.

  1. Open Task Scheduler by navigating the Windows Start menu to All Programs > Accessories > System Tools > Task Scheduler.
  2. Click Create Basic Task. This walks you through a simple wizard to set up the task. You can configure advanced options on the task later.
  3. Give your task a Name that will be easily remembered and optionally, a Description. Then click Next.
  4. Choose how often you want the task to run. For this example, choose Daily. Then click Next.
  5. Choose a Start time and a recurrence frequency. If you want, choose a time a few minutes ahead of the current time, so you can see what it looks like when a task runs. Then click Next.
  6. Choose Start a program, then click Next.
  7. Here's the moment of truth where you specify which script or batch file you want to run. Click Browse and navigate to one of the Python scripts you've written during this course. It's going to be easiest here if you pick a script that doesn't take any arguments, such as your project 1 script that makes contour lines from hard-coded datasets, but if you are feeling brave you can also add arguments in this panel of the wizard. Then click Next.
  8. Review the information about your task, then click Finish.
  9. Notice that your task now appears in the list in Task Scheduler. You can highlight the task to see its properties, or right-click the task and click Properties to actually set those properties. You can use the advanced properties to get your script to run even more frequently than daily, for example, every 15 minutes.
  10. Wait for your scheduled time to occur, or if you don't want to wait, right-click the task and click Run. Either way, you'll see a console window appear when the script begins and disappear once the script has finished. (If you're running a Python script and you don't want the console window to disappear at the end, you can put a line at the end of the script such as lastline = raw_input(">"). This stops the script until the user presses Enter. Once you're comfortable with the script running on a regular basis, you'll probably want to remove this line to keep open console windows from cluttering your screen. After all, the idea of a scheduled task is that it happens in the background without bothering you.)
    Figure 4.1 The Windows Task Scheduler.
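Because a scheduled task runs when nobody is watching the console, it can help to have the script write its progress to a log file that you can read later. Here is a minimal sketch using Python's standard logging module. The log file location, the logger name, and the doWork function are placeholders for illustration; a real task would log to a fixed folder of your choosing and call your geoprocessing code.

```python
import logging
import os
import tempfile

# Hypothetical log location in the system's temporary folder
logPath = os.path.join(tempfile.gettempdir(), "nightly_task.log")

# Set up a logger that writes time-stamped lines to the file
logger = logging.getLogger("nightlyTask")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(logPath)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

def doWork():
    # Placeholder for the real geoprocessing steps
    return 42

try:
    logger.info("Task started")
    result = doWork()
    logger.info("Task finished, result = %s", result)
except Exception:
    # logger.exception also records the full traceback in the log file
    logger.exception("Task failed")

# Release the file so other programs can open it
logger.removeHandler(handler)
handler.close()
```

Each run appends to the same file, so after a few nights you can open the log and see when the task ran and whether it failed.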

Summary

To make your scripts run automatically, you use Windows Task Scheduler to create a task that the operating system runs at regular intervals. The task can point at either a .py file (for a single script), or a .bat file (for multiple scripts). Using scheduled tasks, you can achieve full automation of your GIS processes.

4.5 Running any tool in the box

Sooner or later, you're going to have to include a geoprocessing tool in your script that you have never run before. It's possible that you've never even heard of the tool or run it from its GUI, let alone a script.

In other cases, you may know the tool very well, but your Python may be rusty, or you may not be sure how to construct all the necessary parameters.

The approach for both of these situations is the same. Here are some suggested steps for running any tool in the ArcGIS toolboxes using Python:

  1. Find the tool reference documentation. We've seen this already during the course. Each tool has its own topic in the Geoprocessing tool reference section of the ArcGIS Help. Open that topic and read it before you do anything else. Read the "Usage" section at the beginning to make sure that it's the right tool for you and that you are about to employ it correctly.
  2. Examine the parameters. Scroll down to the "Syntax" section of the topic and read which parameters the tool accepts. Note which parameters are required and which are optional, and decide which parameters your script is going to supply.
  3. In your Python script, create variables for each parameter. Note that each parameter in the "Syntax" section of the topic has a data type listed. If the data type for a certain parameter is listed as "String," you need to create a Python string variable for that parameter.

    Sometimes the translation from data type to Python variable is not direct. For example, sometimes the tool reference will say that the required variable is a "Feature Class." What this really means for your Python script is that you need to create a string variable containing the path to a feature class.

    Another example is if the tool reference says that the required data type is a "Long." What this means in Python is that you need to create a numerical variable (as opposed to a string) for that particular parameter.

    If you have doubts about how to create your variable to match the required data type, scroll down to the "Code Sample" in the tool reference topic. Try to find the place where the example script defines the variable you're having trouble with. Copy the patterns that you see in the example script and usually you'll be okay.

    Most of the commonly used tools have excellent example scripts, but others are hit or miss. If your tool of interest doesn't have a good example script, you may be able to find something on the Esri forums, ArcScripts, or a well-phrased Google search.

  4. Run the tool...with error handling. You can first run your script without try/except blocks to catch any basic errors in the Interactive Window. If the messages you get back aren't helpful, the next resort is to add try/except blocks and put print arcpy.GetMessages() in the except block.
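The steps above can be sketched as follows for the Buffer tool, whose first three parameters are the input features, the output feature class, and a buffer distance. This is only an illustration: the function name runBuffer and the shapefile paths are hypothetical, and the geoprocessor is passed in as the parameter gp so the pattern can be tried out even on a machine without ArcGIS. In a real script you would import arcpy and pass arcpy as gp.

```python
def runBuffer(gp, inFeatures, outFeatures, distance):
    """Run the Buffer tool; return True on success, False on failure."""
    try:
        gp.Buffer_analysis(inFeatures, outFeatures, distance)
        return True
    except Exception:
        # Report the messages the tool produced before it failed
        print(gp.GetMessages())
        return False

# One variable per parameter, matching the documented data types.
# These paths are made up for illustration.
inFeatures = "C:/Data/roads.shp"          # Feature Layer: a string path
outFeatures = "C:/Data/roads_buffer.shp"  # Feature Class: a string path
distance = "100 Meters"                   # Linear unit: a value plus a unit

# In a real script:
#   import arcpy
#   runBuffer(arcpy, inFeatures, outFeatures, distance)
```

Notice that all three variables are strings here, even though the tool reference lists three different data types for them.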

In Project 4 you'll get a chance to practice these skills to run a tool you previously haven't worked with in a script.

4.6 Working with map documents

To this point, we've talked about automating geoprocessing tools, updating GIS data, and reading text files. However, we've not covered anything about working with an Esri map document. There are many tasks that can be performed on a map document that are well-suited for automation. These include:

  • Finding and replacing text in a map or series of maps. For example, a copyright notice for 2010 becomes 2011.
  • Repairing layers that are referencing data sources using the wrong paths. For example, your map was sitting on a computer where all the data was in C:\data and now it is on a computer where all the data is in D:\myfolder\mydata.
  • Printing a series of maps or data frames.
  • Exporting a series of maps to PDF and joining them to create a "map book."
  • Making a series of maps available to others on ArcGIS Server.

Esri map documents are binary files, meaning they can't be easily read and parsed using the techniques we covered earlier in this lesson. Until very recently the only way to automate anything with a map document was to use ArcObjects, which is somewhat challenging for beginners and requires using a language other than Python. With the release of ArcGIS 10.0, Esri added a Python module for automating common tasks with map documents.

The arcpy.mapping module

arcpy.mapping is a module you can use in your scripts to work with map documents. Please take a detour at this point to read the Esri overview of arcpy.mapping, which is found in the topic Geoprocessing scripts for map document management and output.

The most important object in this module is MapDocument. This tells your script which map you'll be working with. You can get a MapDocument by referencing a path, like this:

mxd = arcpy.mapping.MapDocument(r"C:\data\Alabama\UtilityNetwork.mxd")

Notice the use of r in the line above to denote a raw string literal. In other words, if you include r right before you begin your string, it's safe to use reserved characters like the single backslash \. I've done it here because you'll see it in a lot of the Esri examples with arcpy.mapping.
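To see what the r prefix protects you from, consider this small sketch. The path is made up and was chosen because it contains \n and \t, which Python normally interprets as a newline and a tab.

```python
plainPath = "C:\new\table.mxd"      # \n and \t are turned into newline and tab
rawPath = r"C:\new\table.mxd"       # backslashes are kept exactly as typed
escapedPath = "C:\\new\\table.mxd"  # doubled backslashes also work

print(len(plainPath))
print(len(rawPath))
print(rawPath == escapedPath)
```

The plain version comes out two characters shorter than the raw version, because each two-character escape sequence collapsed into a single character. That would silently give you a path that does not exist. One caveat: a raw string cannot end with a single backslash, so use doubled backslashes for folder paths that end in \.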

Instead of directly using a string path, you could alternatively put a variable holding the path. This would be useful if you were iterating through all the map documents in a folder using a loop, or if you previously obtained the path in your script using something like arcpy.GetParameterAsText().
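As a rough sketch of the folder scenario, the code below uses the standard os module to build the full path of every .mxd file in a folder. The function name listMapDocuments and the folder path are hypothetical. Each path it returns could then be passed to arcpy.mapping.MapDocument inside the loop.

```python
import os

def listMapDocuments(folder):
    """Return the full paths of the .mxd files in a folder, sorted by name."""
    paths = []
    for fileName in sorted(os.listdir(folder)):
        # Compare in lower case so .MXD and .mxd are both found
        if fileName.lower().endswith(".mxd"):
            paths.append(os.path.join(folder, fileName))
    return paths

# Hypothetical folder; replace with a folder that holds your map documents
mapFolder = r"C:\data\Alabama"

if os.path.isdir(mapFolder):
    for mxdPath in listMapDocuments(mapFolder):
        print(mxdPath)
```

The os.path.isdir check simply keeps the sketch from failing on a computer where the example folder does not exist.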

It can be convenient to work with arcpy.mapping in the Python window in ArcMap. In this case, you do not have to put the path to the MXD. There's a special keyword "CURRENT" that you can use to get a reference to the currently-open MXD.

mxd = arcpy.mapping.MapDocument("CURRENT")

Once you get a MapDocument, then you do something with it. Most of the functions in arcpy.mapping take a MapDocument object as a parameter. Let's look at this first script from the Esri help topic linked above and scrutinize what is going on. I've added comments to each line.

# Create a MapDocument object referencing the MXD you want to update
mxd = arcpy.mapping.MapDocument(r"C:\GIS\TownCenter_2009.mxd")

# Loop through each text element in the map document
for textElement in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT"):

    # Check if the text element contains the out of date text
    if textElement.text == "GIS Services Division 2009":

        # If out of date text is found, replace it with the new text
        textElement.text = "GIS Services Division 2010"

# Export the updated map to a PDF
arcpy.mapping.ExportToPDF(mxd, r"C:\GIS\TownCenterUpdate_2010.pdf")

# Clean up the MapDocument object by deleting it
del mxd

The first line in the above example gets a MapDocument object referencing C:\GIS\TownCenter_2009.mxd. The example then employs two functions from arcpy.mapping. The first is ListLayoutElements. Notice that the parameters for this function are a MapDocument and the type of layout element you want to get back, in this case, "TEXT_ELEMENT". (Examine the documentation for List Layout Elements to understand the other types of elements you can get back.)

The function returns a Python list of TextElement objects representing all the text elements in the map document. You know what to do if you want to manipulate every item in a Python list. In this case, the example uses a for loop to check the TextElement.text property of each element. This property is readable and writeable, meaning if you want to set some new text, you can do so by simply using the equals sign assignment operator, as in textElement.text = "GIS Services Division 2010".

The ExportToPDF function is very simple in this script. It takes a MapDocument and the path of the output PDF as parameters. If you look at the documentation for ExportToPDF, you'll notice a lot of other optional parameters for exporting PDFs, such as whether to embed fonts, that are just left as defaults in this example.

Learning arcpy.mapping

The best way to learn arcpy.mapping is to try to use it. Because of its simple, "one-line-fix" nature, it's a good place to practice your Python. It's also a good way to get used to the Python window in ArcMap, because you can immediately see the results of your actions.

Although there is no arcpy.mapping component to this lesson's project, you're welcome to use it in your final project. If you've already submitted your final project proposal, you can amend it to use arcpy.mapping by e-mailing and obtaining approval from the instructors. If you use arcpy.mapping in your final project, you should attempt to incorporate several of the functions or mix it with other Python functionality you've learned, making something more complex than the "one line fix" type of script I mentioned above.

By now you'll probably have experienced the reality that your code does not always run as expected on the first try. Before you start running arcpy.mapping commands on your production MXDs, I suggest making backup copies.

Here are a few additional places where you can find excellent help on learning arcpy.mapping:

4.7 Limitations of Python scripting with ArcGIS

In this course you've learned the basics of programming and have seen how Python can automate any GIS function that can be performed with the ArcGIS toolboxes. There's a lot of power available to you through scripting, and hopefully you're starting to get ideas about how you can apply that in your work outside this course.

To conclude this lesson, however, it's important to talk about what's not available through Python scripting in ArcGIS.

Limits with fine-grained access to the "guts" of ArcGIS

With ArcGIS, Python interaction is mainly limited to reading and writing data, editing the properties of map documents, and running the tools that are included with ArcGIS. Although the ArcGIS tools are useful, they are somewhat of a black box, meaning you put things in and get things out without knowing or being concerned about what is happening inside. If you want a greater degree of control over how ArcGIS is manipulating your data, you need to work with ArcObjects.

ArcObjects can be thought of as "the building blocks" of ArcGIS. In fact, an analogy with the children's Lego building bricks works well to describe ArcObjects: Programming with ArcObjects is akin to having an enormous selection of Legos of all shapes and sizes, whereas Python scripting is like working with a kit containing some large prefabricated pieces that make it much easier to construct a particular final product.

Because of the sheer amount of functionality and objects available to you, ArcObjects is more challenging to learn than simple Python scripting. Usually, an equivalent task takes many more lines of code to write in ArcObjects than in a Python script. However, when you use ArcObjects you have much greater control over what happens in your program. You can take a small piece of functionality and use it without the overhead of a tool or all the other parameters that come with a tool.

Limits with user interface customization at ArcGIS 10.0

In this course we have done nothing with customizing ArcMap to add special buttons, toolbars, and so on that trigger our programs. Our foray into user interface design has been limited to making a script tool and toolbox. Although script tools are useful, there are times when you want to take the functionality out of the toolbox and put it directly into ArcMap as a button on a toolbar. You may want that button to launch a new window with text boxes, labels, and buttons that you design yourself.

In ArcGIS 10.0, if you want to put custom functionality or programs directly into ArcMap, you need to use Visual Basic for Applications (VBA), C++, or a .NET language (VB.NET or C#) working with ArcObjects. The functionality may be as simple as putting some custom actions behind a button (zoom to a certain bookmark, for example), or you may open a full-blown program you develop with multiple forms, options, and menus. The aforementioned languages have IDEs in which you can design custom user interfaces with text boxes, labels, buttons, and so on.

Geog 489, another elective course in the GIS certificate program, covers GIS customization using ArcObjects.

New Python add-in functionality at ArcGIS 10.1

To allow a greater degree of interactivity between the ArcMap user interface and Python scripts, ArcGIS 10.1 introduces the concept of a Python add-in. Add-ins allow you to attach Python logic to a limited set of actions you perform in ArcMap, such as zooming the map, opening a new map document, or clicking a button on a custom toolbar. For example, you might create an add-in that automatically adds a particular set of layers any time someone pushes a certain button on your toolbar.

With Python add-ins, you get access to a number of user interface elements to use as a front end to your Python scripts, including toolbars, buttons, menus, combo boxes, and basic file browsing and Yes/No confirmation dialog boxes. There's also a set of common events that you can detect and respond to in your code, such as the map opening, the map extent changing, or the spatial reference changing. Although this is far from the full realm of ArcObjects and .NET customization possibilities, it gives a lot more possibilities than were available in previous versions of ArcGIS.

The nice thing about add-ins is that they are easily shareable. You download the Python Add-In Wizard from Esri, and it helps you prepare and package up your add-in into a .esriaddin file. Other people with ArcGIS can then install the add-in from the .esriaddin file.

Working with Python add-ins is currently not included in the scope of this course, but you can learn all about them in the help book ArcGIS Desktop Python add-ins. After reading this material and getting a basic understanding of what's required to create add-ins, you're welcome to incorporate them into your final project if you have ArcGIS 10.1 and you are confident that you can work somewhat independently to test and create the add-ins. If you have struggled in the course, I recommend that you wait until after completing Geog 485 to further explore add-ins, so that you can give them the necessary amount of time and testing.

Lesson 4 Practice Exercises Introduction

These practice exercises will give you some more experience applying the Lesson 4 concepts. They are designed to prepare you for some of the techniques you'll need to use in your Project 4 script.

Download the data for the practice exercises

Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll read some coordinate points and make a polygon from those points. In Practice Exercise B, you'll work with dictionaries to manage information that you parse from the text file.

Example solutions are provided for both practice exercises. You'll get the most value out of the exercises if you make your best attempt to complete them on your own before looking at the solutions. In any case, the patterns shown in the solution code can help you approach Project 4.

Lesson 4 Practice Exercise A

This practice exercise is designed to give you some experience writing geometries to a shapefile. You have been provided two things:

  • A text file MysteryStatePoints.txt containing the coordinates of a state boundary.
  • An empty polygon shapefile that uses a geographic coordinate system.

The objective

Your job is to write a script that reads the text file and creates a state boundary polygon out of the coordinates. When you successfully complete this exercise, you should be able to preview the shapefile in ArcCatalog and see the state boundary.

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're not sure how to get started, here are some tips:

  • This script will differ from some of the examples you've seen. There is no header line for the file, and there is only one line of text to read. This should actually make the file easier to process.
  • Another difference is that the items of interest are separated by a | character. Remember that when you call the split() method, you can pass in any delimiter. Previously we have used a comma (",") but you can use the | just as easily ("|").
  • Before you start looping through the coordinates, create an Array object to hold all the points in your polygon.
  • Loop through each coordinate and create a Point object from the coordinate pair. Then add the Point object to your Array object.
  • Once you start looping through the coordinates, you'll be dealing with coordinate pairs such as -109.05,31.33. You need to split this again (this time using a comma delimiter) in order to isolate the X and Y values.
  • Once you're done looping, create an insert cursor on your shapefile. Create a new row, build a Polygon from your Array, and assign it to the SHAPE field. Then insert the row.
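The two splitting steps described in the tips can be tried out without ArcGIS. The following is a minimal sketch, assuming a made-up sample line in the same format as the exercise file; the function name parseCoordinatePairs and the sample values are hypothetical. It converts each value to a number with float(), which is one reasonable way to handle text read from a file.

```python
# Made-up sample text in the same format as the exercise file:
# coordinate pairs separated by "|", with X and Y separated by ","
sampleLine = "-109.05,31.33|-109.05,37.00|-114.05,37.00\n"

def parseCoordinatePairs(lineOfText):
    """Return a list of (x, y) tuples of floats parsed from one line."""
    pairs = []
    # strip() removes the trailing new line character before splitting
    for coordinatePair in lineOfText.strip().split("|"):
        # Second split: isolate the X and Y text values
        xText, yText = coordinatePair.split(",")
        pairs.append((float(xText), float(yText)))
    return pairs

parsedPairs = parseCoordinatePairs(sampleLine)
print(parsedPairs)
```

In your script, each (x, y) pair would become a Point object that you add to your Array.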

Lesson 4 Practice Exercise A Solution

Here's one way you could approach Lesson 4 Practice Exercise A. If you have a different or more efficient solution, please share in the forums. Note that the video is several quarters old and shows a slightly different way of creating an array, using the CreateObject method. Also in the video, a Polygon object is not created; the array is assigned directly to the SHAPE field. Although both techniques work, I recommend that you continue creating your geometry objects directly from arcpy like we have been doing in this lesson, and as shown in the code sample below.

# Reads coordinates from a text file and writes a polygon

import arcpy

shapefile = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file and read the first (and only) line
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
pointFile.close()

# Make a Python list out of the coordinates. | is the delimiter
coordinatePairList = lineOfText.split("|")

# This Array object will hold a clockwise "ring" of Point
#  objects, thereby making a polygon.
polygonArray = arcpy.Array()

# Loop through each coordinate pair and make a Point object
for coordinatePair in coordinatePairList:
    # Split the coordinate pair by comma. This gives you
    #  a list with two items: the X and Y coordinates
    coordinates = coordinatePair.split(",")

    # Create a point, converting the X and Y text values to numbers
    currentPoint = arcpy.Point(float(coordinates[0]), float(coordinates[1]))
 
    # Add the newly-created Point to your Array    
    polygonArray.add(currentPoint)

# Create an insert cursor and apply your array to a polygon object
cursor = arcpy.InsertCursor(shapefile)
row = cursor.newRow()
polygon = arcpy.Polygon(polygonArray, spatialRef)

# Insert the polygon as a row
row.SHAPE = polygon
cursor.insertRow(row)

# Release locks by deleting cursors
del row
del cursor

Alternate solution using the arcpy data access module in ArcGIS 10.1

Here's an example of how you might solve this practice exercise using the arcpy data access module in ArcGIS 10.1. In this case the "SHAPE@" token is used to assign the geometry to a row. The syntax ("SHAPE@",) is a tuple with one item indicating that just the SHAPE field will be updated.

# Reads coordinates from a text file and writes a polygon

import arcpy

shapefile = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file and read the first (and only) line
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
pointFile.close()

# Make a Python list out of the coordinates. | is the delimiter
coordinatePairList = lineOfText.split("|")

# This Array object will hold a clockwise "ring" of Point
#  objects, thereby making a polygon.
polygonArray = arcpy.Array()

# Loop through each coordinate pair and make a Point object
for coordinatePair in coordinatePairList:
    # Split the coordinate pair by comma. This gives you
    #  a list with two items: the X and Y coordinates
    coordinates = coordinatePair.split(",")

    # Create a point, converting the X and Y text values to numbers
    currentPoint = arcpy.Point(float(coordinates[0]), float(coordinates[1]))
 
    # Add the newly-created Point to your Array    
    polygonArray.add(currentPoint)

# Create a Polygon from your Array
polygon = arcpy.Polygon(polygonArray, spatialRef)

# Create an insert cursor and apply the Polygon to a new row
with arcpy.da.InsertCursor(shapefile, ("SHAPE@",)) as cursor:
    cursor.insertRow((polygon,))
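The trailing comma in ("SHAPE@",) and (polygon,) is easy to overlook. This short sketch, which uses only plain Python and hypothetical variable names, shows the difference the comma makes:

```python
fieldsAsTuple = ("SHAPE@",)   # trailing comma: a tuple containing one string
fieldsAsString = ("SHAPE@")   # no comma: the parentheses do nothing, so this is just a string

print(type(fieldsAsTuple).__name__)    # tuple
print(len(fieldsAsTuple))              # 1 (one item)
print(type(fieldsAsString).__name__)   # str
print(len(fieldsAsString))             # 6 (six characters)
```

If the comma feels too subtle, a one-item list such as ["SHAPE@"] is another way to write a sequence with a single item in Python.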

 

Lesson 4 Practice Exercise B

This practice exercise does not do any geoprocessing or GIS, but it will help you get some experience working with functions and dictionaries. The latter will be especially helpful as you work on Project 4.

The objective

You've been given a text file of (completely fabricated) soccer scores from some of the most popular teams in Buenos Aires. Write a script that reads through the scores and prints each team name, followed by the maximum number of goals that team scored in a game, for example:

River: 5
Racing: 4
etc.

Keep in mind that the maximum number of goals scored might have come during a loss.

You are encouraged to use dictionaries to complete this exercise. This is probably the most efficient way to solve the problem. You'll also be able to write at least one function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is an excellent exercise in using the debugger, especially to watch your dictionary as you step through each line of code.

Tips

If you want a challenge, go ahead and start coding. Otherwise, here are some tips that can help you get started:

  • Your approach should be to read through each line and split it using a space delimiter (" ").
  • Create variables for all your items of interest, including winner, winnerGoals, loser, and loserGoals and assign them appropriate values based on what you parsed out of the line of text.
  • Review chapter 6.8 on dictionaries in the Zandbergen text. You want to make a dictionary that has a key for each team, and an associated value that represents the team's maximum number of goals. If you looked at the dictionary in the debugger it would look like {'River': '5', 'Racing': '4', etc.}
  • You can write a function that takes in three things: the key (team name), the number of goals, and the dictionary name. This function should then check if the key has an entry in the dictionary. If not, a key should be added and its value set to the current number of goals. If a key is found, you should perform a check to see if the current number of goals is higher than the value associated with that key. If so, you should set a new value. Notice how many "ifs" appear in the preceding sentences.
  • Some of the lines of text end with the new line character "\n". This can happen with some text files that come out of Notepad. You can get rid of this with the rstrip() method: line = line.rstrip("\n").
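As a quick illustration of the first and last tips, here is a minimal sketch of stripping and splitting one line. The two lines of text are made up for this example and are not taken from the exercise file; the column order in the real file may differ, which is why the sketch looks up each position from the header.

```python
# Made-up lines in the style of the exercise file (not the real data)
headerLine = "Winner WG Loser LG\n"
scoreLine = "River 2 Boca 1\n"

# Remove the new line character, then split on spaces
header = headerLine.rstrip("\n").split(" ")
values = scoreLine.rstrip("\n").split(" ")

# Find each item by its column name instead of a hard-coded position
winner = values[header.index("Winner")]
winnerGoals = values[header.index("WG")]
loser = values[header.index("Loser")]
loserGoals = values[header.index("LG")]

print(winner + " " + winnerGoals + ", " + loser + " " + loserGoals)
```

Note that winnerGoals and loserGoals are still strings at this point, just like everything else that comes out of split().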

 

Lesson 4 Practice Exercise B Solution

This practice exercise is a little trickier than previous exercises. If you were not able to code a solution, study the following solution carefully and make sure you know the purpose of each line of code.

# Reads through a text file of soccer (football)
#  scores and reports the highest number of goals
#  in one game for each team

# ***** DEFINE FUNCTIONS *****

# This function checks if the number of goals scored
#  is higher than the team's previous max.
def checkGoals(team, goals, dictionary):
    #Check if the team has a key in the dictionary
    if team in dictionary:
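        # If a key was found, check goals against team's current max.
        #  The values parsed from the file are strings, so compare them
        #  as integers ("10" > "9" is False when compared as text).
        if int(goals) > int(dictionary[team]):
            dictionary[team] = goals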
        else:
            pass
    # If no key found, add one with current number of goals
    else:
        dictionary[team] = goals

# ***** BEGIN SCRIPT BODY *****        
        
# Open the text file of scores
scoresFilePath = "C:\\Data\\Lesson4PracticeExerciseB\\Scores.txt"
scoresFile = open(scoresFilePath)

# Read the header line and get the important field indices
headerLine = scoresFile.readline()
headerLine = headerLine.rstrip("\n") #Remove "new line" character

segmentedHeaderLine = headerLine.split(" ")
winnerIndex = segmentedHeaderLine.index("Winner")
winnerGoalsIndex = segmentedHeaderLine.index("WG")
loserIndex = segmentedHeaderLine.index("Loser")
loserGoalsIndex = segmentedHeaderLine.index("LG")

# Create an empty dictionary. Each key will be a team name.
#  Each value will be the maximum number of goals for that team.
maxGoalsDictionary = {}

# Loop through each line of the file
for line in scoresFile.readlines():
    line = line.rstrip("\n") # Remove "new line" character
    segmentedLine = line.split(" ")

    # Create variables for all items of interest in the line of text    
    winner = segmentedLine[winnerIndex]
    winnerGoals = segmentedLine[winnerGoalsIndex]
    loser = segmentedLine[loserIndex]
    loserGoals = segmentedLine[loserGoalsIndex]

    # Check the winning number of goals against the team's max
    checkGoals(winner, winnerGoals, maxGoalsDictionary)

    # Also check the losing number of goals against the team's max    
    checkGoals(loser, loserGoals, maxGoalsDictionary)

# Close the file now that every line has been read
scoresFile.close()

# Print the results
for key in maxGoalsDictionary:
    print key + ": " + maxGoalsDictionary[key]
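One thing to watch for when comparing values parsed from a text file: they are strings, and Python compares strings character by character rather than by numeric value. The sample scores in this exercise are small, but the difference shows up as soon as a two-digit number is involved. A minimal sketch (the function name isHigher is hypothetical):

```python
def isHigher(goalsText, currentMaxText):
    """Compare two goal counts that were read from text, as integers."""
    return int(goalsText) > int(currentMaxText)

textComparison = "10" > "9"             # False: "1" sorts before "9"
numericComparison = isHigher("10", "9") # True: 10 is greater than 9

print(textComparison)
print(numericComparison)
```

Converting with int() before comparing avoids this surprise.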

Lesson 4 Practice Exercise C

This exercise gives you some more practice using dictionaries. This time, you’ll be reading some values from a pre-built dictionary. Each key in the dictionary is a type of animal. The corresponding value for each key is a Python list containing different types of that animal. Your task is to print the average length of the names for each animal type. For example, the list for the key “birds” contains three types of birds (“Robin”, “Canary”, and “Bluebird”), and the average length of those strings is 6.33 (in other words, (5 + 6 + 8)/3).

You’ll start with the code below that builds the dictionary. Copy and paste this into an empty script and start writing your code below it. Your dictionary name will be animals:

 

# function to load dictionary
def BuildDictionary():

   #create lists
   dogList = ["Dalmatian", "German Shepherd"]
   catList = ["American Shorthair"]
   birdList = ["Robin", "Canary","Bluebird" ]

   #use dict() constructor to create dictionary and add keys and values
   return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])

# Call the function and assign the result to the variable 'animals'.
animals = BuildDictionary()
# New code to print the average length of the animal names for each animal type
# (dogs, cats, and birds) should be inserted after this line.

 

 

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're not sure how to get started, here are some tips:

    • You can retrieve values by calling MyDictionary[key] where MyDictionary is a dictionary and key is a valid key.
    • You can retrieve the set of keys by calling MyDictionary.keys() and retrieve the associated values by calling MyDictionary.values().
    • You can find the length of a string by using the function len, for example, len(MyString)
    • There are many ways to solve this problem; the solution shows two.
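The calls mentioned in the tips can be tried on a small dictionary first. This sketch uses a hypothetical dictionary named rivers, unrelated to the exercise data, whose values are lists:

```python
# Hypothetical dictionary; each value is a list of strings
rivers = {"Europe": ["Rhine", "Danube"], "Africa": ["Nile"]}

# Retrieve one value (a list) by its key
europeanRivers = rivers["Europe"]

# Retrieve all keys (sorted here so the order is predictable)
allKeys = sorted(rivers.keys())

# Count the values, and measure the length of one string
numberOfLists = len(list(rivers.values()))
firstNameLength = len(europeanRivers[0])

print(allKeys)
print(numberOfLists)
print(firstNameLength)
```

Because each value is itself a list, you can loop over it or pass it to len() just like any other list.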

    Lesson 4 Practice Exercise C Solution

    The point of this practice exercise was to help you understand how to handle nested data structures, such as lists used as values in dictionaries. This may seem confusing if you are not used to nested data structures, but they are often the most straightforward way of handling complex data. You may need to use a similar technique to solve the Lesson 4 assignment.

    Two ways of solving the problem are shown below. Both accomplish the same thing.

     

    # Name: dictionaries.py
    # Description: Solves sample problem using dictionaries.
    # Author: Frank Hardisty
    
    # function to load dictionary
    def BuildDictionary():
    
        #create lists
        dogList = ["Dalmatian", "German Shepherd"]
        catList = ["American Shorthair"]
        birdList = ["Robin", "Canary","Bluebird" ]
    
        #use dict() constructor to create dictionary and add keys and values
        return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])
    
    # call the function and assign the result to the variable 'animals'
    animals = BuildDictionary()
    
    #find average length of names for different animal types two different ways
    
    #define a floating point variable to hold totals
    total = 0.0
    
    
    # first approach: using the known keys
    dList = animals['dogs']
    
    for item in dList:
        total = total + len(item)
    
    total = total / len(dList)
    
    print 'dogs: ' + str(total)
    
    
    total = 0.0
    cList = animals['cats']
    
    for item in cList:
        total = total + len(item)
    
    total = total / len(cList)
    
    print 'cats: ' + str(total)
    
    
    total = 0.0
    bList = animals['birds']
    
    for item in bList:
        total = total + len(item)
    
    total = total / len(bList)
    
    print 'birds: ' + str(total)
    
    # second approach: looping over the dictionary's keys
    
    for key in animals.keys():
        total = 0.0
        animalList = animals[key]
        for item in animalList:
            total = total + len(item)
        total = total / len(animalList)
        print key + ": " + str(total)
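Since Lesson 4 began with functions, it is worth noting that the repeated averaging logic above is a natural candidate for one. The following is a sketch of a possible third variant, not part of the original solution; the function name averageNameLength and the averages dictionary are hypothetical. It is written so that it runs the same way in Python 2.7 and Python 3.

```python
def averageNameLength(names):
    """Return the mean number of characters in a list of strings."""
    # float() forces decimal division even in Python 2
    return sum(len(name) for name in names) / float(len(names))

# Same data as the exercise dictionary
animals = {
    "dogs": ["Dalmatian", "German Shepherd"],
    "cats": ["American Shorthair"],
    "birds": ["Robin", "Canary", "Bluebird"],
}

# Store each average under its animal type, then print in sorted order
averages = {}
for animalType in animals:
    averages[animalType] = averageNameLength(animals[animalType])

for animalType in sorted(averages):
    print(animalType + ": " + str(averages[animalType]))
```

The averaging code now lives in one place, so a change to the calculation only has to be made once.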
    
    

    Project 4: Parsing rhinoceros sightings

    In this project, you're working for a wildlife conservation group that is tracking rhinos in the African savannah. Your field workers' software resources and GIS expertise are limited, but you have managed to obtain an Excel spreadsheet showing the positions of several rhinos over time. Each record in the spreadsheet shows the latitude/longitude coordinate of a rhino along with the rhino's name (these rhinos are well known to your field workers).

    You want to write a script that will turn the readings in the spreadsheet into a vector dataset that you can place on a map. This will be a polyline dataset showing the tracks the rhinos followed over the time the data was collected.

    Please carefully read all the following instructions before beginning the project.

    Deliverables

    This project has the following deliverables:

    1. Your plan of attack for this programming problem, written in pseudocode in any text editor. This should consist only of short, focused steps describing what you are going to do to solve the problem. This is a separate deliverable from your customary project writeup.
    2. A Python script that reads the data from the spreadsheet and creates, from scratch, a polyline shapefile with n polylines, n being the number of rhinos in the spreadsheet. Each polyline should represent a rhino's track chronologically from the beginning of the spreadsheet to the end of the spreadsheet. Each polyline should also have a text attribute containing the rhino's name. The shapefile should use the WGS 1984 geographic coordinate system.
    3. A short writeup (~300 words) explaining what you learned during this project and which requirements you met, or failed to meet. Also describe any "over and above" efforts here so that the graders can look for them.

    Successful delivery of the above requirements is sufficient to earn 90% on the project. The remaining 10% is reserved for efforts that go "over and above" the minimum requirements. This could include (but is not limited to) useful code comments, an insightful writeup explaining some lesson learned during the coding, a batch file that could be used to automate the script, creation of the feature class in a file geodatabase instead of a shapefile, or the breaking out of repetitive code into functions and/or modules.

    Challenges

    You may already see several immediate challenges in this task:

    • The data is in a format (XLSX) that you cannot easily parse. Your first step is to open the file manually in Excel and save it in a comma-delimited format that you can easily read with a script. Choose the option CSV (comma-delimited) (*.csv).

      If you are so inclined, you can attempt to download and use a Python library that works directly with XLSX files. Be aware that you will have less comprehensive "technical support" from your fellow students if you use this route.

    • The rhinos in the spreadsheet appear in no guaranteed order, and not all the rhinos appear at the beginning of the spreadsheet. As you parse each line, you must determine which rhino the reading belongs to and update that rhino's polyline track accordingly. You are not allowed to sort the Rhino column in Excel before you export to the CSV file. Your script must be "smart" enough to work with an unsorted spreadsheet in the order that the records appear.
    • You do not immediately know how many rhinos are in the file or even what their names are. Although you could visually comb the spreadsheet for this information and hard-code each rhino's name, your script is required to handle all the rhino names programmatically. The idea is that you should be able to run this script on a different file, possibly containing more rhinos, without having to make many manual adjustments.
    • You have not previously created a feature class programmatically. You must find and run ArcGIS geoprocessing tools that will create an empty polyline shapefile with a text field for storing the rhino's name. You must also assign the WGS 1984 geographic coordinate system as the spatial reference for this shapefile.
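As a warm-up for the first challenge, here is a minimal sketch of reading comma-delimited text with Python's csv module and finding columns by their header names. The column names and values below are invented for illustration; check the header of your own CSV file, since the real names and order may differ. A list of strings stands in for an open file so the sketch can run on its own.

```python
import csv

# Made-up CSV content; a list of lines stands in for an open file
sampleLines = [
    "Observer,X,Y,Animal\n",
    "Ann,26.97,-19.11,Zebra1\n",
    "Bob,26.98,-19.12,Zebra2\n",
]

reader = csv.reader(sampleLines)

# The first row is the header; look up each column position by name
header = next(reader)
xIndex = header.index("X")
yIndex = header.index("Y")
nameIndex = header.index("Animal")

# Collect (name, x, y) for each remaining row, converting text to numbers
readings = []
for row in reader:
    readings.append((row[nameIndex], float(row[xIndex]), float(row[yIndex])))

print(readings)
```

Looking up positions from the header means the script keeps working if the columns are later rearranged.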

    Hints

    • Before you start writing code, write a plan of attack describing the logic your script will use to accomplish this task. Break up the original task into small, focused chunks. You can write this in Word or even Notepad. Your objective is not to write fancy prose, but rather short, terse statements of what your code will do: in other words, pseudocode. Here's an example of some pseudocode that might appear in your file:

      . . .

      Read the next line.

      Split the line.

      Determine the rhino referenced in this line.

      Determine if the dictionary has a key for the rhino.

      If no key exists, create a new array object.

      Create a new point object.

      Assign the X reading to the X coordinate of the point.

      Assign the Y reading to the Y coordinate of the point.

      Add the point to the array.

      Add the array to the dictionary using the rhino name as the key.

      . . .

      If you do a good job writing your pseudocode, you'll find that each line translates into about one line of code. Writing your script then becomes a matter of translating from English to code. You may also find it helpful to sketch out a diagram of the workflow and logical branches in your script.

    • You will have a much easier time with this assignment if you first create the array objects representing each rhino track, then use insert cursors to add the arrays once they are completed. Not only is this easier to code, it's better for performance to open the insert cursor only once near the end of the script.

    • A Python dictionary is an excellent structure for storing a rhino name coupled with the rhino's array of observed locations. A dictionary is similar to a list, but it stores items in key-value pairs. For example, a key could be a string representing the rhino name, and that key's corresponding value could be an array object containing all the points where the rhino was observed. You can retrieve any value based on its key, and you can also check whether a key exists using a simple if key in dictionary: check.

      We have not worked with dictionaries much in this course, but your Zandbergen text has an excellent section about them and there are abundant Python dictionary examples on the Internet.

      You can alternatively use lists to keep track of the information, but this will probably take more code. Using dictionaries I was able to write this script in under 60 lines (including comments and whitespace). If you find yourself getting confused or writing a lot of code with lists, you may try to switch to dictionaries.

    • To create your shapefile programmatically, use the Create Feature Class tool (arcpy.CreateFeatureclass_management). The ArcGIS Desktop Help has several examples of how to use this tool. If you can't figure this part out, I suggest you create the feature class manually and work on writing the rest of the script. You can then return to this part at the end if you have time.

    • In order to get the shapefile in WGS 1984, you'll need to create a spatial reference object that you can assign to the shapefile at the time you create it. I recommend using the SpatialReference.createFromFile() method (note the lowercase c; Python is case sensitive) and pointing at the appropriate .prj file in C:\Program Files\ArcGIS\Coordinate Systems\. Be warned that if you do not correctly apply the spatial reference, your polyline precision could be diluted.
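The dictionary pattern described in the hints can be practiced without ArcGIS. In this sketch, plain Python lists and tuples stand in for the array and point objects, and the readings are made up. The names tracks and readings are hypothetical. Treat it as an illustration of the "check for the key, then add" logic, not as a finished project script.

```python
# Made-up readings in file order: (name, x, y). The names are not sorted.
readings = [
    ("A", 1.0, 1.0),
    ("B", 5.0, 5.0),
    ("A", 2.0, 1.5),
    ("B", 6.0, 5.5),
    ("A", 3.0, 2.0),
]

# Each key is a name; each value is that name's list of points so far
tracks = {}

for name, x, y in readings:
    # If no key exists yet, start a new empty track
    #  (a stand-in for creating a new array object)
    if name not in tracks:
        tracks[name] = []
    # Add this reading to the track
    #  (a stand-in for adding a point object to the array)
    tracks[name].append((x, y))

for name in sorted(tracks):
    print(name + ": " + str(len(tracks[name])) + " points")
```

Each track keeps its points in the order they appeared in the input, even though the names were interleaved.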

    If you do things right, your polylines should look like this (points are included only for reference):

    Final rhino tracks

    Note: Although I have placed the data in an African context (who heard of rhinos wandering New York City?) it is completely fabricated and does not resemble the path of any actual rhino, living or dead. If you exhibit a stellar performance on this project, you may choose the option of having a rhino named after you in a future offering of this course!