Lesson 4: Practical Python for the GIS analyst

Lesson 4 Overview

Lesson 4 contains a variety of subjects to help you use Python more effectively as a GIS analyst. The sections of this lesson will reinforce what you've learned already, while introducing some new concepts that will help take your automation to the next level.

You'll learn how to modularize a section of code to make it usable in multiple places. You'll learn how to use new Python modules, such as os, to open and read files; then you'll transfer the information in those files into geographic datasets that can be read by ArcGIS. Finally, you'll learn how to use your operating system to automatically run Python scripts at any time of day.

Lesson 4 checklist

Lesson 4 explores some more advanced Python concepts, including reading and parsing text. To complete Lesson 4, do the following:

  1. One week into the lesson, submit your Final Project proposal to the instructors using the Final Project Proposal Drop Box in the Final Project lesson under the Modules tab in Canvas. For the exact due date, see the Calendar tab in Canvas.
  2. Work through the course lesson materials.
  3. Read Zandbergen chapters 7.6, 8.1 - 8.6, 10, and 12.1 - 12.5. In the online lesson pages I have inserted instructions about when it is most appropriate to read each of these chapters.
  4. Complete Project 4 and submit your zipped deliverables to the Project 4 drop box.
  5. Complete the Lesson 4 Quiz.

Do items 1 - 3 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need time during the second week of the lesson to concentrate on the project and the quiz.

Lesson objectives

By the end of this lesson you should:

  • Understand how to create and use functions and modules
  • Be able to read and parse text in Python (e.g. using the csv module)
  • Be able to create new geometries and insert them into feature classes using insert cursors
  • Understand how to automate tasks with scheduling and batch files
  • Know the basics of dealing with map documents in arcpy
  • Understand Python dictionaries
  • Be able to write ArcGIS scripts that create new features and feature classes, e.g. from information in text files

4.1 Functions and modules

One of the fundamentals of programming that we did not previously cover is functions. To start this lesson, we'll talk about functions and how you can use them to your benefit as you begin writing longer scripts.

A function contains one focused piece of functionality in a reusable section of code. The idea is that you write the function once, then use, or call, it throughout your code whenever you need to. You can put a group of related functions in a module so you can use them in many different scripts. When used appropriately, functions eliminate code repetition and make the main body of your script shorter and more readable.

Functions exist in many programming languages, and each has its way of defining a function. In Python, you define a function using the def statement. Each line in the function that follows the def is indented. Here's a simple function that reads the radius of a circle and reports the circle's approximate area. (Remember that the area is equal to pi [3.14159...] multiplied by the square [** 2] of the radius.)

>>> def findArea(radius):
... 	area = 3.14159 * radius ** 2
... 	return area
>>> findArea(3)

Notice from the above example that functions can take parameters, or arguments. When you call the above function, you supply the radius of the circle in parentheses. The function returns the area (notice the return statement, which is new to you).

Thus, to find the area of a circle with a radius of 3 inches, you could make the function call findArea(3) and get the return value 28.27431 (square inches).

It's common to assign the returned value to a variable and use it later in your code. For example, you could add these lines in the Interactive Window:

>>> aLargerCircle = findArea(4)
>>> print aLargerCircle

Please click this link to take a close look at what happens when the findArea(...) function is called and executed in this example using the code execution visualization feature of pythontutor.com. In the browser window that opens, you will see the code in the top left. Clicking the "Forward" and "Back" buttons allows you to step through the code, while seeing what Python stores in memory at any given moment in the window in the top right.

  • After clicking the "Forward" button the first time, the definition of the function is read in and a function object is created that is globally accessible under the name findArea.
  • In the next step, the call of findArea(4) in line 5 is executed. This results in a new variable with the name of the function's only parameter, so radius, being created and assigned the value 4. This variable is a local variable that is only accessible from the code inside the body of the function though.
  • Next, the program execution jumps to the code in the function definition starting in line 1.
  • In step 5, line 2 is executed and another local variable with the name area is created and assigned the result of the computation in line 2 using the current value of the local variable radius, which is 4.
  • In the next step, the return statement in line 3 sets the return value of the function to the current value of local variable area (50.2654).
  • When pressing "Forward" again, the program execution jumps back to line 5 from where the function was called. Since we are now leaving the execution of the function body, all local variables of the function (radius and area) are discarded. The return value is used in place of the function call in line 5, in this case meaning that it is assigned to a new global variable called aLargerCircle now appearing in memory.
  • In the final step, the value assigned to this variable (50.2654) is printed out.

It is important to understand the mechanisms of (a) jumping from the call of the function (line 5) to the code of the function definition and back, and of (b) creating local variables for the parameter(s) and all new variables defined in the function body and how they are discarded again when the end of the function body has been reached. The return value is the only piece of information that remains and is given back from the execution of the function.
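Point (b) can be checked with a small sketch that reuses the findArea function from above. The try/except block and the variable areaStillExists are only there for illustration: they show that the local variable area is gone once the function has returned. The print calls use parentheses so that the sketch runs under both Python 2 and Python 3.

```python
def findArea(radius):
    # radius and area are local variables: they exist only while
    # the function body is executing
    area = 3.14159 * radius ** 2
    return area

aLargerCircle = findArea(4)   # the return value survives the call

try:
    # area was discarded when findArea finished, so this line fails
    print(area)
    areaStillExists = True
except NameError:
    areaStillExists = False

print(areaStillExists)
```

Only aLargerCircle, which received the return value, is still available after the call.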

A function is not required to return any value. For example, you may have a function that takes the path of a text file as a parameter, reads the first line of the file, and prints that line to the Interactive Window. Since all the printing logic is performed inside the function, there is really no return value.
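Here is a minimal sketch of such a function. The function name printFirstLine and the throwaway file sample.txt are made up for this illustration, and the temporary file is created only so that the example can run on any computer.

```python
import os
import tempfile

def printFirstLine(filePath):
    # Open the file, read only its first line, and print it.
    # There is no return statement, so the function returns None.
    with open(filePath, "r") as textFile:
        firstLine = textFile.readline()
    print(firstLine.rstrip("\n"))

# Create a small throwaway file so the example can run anywhere
tempFolder = tempfile.mkdtemp()
samplePath = os.path.join(tempFolder, "sample.txt")
with open(samplePath, "w") as sampleFile:
    sampleFile.write("first line\nsecond line\n")

result = printFirstLine(samplePath)   # prints: first line
```

If you assign the result of such a call to a variable, as done here with result, the variable simply holds the special value None.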

Neither is a function required to take a parameter. For example, you might write a function that retrieves or calculates some static value. Try this in the Interactive Window:

>>> def getCurrentPresident():
... 	return "Barack Obama"
>>> president = getCurrentPresident()
>>> print president
Barack Obama

The function getCurrentPresident() doesn't take any user-supplied parameters. Its only "purpose in life" is to return the name of the current president. It cannot be asked to do anything else.


You may be wondering what advantage you gain by putting the above getCurrentPresident() logic in a function. Why couldn't you just define a string currentPresident and set it equal to "Barack Obama"? The big reason is reusability.

Suppose you maintain 20 different scripts, each of which works with the name of the current President in some way. You know that the name of the current President will eventually change. Therefore, you could put this function in what's known as a module file and reference that file inside your 20 different scripts. When the name of the President changes, you don't have to open 20 scripts and change them. Instead, you just open the module file and make the change once.

You may remember that you've already worked with some of Python's built-in modules. The Hi Ho! Cherry O example in Lesson 2 imported the random module so that the script could generate a random number for the spinner result. This spared you the effort of writing or pasting any random number generating code into your script.

You've also probably gotten used to the pattern of importing the arcpy site package at the beginning of your scripts. A site package can contain numerous modules. In the case of arcpy, these modules include Esri functions for geoprocessing.

As you use Python in your GIS work, you'll probably write functions that are useful in many types of scripts. These functions might convert a coordinate from one projection to another, or create a polygon from a list of coordinates. These functions are perfect candidates for modules. If you ever want to improve on your code, you can make the change once in your module instead of finding each script where you duplicated the code.

Creating a module

To create a module, create a new script in PythonWin and save it with the standard .py extension; but instead of writing start-to-finish scripting logic, just write some functions. Here's what a simple module file might look like. This module only contains one function, which adds a set of points to a feature class given a Python list of coordinates.

# This module is saved as practiceModule1.py

# The function below creates points from a list of coordinates
#  Example list: [[-113,23],[-120,36],[-116,-2]]

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor  
    import arcpy

    with arcpy.da.InsertCursor(featureClass, ("SHAPE@",)) as rowInserter:

        # Loop through each coordinate in the list and make a point
        for coordinate in coordinateList:
            point = arcpy.Point(coordinate[0],coordinate[1])
            # Insert the point as a new row (feature) in the feature class
            rowInserter.insertRow([point])

The above function createPoints could be useful in various scripts, so it's very appropriate for putting in a module. Notice that this script has to work with insert cursors and point objects, so it requires arcpy. It's legal to import a site package or module within a module.

Also notice that arcpy is imported within the function, not at the very top of the module like you are accustomed to seeing. This is done for performance reasons. You may add more functions to this module later that do not require arcpy. You should only do the work of importing arcpy when necessary, that is, if a function is called that requires it.

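The arcpy site package is only available inside the scope of this function. If other functions in your practice module were called, the arcpy module would not be available to those functions. Scope applies also to variables that you create in this function, such as rowInserter and point. These are local variables: they exist only while createPoints is running, and if you tried to use them elsewhere, such as in another function or in the script that imports the module, they would be out of scope and unavailable. Note that, unlike in some other programming languages, a loop in Python does not create a narrower scope of its own. The variable point is first assigned inside the for loop, but it remains usable anywhere in the function after the loop has finished (provided the loop ran at least once).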

Using a module

So how could you use the above module in a script? Imagine that the module above is saved on its own as practiceModule1.py. Below is an example of a separate script that imports practiceModule1.

# This script is saved as add_my_points.py

# Import the module containing a function we want to call
import practiceModule1

# Define point list and shapefile to edit
myWorldLocations = [[-123.9,47.0],[-118.2,34.1],[-112.7,40.2],[-63.2,-38.7]]
myWorldFeatureClass = "c:\\Data\\WorldPoints.shp"

# Call the createPoints function from practiceModule1
practiceModule1.createPoints(myWorldLocations, myWorldFeatureClass)

The above script is simple and easy to read because you didn't have to include all the logic for creating the points. That is taken care of by the createPoints function in the module you imported, practiceModule1. Notice that to call a function from a module, you need to use the syntax module.function().
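Because practiceModule1 needs arcpy and an existing shapefile, you may want to try the module.function() syntax first with modules that ship with Python. The short sketch below uses the standard math and random modules; the variable names are only illustrative.

```python
import math
import random

# module.function() syntax: the module name, a dot, then the function
sideLength = math.sqrt(16)     # calls sqrt from the math module
spin = random.randint(1, 6)    # calls randint from the random module

print(sideLength)
print(spin)
```

The pattern is the same one used to call practiceModule1.createPoints(...): the module name tells Python where to find the function.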


To reinforce the material in this section, read Zandbergen 12.1 - 12.5, which talks about creating Python functions and modules.


Before moving ahead, get some practice in PythonWin by trying to write the following functions. These functions are not graded, but the experience of writing them will help you in Project 4. Use the course forums to help each other.

  • A function that returns the perimeter of a square given the length of one side.
  • A function that takes a path to a feature class as a parameter and returns a Python list of the fields in that feature class. Practice calling the function and printing the list. However, do not print the list within the function.
  • A function that returns the Euclidean distance between any two coordinates. The coordinates can be supplied as parameters in the form (x1, y1, x2, y2). For example, if your coordinates were (312088, 60271) and (312606, 59468), your function call might look like this: findDistance(312088, 60271, 312606, 59468). Use the Pythagorean formula A ** 2 + B ** 2 = C ** 2. For an extra challenge, see if you can handle negative coordinates.

The best practice is to put your functions inside a module and see if you can successfully call them from a separate script. If you try to step through your code using the debugger, you'll notice that the debugger helpfully moves back and forth between the script and the module whenever you call a function in the module.

4.2 Python Dictionaries

In programming, we often want to store larger amounts of data that somehow belongs together inside a single variable. In lesson 2, you already learned about lists, which provide one option to do so. As long as available memory permits, you can store as many elements in a list as you wish and the append(...) method allows you to add more elements to an existing list.

Dictionaries are another data structure that allows for storing complex information in a single variable. While lists store elements in a simple sequence and the elements are then accessed based on their index in the sequence, the elements stored in a dictionary consist of key-value pairs and one always uses the key to retrieve the corresponding values from the dictionary. It works like in a real dictionary where you look up information (the stored value) under a particular keyword (the key).

Dictionaries can be useful to realize a mapping, for instance from English words to the corresponding words in Spanish. Here is how you can create such a dictionary for just the numbers from one to four:

>>> englishToSpanishDic = { "one": "uno", "two": "dos", "three": "tres", "four": "cuatro"  }

The curly brackets { } delimit the dictionary similarly to how square brackets [ ] do for lists. Inside the dictionary, we have four key-value pairs separated by commas. The key and value for each pair are separated by a colon. The key appears on the left of the colon, while the value stored under the key appears on the right side of the colon.

We can now use the dictionary stored in variable englishToSpanishDic to look up the Spanish word for an English number, e.g.

>>> print englishToSpanishDic["two"]
dos

To retrieve a value stored in the dictionary, we use the name of the variable followed by square brackets containing the key under which the value is stored in the dictionary. If we use the same notation but on the left side of an assignment operator (=), we can add a new key-value pair to an existing dictionary:

>>> englishToSpanishDic["five"] = "cinco"    
>>> print englishToSpanishDic
{'four': 'cuatro', 'three': 'tres', 'five': 'cinco', 'two': 'dos', 'one': 'uno'}

We here added the value "cinco" appearing on the right side of the equal sign under the key "five" to the dictionary. If something had already been stored under the key "five" in the dictionary, the stored value would have been overwritten. You may have noticed that the order of the elements of the dictionary in the output has changed, but that doesn’t matter since we always access the elements in a dictionary via their key. If our dictionary contained many more word pairs, we could use it to realize a very primitive translator that would go through an English text word-by-word and replace each word with the corresponding Spanish word retrieved from the dictionary. Admittedly, using this simple approach would probably result in pretty hilarious translations.
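The primitive translator idea could be sketched roughly as follows. The function name translate and the sample sentence are made up for this illustration, and the sketch simply leaves any word unchanged that it cannot find in the dictionary (it uses the in operator, which tests whether a key exists in a dictionary).

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres",
                       "four": "cuatro", "five": "cinco"}

def translate(text, dictionary):
    # Split the text into words and look up each one; words that are
    # not in the dictionary are left unchanged
    translatedWords = []
    for word in text.split():
        if word in dictionary:
            translatedWords.append(dictionary[word])
        else:
            translatedWords.append(word)
    return " ".join(translatedWords)

print(translate("one plus two is three", englishToSpanishDic))
# prints: uno plus dos is tres
```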

Now let’s use Python dictionaries to do something a bit more complex. Let’s simulate the process of creating a book index that lists the page numbers on which certain keywords occur. We want to start with an empty dictionary and then go through the book page-by-page. Whenever we encounter a word that we think is important enough to be listed in the index, we add it and the page number to the dictionary.

To create an empty dictionary in a variable called bookIndex, we use the notation with the curly brackets but nothing in between:

>>> bookIndex = {}
>>> print bookIndex
{}

Now let’s say the first keyword we encounter in the imaginary programming book we are going through is the word "function" on page 2. We now want to store the page number 2 (value) under the keyword "function" (key) in the dictionary. But since keywords can appear on many pages, what we want to store as values in the dictionary are not individual numbers but lists of page numbers. Therefore, what we put into our dictionary is a list with the number 2 as its only element:

>>> bookIndex["function"] =  [2]
>>> print bookIndex
{'function': [2]} 

Next, we encounter the keyword "module" on page 3. So we add it to the dictionary in the same way:

>>> bookIndex["module"] =  [3]
>>> print bookIndex
{'function': [2], 'module': [3]}

So now our dictionary contains two key-value pairs and for each key it stores a list with just a single page number. Let’s say we next encounter the keyword “function” a second time, this time on page 5. Our code to add the additional page number to the list stored under the key “function” now needs to look a bit different because we already have something stored for it in the dictionary and we do not want to overwrite that information. Instead, we retrieve the currently stored list of page numbers and add the new number to it with append(…):

>>> pages = bookIndex["function"] 
>>> pages.append(5)
>>> print bookIndex
{'function': [2, 5], 'module': [3]}
>>> print bookIndex["function"]
[2, 5]

Please note that we didn’t have to put the list of page numbers stored in variable pages back into the dictionary after adding the new page number. Both the variable pages and the dictionary refer to the same list, so appending the number changes both. Our dictionary now contains a list of two page numbers for the key “function” and still a list with just one page number for the key “module”. Surely you can imagine how we would build up a large dictionary for the entire book by continuing this process.

Dictionaries can be used in concert with a for loop to go through the keys of the elements in the dictionary. This can be used to print out the content of an entire dictionary:

>>> for k in bookIndex:  # loop through keys of the dictionary
...    print "keyword: " + k                # print the key
...    print "pages: " + str(bookIndex[k])  # print the value
keyword: function
pages: [2, 5]
keyword: module
pages: [3]

When adding the second page number for “function”, we ourselves decided that this needs to be handled differently than when adding the first page number. But how could this be realized in code? We can check whether something is already stored under a key in a dictionary using an if-statement together with the “in” operator:

>>> keyword = "function"
>>> if keyword in bookIndex: 
...    print "entry exists"
... else:
...    print "entry does not exist" 
entry exists

So assuming we have the current keyword stored in variable word and the corresponding page number stored in variable pageNo, the following piece of code would decide by itself how to add the new page number to the dictionary:

word = "module"
pageNo = 7

if word in bookIndex:
	# entry for word already exists, so we just add page
	pages = bookIndex[word]
	pages.append(pageNo)
else:
	# no entry for word exists, so we add new entry
	bookIndex[word] = [pageNo]

A more sophisticated version of this code would also check whether the list of page numbers retrieved in the if-block already contains the new page number to deal with the case that a keyword occurs more than once on the same page. Feel free to think about how this could be included.
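If you want to compare your own idea with one possible approach, here is a sketch. It wraps the logic in a function with the hypothetical name addToIndex and uses the not in operator to skip page numbers that are already listed. This is only one way to do it.

```python
def addToIndex(bookIndex, word, pageNo):
    if word in bookIndex:
        # entry exists: add the page only if it is not listed yet
        pages = bookIndex[word]
        if pageNo not in pages:
            pages.append(pageNo)
    else:
        # no entry for word exists, so we add a new entry
        bookIndex[word] = [pageNo]

bookIndex = {}
addToIndex(bookIndex, "function", 2)
addToIndex(bookIndex, "module", 3)
addToIndex(bookIndex, "function", 5)
addToIndex(bookIndex, "function", 5)   # same page again: no duplicate

print(bookIndex["function"])   # prints: [2, 5]
print(bookIndex["module"])     # prints: [3]
```

Putting the logic in a function means that the script building the index only needs one short call per keyword occurrence.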


Read Zandbergen section 6.8 for more information and examples using Python dictionaries.

4.3 Reading and parsing text using the Python csv module

One of the best ways to increase your effectiveness as a GIS programmer is to learn how to manipulate text-based information. In Lesson 3, we talked about how to read data in ArcGIS's native formats, such as feature classes. But often GIS data is collected and shared in more "raw" formats such as a spreadsheet in CSV (comma-separated value) format, a list of coordinates in a text file, or an XML response received through a Web service.

When faced with these files, you should first understand if your GIS software already comes with a tool or script that can read or convert the data to a format it can use. If no tool or script exists, you'll need to do some programmatic work to read the file and separate out the pieces of text that you really need. This is called parsing the text.

For example, a Web service may return you many lines of XML describing all the readings at a weather station, when all you're really interested in are the coordinates of the weather station and the annual average temperature. Parsing the response involves writing some code to read through the lines and tags in the XML and isolating only those three values.

There are several different approaches to parsing. Usually the wisest is to see if some Python module exists that will examine the text for you and turn it into an object that you can then work with. In this lesson, you will work with the Python "csv" module that can read comma-delimited values and turn them into a Python list. Other helpful libraries of this kind include lxml and xml.dom for parsing XML, and BeautifulSoup for parsing HTML.

If a module or library doesn't exist that fits your parsing needs, then you'll have to extract the information from the text yourself using Python's string manipulation methods. One of the most helpful ones is string.split(), which turns a big string into a list of smaller strings based on some delimiting character, such as a space or comma. When you write your own parser, however, it's hard to anticipate all the exceptional cases you might run across. For example, sometimes a comma-separated value file might have substrings that naturally contain commas, such as dates or addresses. In these cases, splitting the string using a simple comma as the delimiter is not sufficient and you need to add extra logic.
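The following sketch illustrates the pitfall with an invented line of text whose address field contains a comma and is therefore wrapped in quotes. It compares a plain split(",") with the csv module introduced later in this lesson. The sample line and variable names are assumptions made for the illustration.

```python
import csv

line = 'Jane Doe,"123 Main St, Springfield",2008/06/11'

# Naive approach: the comma inside the quoted address splits it in two
naiveParts = line.split(",")

# The csv module understands the quotes and keeps the address together
csvParts = next(csv.reader([line]))

print(len(naiveParts))   # prints: 4
print(len(csvParts))     # prints: 3
print(csvParts[1])       # prints: 123 Main St, Springfield
```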

Another pitfall when parsing is the use of "magic numbers" to slice off a particular number of characters in a string, to refer to a specific column number in a spreadsheet, and so on. If the structure of the data changes, or if the script is applied to data with a slightly different structure, the code could be rendered inoperable and would require some precision surgery to fix. People who read your code and see a number other than 0 (to begin a series) or 1 (to increment a counter) will often be left wondering how the number was derived and what it refers to. In programming, numbers other than 0 or 1 are magic numbers that should typically be avoided, or at least accompanied by a comment explaining what the number refers to.
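As a small illustration, suppose you want only the date from a timestamp in the style used by the GPS file later in this lesson. The sketch below contrasts a magic number with a more self-explanatory alternative. The variable and constant names are made up for the example.

```python
timestamp = "2008/06/11-14:08:30"

# Magic number: why 10? The reader has to count characters to find out
datePart = timestamp[:10]

# Clearer: split on the character that separates the date from the time
DATE_TIME_SEPARATOR = "-"
dateAndTime = timestamp.split(DATE_TIME_SEPARATOR)
clearerDatePart = dateAndTime[0]

print(datePart)          # prints: 2008/06/11
print(clearerDatePart)   # prints: 2008/06/11
```

Both versions give the same result here, but the second one keeps working if the date is ever written with a different number of characters, and it tells the reader what the code is looking for.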

There are an infinite number of parsing scenarios that you can encounter. This lesson will attempt to teach you the general approach by walking through just one module and example. In your final project for this course, you may choose to explore parsing other types of files.

The Python csv module

A common text-based data interchange format is the comma-separated value (CSV) file. This is often used when transferring spreadsheets or other tabular data. Each line in the file represents a row of the dataset, and the columns in the data are separated by commas. The file often begins with a header line containing all the field names.

Spreadsheet programs like Microsoft Excel can understand the CSV structure and display all the values in a row-column grid. A CSV file may look a little messier when you open it in a text editor, but it can be helpful to always continue thinking of it as a grid structure. If you had a Python list of rows and a Python list of column values for each row, you could use looping logic to pull out any value you needed. This is exactly what the Python csv module gives you. 

It's easiest to learn about the csv module by looking at a real example. The scenario below shows how the csv module can be used to parse information out of a GPS track file.

Introducing the GPS track parsing example

This example reads a text file collected from a GPS unit. The lines in the file represent readings taken from the GPS unit as the user traveled along a path. In this section of the lesson, you'll learn one way to parse out the coordinates from each reading. The next section of the lesson uses a variation of this example to show how you could write the user's track to a polyline feature class.

The file for this example is called gps_track.txt and it looks something like the text string shown below. (Please note, line breaks have been added to the file shown below to ensure that the text fits within the page margins. Click on this link to the gps_track.txt file to see what the text file actually looks like.)

TRACK,ACTIVE LOG,40.78966141,-77.85948515,4627251.76270444,1779451.21349775,True,False,
    255,358.228393554688,0,0,2008/06/11-14:08:30,eTrex Venture, ,2008/06/11 09:08:30
TRACK,ACTIVE LOG,40.78963995,-77.85954952,4627248.40489401,1779446.18060893,False,False,
    255,358.228393554688,0,0,2008/06/11-14:09:43,eTrex Venture, ,2008/06/11 09:09:43
TRACK,ACTIVE LOG,40.78961849,-77.85957098,4627245.69008772,1779444.78476531,False,False,
    255,357.747802734375,0,0,2008/06/11-14:09:44,eTrex Venture, ,2008/06/11 09:09:44
TRACK,ACTIVE LOG,40.78953266,-77.85965681,4627234.83213242,1779439.20202706,False,False,
    255,353.421875,0,0,2008/06/11-14:10:18,eTrex Venture, ,2008/06/11 09:10:18
TRACK,ACTIVE LOG,40.78957558,-77.85972118,4627238.65402635,1779432.89982442,False,False,
    255,356.786376953125,0,0,2008/06/11-14:11:57,eTrex Venture, ,2008/06/11 09:11:57
TRACK,ACTIVE LOG,40.78968287,-77.85976410,4627249.97592111,1779427.14663093,False,False,
    255,354.383178710938,0,0,2008/06/11-14:12:18,eTrex Venture, ,2008/06/11 09:12:18
TRACK,ACTIVE LOG,40.78979015,-77.85961390,4627264.19055204,1779437.76243578,False,False,
    255,351.499145507813,0,0,2008/06/11-14:12:50,eTrex Venture, ,2008/06/11 09:12:50
etc. ...

Notice that the file starts with a header line, explaining the meaning of the values contained in the readings from the GPS unit. (The header, which begins with type,ident,lat,long, is not included in the excerpt above.) Each subsequent line contains one reading. The goal for this example is to create a Python list containing the X,Y coordinates from each reading. Specifically, the script should be able to read the above file and print a text string like the one shown below.

[['-77.85948515', '40.78966141'], ['-77.85954952', '40.78963995'], ['-77.85957098', '40.78961849'], etc.]

Approach for parsing the GPS track

Before you start parsing a file, it's helpful to outline what you're going to do and break up the task into manageable chunks. Here's some pseudocode for the approach we'll take in this example:

  1. Open the file.
  2. Read the header line.
  3. Loop through the header line to find the index positions of the "lat" and "long" values.
  4. Read the rest of the lines one by one.
  5. Find the values in the list that correspond to the lat and long coordinates and write them to a new list.

Importing the module

When you work with the csv module, you need to explicitly import it at the top of your script, just like you do with arcpy.

import csv

You don't have to install anything special to get the csv module; it just comes with the base Python installation.

Opening the file and creating the CSV reader

The first thing the script needs to do is open the file. Python contains a built-in open() function for doing this. The parameters for this function are the path to the file and the mode in which you want to open the file (read, write, etc.). In this example, "r" stands for read-only mode. If you wanted to write items to the file, you would use "w" as the mode.

gpsTrack = open("C:\\data\\Geog485\\gps_track.txt", "r")

Notice that your file does not need to have the extension .csv in order to be read by the csv module. It can be suffixed .txt as long as the text in the file conforms to the CSV pattern where commas separate the columns and carriage returns separate the rows. Once the file is open, you create a CSV reader object, in this manner:

csvReader = csv.reader(gpsTrack)

This object is kind of like a cursor. You can use the next() method to go to the next line, but you can also use it with a for loop to iterate through all the lines of the file.
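The cursor-like behavior can be tried without any file on disk, because csv.reader also accepts a plain list of text lines. In the sketch below, sampleLines is a shortened stand-in for the GPS file, with only the first four columns. It uses the built-in next() function, which works in Python 2.6 and later as well as Python 3, while the lesson's own scripts call the reader's next() method available in Python 2.

```python
import csv

# Stand-in for an open file: csv.reader accepts any sequence of text lines
sampleLines = ["type,ident,lat,long",
               "TRACK,ACTIVE LOG,40.78966141,-77.85948515",
               "TRACK,ACTIVE LOG,40.78963995,-77.85954952"]

csvReader = csv.reader(sampleLines)

# Advance one line, like moving a cursor to the next row
firstRow = next(csvReader)

# A for loop picks up where next() left off
remainingRows = []
for row in csvReader:
    remainingRows.append(row)

print(firstRow)             # the first line, split into a list
print(len(remainingRows))   # prints: 2
```

Each row comes back as a Python list of strings, one item per column.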

Reading the header line

The header line of a CSV file is different from the other lines. It gives you the information about all the field names. Therefore, you will examine this line a little differently than the other lines. First, you advance the CSV reader to the header line by using the next() method, like this:

header = csvReader.next()

This gives you back a Python list of each item in the header. Remember that the header was a pretty long string beginning with: "type,ident,lat,long...". The CSV reader breaks the header up into a list of parts that can be referenced by an index number. The default delimiter, or separating character, for these parts is the comma. Therefore, header[0] would have the value "type", header[1] would have the value "ident", and so on.

We are most interested in pulling latitude and longitude values out of this file, therefore, we're going to have to take note of the position of the "lat" and "long" columns in this file. Using the logic above, you would use header[2] to get "lat" and header[3] to get "long". However, what if you got some other file where these field names were all in a different order? You could not be sure that the column with index 2 represented "lat" and so on.

A safer way to parse is to use the list.index() method and ask the list to give you the index position corresponding to a particular field name, like this:

latIndex = header.index("lat")
lonIndex = header.index("long")

In our case, latIndex would have a value of 2 and lonIndex would have a value of 3, but our code is now flexible enough to handle those columns in other positions.
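To see that flexibility in action, the sketch below applies the same two index() calls to two made-up header lists that contain the same fields in a different order. The function name findCoordinateColumns and both header lists are assumptions for illustration only.

```python
def findCoordinateColumns(header):
    # Ask the header list where the "lat" and "long" columns are.
    # list.index() raises a ValueError if a name is missing.
    latIndex = header.index("lat")
    lonIndex = header.index("long")
    return latIndex, lonIndex

headerA = ["type", "ident", "lat", "long"]
headerB = ["long", "lat", "ident", "type"]   # same fields, different order

print(findCoordinateColumns(headerA))   # prints: (2, 3)
print(findCoordinateColumns(headerB))   # prints: (1, 0)
```

If a file has no column with the expected name, list.index() stops the script with a ValueError, which is usually preferable to silently reading the wrong column.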

Processing the rest of the lines in the file

The rest of the file can be read using a loop. In this case, you treat the csvReader as an iterable list of the remaining lines in the file. Each run of the loop takes a row and breaks it into a Python list of values. If we get the value with index 2 (represented by the variable latIndex), then we have the latitude. If we get the value with index 3 (represented by the variable lonIndex), then we get the longitude. Once we get these values, we can add them to a list we made, called coordList:

# Make an empty list
coordList = []

# Loop through the lines in the file and get each coordinate
for row in csvReader:
    lat = row[latIndex]
    lon = row[lonIndex]
    # Add the coordinate pair to the list as [x, y]
    coordList.append([lon, lat])

# Print the coordinate list
print coordList

Note a few important things about the above code:

  • coordList actually contains a bunch of small lists within a big list. Each small list is a coordinate pair representing the x (longitude) and y (latitude) location of one GPS reading.
  • The list.append() method is used to add items to coordList. Notice again that you can append a list itself (representing the coordinate pair) using this method.

Full code for the example

Here's the full code for the example. Feel free to download the text file and try it out on your computer.

# This script reads a GPS track in CSV format and
#  prints a list of coordinate pairs
import csv

# Set up input and output variables for the script
gpsTrack = open("C:\\data\\Geog485\\gps_track.txt", "r")

# Set up CSV reader and process the header
csvReader = csv.reader(gpsTrack)
header = csvReader.next()
latIndex = header.index("lat")
lonIndex = header.index("long")

# Make an empty list
coordList = []

# Loop through the lines in the file and get each coordinate
for row in csvReader:
    lat = row[latIndex]
    lon = row[lonIndex]
    coordList.append([lat,lon])

# Print the coordinate list
print coordList

Applications of this script

You might be asking at this point, "What good does this list of coordinates do for me?" Admittedly, the data is still very "raw." It cannot be read directly in this state by a GIS. However, having the coordinates in a Python list makes them easy to get into other formats that can be visualized. For example, these coordinates could be written to points in a feature class, or vertices in a polyline or polygon feature class. The list of points could also be sent to a Web service for reverse geocoding, or finding the address associated with each point. The points could also be plotted on top of a Web map using programming tools like the Google Maps API. Or, if you were feeling really ambitious, you might use Python to write a new file in KML format, which could be viewed in 3D in Google Earth.
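To make the KML idea a little more concrete, here is a rough sketch, not a complete KML writer. It assumes coordList holds [lat, lon] pairs, and the coordsToKml helper and the sample values are hypothetical. The one detail worth noticing is that KML lists each vertex as longitude,latitude, so the order is swapped on the way out.

```python
# Hypothetical helper that wraps coordinate pairs in a minimal KML LineString
def coordsToKml(coordList):
    # KML expects each vertex as "longitude,latitude", separated by spaces
    vertices = " ".join("{0},{1}".format(lon, lat) for lat, lon in coordList)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        "    <LineString>\n"
        "      <coordinates>" + vertices + "</coordinates>\n"
        "    </LineString>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )

# Made-up coordinate pairs in [lat, lon] order
sampleCoords = [["40.78966", "-77.85948"], ["40.78963", "-77.85954"]]
kmlText = coordsToKml(sampleCoords)
print(kmlText)
```

The returned text could then be written to a file with a .kml extension using the file methods covered earlier in the lesson.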


Parsing any piece of text requires you to be familiar with file opening and reading methods, the structure of the text you're going to parse, the available parsing modules that fit your text structure, and string manipulation methods. In the preceding example, we parsed a simple text file, extracting coordinates collected by a handheld GPS unit. We used the csv module to break up each GPS reading and find the latitude and longitude values. In the next section of the lesson, you'll learn how you could do more with this information by writing the coordinates to a polyline dataset.

As you use Python in your GIS work, you could encounter a variety of parsing tasks. As you approach these, don't be afraid to seek help from Internet examples, code reference topics such as the ones linked to in this lesson, and your textbook.


It's worth your time to read Zandbergen 7.6, which talks about parsing text files. Any examples you can pick up with text parsing will help you when you encounter a new file that you need to read. You'll have this experience in the practice exercises and projects this week.


4.4 Writing geometries

As you parse out geographic information from "raw" sources such as text files, you may want to convert it to a format that is native to your GIS. This section of the lesson discusses how to write vector geometries to ArcGIS feature classes. We'll read through the same GPS-produced text file from the previous section, but this time we'll add the extra step of writing each coordinate to a polyline shapefile.

You've already had some experience writing point geometries when we learned about insert cursors. To review, you use arcpy.Point() to create a Point object and then use this object in the tuple given to insertRow() for the geometry field referred to as "SHAPE@" (see 4.1).

# Create point
inPoint = arcpy.Point(-121.34, 47.1)

# Create new row (cursor is an insert cursor opened on the "SHAPE@" field)
cursor.insertRow((inPoint,))

For polylines and polygons, you create multiple Point objects that you add to an Array object. Then you make a Polyline or Polygon object using the array. With polygons it's a good practice to make the end vertex the same as the start vertex if possible.

The code below creates an empty array and adds three points using the Array.add() method. Then the array is used to create a Polyline object.

The first parameter you pass in when creating a polyline is the array containing the points for the polyline. The second parameter is a spatial reference of the coordinates, which you should always pass in to ensure that the precision of your data is maintained.

# Make a new empty array
array = arcpy.Array()

# Make some points
point1 = arcpy.Point(-121.34,47.1)
point2 = arcpy.Point(-121.29,47.32)
point3 = arcpy.Point(-121.31,47.02)

# Put the points in the array
array.add(point1)
array.add(point2)
array.add(point3)

# Make a polyline out of the now-complete array
polyline = arcpy.Polyline(array, spatialRef)

Of course, you usually won't create points manually in your code like this with hard-coded coordinates. It's more likely that you'll parse out the coordinates from a file or capture them from some external source, such as a series of mouse clicks on the screen.

Creating a polyline from a GPS track

Here's how you could parse out coordinates from a GPS-created text file like the one in the previous section of the lesson. This code reads all the points captured by the GPS and adds them to one long polyline. The polyline is then written to an empty, pre-existing polyline shapefile named tracklines.shp that uses a geographic coordinate system. If you didn't have a shapefile already on disk, you could use the Create Feature Class tool to create one with your script.

# This script reads a GPS track in CSV format and
#  writes geometries from the list of coordinate pairs
import csv
import arcpy

# Set up input and output variables for the script
gpsTrack = open("C:\\data\\Geog485\\gps_track.txt", "r")
polylineFC = "C:\\data\\Geog485\\tracklines.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Set up CSV reader and process the header
csvReader = csv.reader(gpsTrack)
header = csvReader.next()
latIndex = header.index("lat")
lonIndex = header.index("long")

# Create an empty array object
vertexArray = arcpy.Array()

# Loop through the lines in the file and get each coordinate
for row in csvReader:
    lat = row[latIndex]
    lon = row[lonIndex]

    # Make a point from the coordinate and add it to the array
    vertex = arcpy.Point(lon,lat)
    vertexArray.add(vertex)

# Write the array to the feature class as a polyline feature
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
    polyline = arcpy.Polyline(vertexArray, spatialRef)
    cursor.insertRow((polyline,))

The above script starts out the same as the one in the previous section of the lesson. First, it parses the header line of the file to determine the position of the latitude and longitude coordinates in each reading. But then, notice that an array is created to hold the points for the polyline:

vertexArray = arcpy.Array()

After that, a loop is initiated that reads each line and creates a point object from the latitude and longitude values. At the end of each pass through the loop, the point is added to the array.

for row in csvReader:
    lat = row[latIndex]
    lon = row[lonIndex]

    # Make a point from the coordinate and add it to the array
    vertex = arcpy.Point(lon,lat)
    vertexArray.add(vertex)

Once all the lines have been read, the loop exits and an insert cursor is created using "SHAPE@" as the only element in the tuple of affected fields. Then a Polyline object is created and used as the first (and only) element of the tuple given to insertRow():

# Create an insert cursor
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:

    # Put the array in a polyline and write it to the feature class
    polyline = arcpy.Polyline(vertexArray, spatialRef)
    cursor.insertRow((polyline,))


Remember that the cursor places a lock on your dataset, so this script doesn't create the cursor until absolutely necessary (in other words, after the loop).
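One detail worth keeping in mind is that the csv module hands back every value as a string, and that a point is built with x (longitude) first and y (latitude) second. The sketch below is a plain-Python illustration of making both things explicit before a point is created. The toXY helper, the range check, and the sample row are assumptions made for this example and are not part of the lesson's script.

```python
# Hypothetical helper: turn one parsed row into an (x, y) pair of floats
def toXY(row, latIndex, lonIndex):
    lat = float(row[latIndex])
    lon = float(row[lonIndex])
    # Basic sanity check for geographic coordinates
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("Coordinate out of range: {0}, {1}".format(lat, lon))
    # x is longitude and y is latitude
    return lon, lat

# Made-up row with lat at index 2 and long at index 3
row = ["TRACK", "ACTIVE LOG", "40.78966", "-77.85948"]
x, y = toXY(row, 2, 3)
print(x)
print(y)
```

In a script like the one above, the returned x and y values could be passed to arcpy.Point in that order.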

Extending the example for multiple polylines

Just for fun, suppose your GPS allows you to mark the start and stop of different tracks. How would you handle this in the code? You can download this modified text file with multiple tracks if you want to try out the following example.

Notice that in the GPS text file, there is an entry new_seg:


new_seg is a boolean property that determines whether the reading begins a new track. If new_seg = true, you need to write the existing polyline to the shapefile and start creating a new one. Take a close look at this code example and notice how it differs from the previous one in order to handle multiple polylines:

# This script reads a GPS track in CSV format and
#  writes geometries from the list of coordinate pairs
#  Handles multiple polylines

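# Function to add a polyline
def addPolyline(cursor, array, sr):
    polyline = arcpy.Polyline(array, sr)
    cursor.insertRow((polyline,))
    # Empty the array so the next polyline starts with no vertices
    array.removeAll()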

# Main script body
import csv
import arcpy

# Set up input and output variables for the script
gpsTrack = open("C:\\data\\Geog485\\gps_track_multiple.txt", "r")
polylineFC = "C:\\data\\Geog485\\tracklines_sept25.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Set up CSV reader and process the header
csvReader = csv.reader(gpsTrack)
header = csvReader.next()
latIndex = header.index("lat")
lonIndex = header.index("long")
newIndex = header.index("new_seg")

# Write the array to the feature class as a polyline feature
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:

    # Create an empty array object
    vertexArray = arcpy.Array()

    # Loop through the lines in the file and get each coordinate
    for row in csvReader:
        isNew = row[newIndex].upper()

        # If about to start a new line, add the completed line to the
        #  feature class
        if isNew == "TRUE":
            if vertexArray.count > 0:
                addPolyline(cursor, vertexArray, spatialRef)

        # Get the lat/lon values of the current GPS reading
        lat = row[latIndex]
        lon = row[lonIndex]

        # Make a point from the coordinate and add it to the array
        vertex = arcpy.Point(lon,lat)
        vertexArray.add(vertex)

    # Add the final polyline to the shapefile
    addPolyline(cursor, vertexArray, spatialRef)

The first thing you should notice is that this script uses a function. The addPolyline function adds a polyline to a feature class, given three parameters: (1) an existing insert cursor, (2) an array, and (3) a spatial reference. This function cuts down on repeated code and makes the script more readable.

Here's a look at the addPolyline function:

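# Function to add a polyline
def addPolyline(cursor, array, sr):
    polyline = arcpy.Polyline(array, sr)
    cursor.insertRow((polyline,))
    # Empty the array so the next polyline starts with no vertices
    array.removeAll()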

Notice it's okay to use arcpy in the above function, since it is going inside the body of a script that imports arcpy. However, you want to avoid using variables in the function that are not defined within the function or passed in as parameters.

The addPolyline function is called twice in the script: once within the loop, which we would expect, and once at the end to make sure the final polyline is added to the shapefile. This is where writing a function cuts down on repeated code.

As you read each line of the text file, how do you determine whether it begins a new track? First of all, notice that we've added one more value to look for in this script:

  newIndex = header.index("new_seg")

The variable newIndex shows us which position in the line is held by the boolean new_seg property that tells us whether a new polyline is beginning. If you have sharp eyes, you'll notice we check for this later in the code:

  isNew = row[newIndex].upper()

  # If about to start a new line, add the completed line to the
  #  feature class
  if isNew == "TRUE":

In the above code, the upper() method converts the string into all upper-case, so we don't have to worry about whether the line says "true," "True," or "TRUE." But there's another situation we have to handle: What about the first line of the file? This line should read "true," but we can't add the existing polyline to the file at that time, because there isn't one yet. Notice that a second check is performed to make sure there are more than zero points in the array before the array is written to the shapefile:

  if vertexArray.count > 0:
      addPolyline(cursor, vertexArray, spatialRef)

The above code checks to make sure there's at least one point in the array, then it calls the addPolyline function, passing in the cursor, the array, and the spatial reference.
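The segment logic is easier to experiment with when arcpy is taken out of the picture. The sketch below is a plain-Python stand-in, not the lesson's script: an ordinary list plays the role of the Array, a list of lists plays the role of the shapefile, and the sample readings are made up. It follows the same three steps: check the new_seg flag, write out the finished track only if it has vertices, and write out the last track after the loop.

```python
import csv

# Made-up readings; new_seg is "true" on the first reading of each track
sampleLines = [
    "lat,long,new_seg",
    "40.1,-77.1,true",
    "40.2,-77.2,false",
    "40.3,-77.3,True",
    "40.4,-77.4,false",
    "40.5,-77.5,false",
]

csvReader = csv.reader(sampleLines)
header = next(csvReader)
latIndex = header.index("lat")
lonIndex = header.index("long")
newIndex = header.index("new_seg")

tracks = []    # completed tracks, standing in for the shapefile
vertices = []  # vertices of the track being built, standing in for the Array

for row in csvReader:
    isNew = row[newIndex].upper()
    # Store the previous track, but only if it has any vertices
    if isNew == "TRUE" and len(vertices) > 0:
        tracks.append(vertices)
        vertices = []
    vertices.append((float(row[lonIndex]), float(row[latIndex])))

# No new_seg flag follows the last track, so store it after the loop
tracks.append(vertices)

print(len(tracks))
```

Without the length check, the very first reading would store an empty track; without the final append, the last track would be lost.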

Here's another question to consider: How did we know that the Array object has a count property that tells us how many items are in it? This comes from the ArcGIS Desktop Help topic describing the Array class. In this section of the help there are topics describing each class in arcpy, and you'll come here often if you work with ArcGIS geometries in Python.

In the above-linked Array topic, find the Properties table and notice that Array has a read-only count property. If we were working with a Python list, we could use len(vertexArray), but in our case vertexArray is an Array object that is native to the ArcGIS geoprocessing programming model. This means it is a specialized object designed by Esri, and you can only learn its methods and properties by examining the documentation. Bookmark these pages!


You can write geometries to ArcGIS feature classes using a combination of geometry objects included with ArcGIS. The common workflow is to create Point objects, which you add to an Array object. You can use the Array object plus a spatial reference to create Polyline and Polygon objects. You then use an insert cursor to assign the geometry in the array to the feature class's geometry field (usually called "shape").

You may be wondering how you might create a multi-part feature (such as the state of Hawaii containing multiple islands), or a polygon with a "hole" in it. There are special rules for ordering and nesting Points and Arrays to create these types of geometries. These are covered in the course textbook, which brings us to...


Read Zandbergen 8.1 - 8.6, which contains a good summary of how to read and write Esri geometries.


4.5 Automation with batch files and scheduled tasks

In this course, we've talked about the benefits of automating your work through Python scripts. It's nice to be able to run several geoprocessing tools in a row without manually traversing the Esri toolboxes, but what's so automatic about launching PythonWin, opening your script, and clicking the Run button? In this section of the lesson, we'll take automation one step further by discussing how you can make your scripts run automatically.

Scripts and your operating system

Most of the time we've run scripts in this course, it's been through PythonWin. Your operating system (Windows) can run scripts directly. Maybe you've tried to double-click a .py file to run a script. As long as Windows understands that .py files represent a Python script and that it should use the Python interpreter to run the script, the script will launch immediately.

When you try to launch a script automatically by double-clicking it, it's possible you'll get a message saying Windows doesn't know which program to use to open your file. If this happens to you, use the Browse button on the error dialog box to browse to the Python executable, most likely located in C:\Python27\ArcGIS10.3\Python.exe. Make sure "Always use the selected program to open this kind of file" is checked and click OK. Windows now understands that .py files should be run using Python.

Double-clicking a .py file gives your operating system the simple command to run that Python script. You can alternatively tell your operating system to run a script using the Windows command line interface. This environment just gives you a blank window with a blinking cursor and allows you to type the path to a script or program, followed by a list of parameters. It's a clean, minimalist way to run a script. In Windows XP, you can open the command line by clicking Start > Run and typing cmd. In Windows Vista or Windows 7, just type cmd in the Search box.

The command line

Advanced use of the command line is outside the scope of this course. For now, it's sufficient to say that you can run a script from the command line by typing the path of the Python executable, followed by the full path to the script, like this:

C:\Python27\ArcGIS10.3\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py

If the script takes parameters, you must also type each argument separated by a space. Remember that arguments are the values you supply for the script's parameters. Here's an example of a command that runs a script with two arguments, both strings that represent pathnames. Notice that you should use the regular \ in your paths when providing arguments from the command line (not / or \\ as you would use in PythonWin).

C:\Python27\ArcGIS10.3\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp

If the script executes successfully, you often won't see anything except a new command prompt (remember, this is minimalist!). If your script is designed to print a message, you should see the message. If your script is designed to modify files or data, you can check those files or data (perhaps using ArcCatalog) to make sure the script ran correctly.

You'll also see messages if your script fails. Sometimes these are the same messages you would see in the Python Interactive Window. At other times, the messages are more helpful than what you would see in PythonWin, making the command line another useful tool for debugging. Unfortunately, the messages can also be less helpful than those in PythonWin.
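On the Python side, arguments typed after the script path are available through the standard library's sys.argv list, where item 0 is the script path itself. The sketch below simulates that list for the two-argument command shown earlier so that it can run anywhere. The readArguments helper is hypothetical, and in a real script you would import sys and pass sys.argv to it.

```python
# Hypothetical helper: pull two path arguments out of an argv-style list
def readArguments(argv):
    # argv[0] is the script path; the arguments start at index 1
    if len(argv) != 3:
        raise ValueError("Expected 2 arguments, got {0}".format(len(argv) - 1))
    return argv[1], argv[2]

# Simulate the list that sys.argv would hold for the two-argument command
exampleArgv = [
    "C:\\WCGIS\\Geog485\\Lesson2\\Project2.py",
    "C:\\WCGIS\\Geog485\\Lesson2\\",
    "C:\\WCGIS\\Geog485\\Lesson2\\CityBoundaries.shp",
]
folder, featureClass = readArguments(exampleArgv)
print(folder)
print(featureClass)
```

The doubled backslashes appear only because these paths are written inside Python code; on the command line itself you type single backslashes.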

Batch files

Why is the command line so important in a discussion about automation? After all, it still takes work to open the command line and type the commands. The beautiful thing about commands is that they, too, can be scripted. You can list multiple commands in a simple text-based file, called a batch file. Running the batch file runs all the commands in it.

Here's an example of a simple batch file that runs the two scripts above. To make this batch file, you could put the text below inside an empty Notepad file and save it with a .bat extension. Remember that this is not Python; it's command syntax:

@ECHO OFF
REM Runs both my project scripts

C:\Python27\ArcGIS10.3\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py
ECHO Ran project 1
C:\Python27\ArcGIS10.3\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
ECHO Ran project 2
PAUSE

Here are some notes about the above batch file, starting from the top:

  • @ECHO OFF prevents all the lines in your batch file from being printed to the command line window, or console, when you run the file. It's standard procedure to use this as the first line of your batch file, unless you really want to see which line of the file is executing (perhaps for debugging purposes).
  • REM is how you put a comment in your batch file, the same way # denotes a comment in Python.
  • You put commands in your batch file using the same syntax you used from the command line.
  • ECHO prints something to the console. This can be useful for debugging, especially when you've used @ECHO OFF because you don't want to see every line of your batch file printed to the console.
  • PAUSE gives a "Press any key to continue..." prompt. If you don't put this at the end of your batch file, the console will immediately close after the file is done executing. When you're writing and debugging the batch file, it's useful to put PAUSE at the end so you can see any error messages that were printed when running the file. Once your batch file is tested and working correctly, you can remove the PAUSE.

Batch files can contain variables, loops, comments, and conditional logic, all of which are beyond the scope of this lesson. However, if you'll be writing and running many scripts for your organization, it's worthwhile to spend some time learning more about batch files. Fortunately, batch files have been around for a long time (they are older than Windows itself), so there's an abundance of good information available on the Internet to help you.
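If you are more comfortable in Python than in batch syntax, a similar "run these scripts in order" job can be sketched with the standard library's subprocess module. This is an alternative to a batch file and not something the lesson requires. The runScripts helper is hypothetical, and the two throwaway scripts are created only so the example has something to run.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical helper: run each script in order with the current interpreter
def runScripts(scriptPaths):
    exitCodes = []
    for path in scriptPaths:
        # subprocess.call waits for the script to finish and returns its exit code
        exitCodes.append(subprocess.call([sys.executable, path]))
    return exitCodes

# Write two tiny throwaway scripts so the example has something to run
tempFolder = tempfile.mkdtemp()
scriptPaths = []
for name in ("first.py", "second.py"):
    path = os.path.join(tempFolder, name)
    with open(path, "w") as scriptFile:
        scriptFile.write("print('Ran " + name + "')\n")
    scriptPaths.append(path)

exitCodes = runScripts(scriptPaths)
print(exitCodes)
```

An exit code of 0 conventionally means the script finished without an unhandled error.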

Scheduling tasks

At this point we've come pretty close to reaching true automation, but there's still that need to launch the Python script or the batch file, either by double-clicking it, invoking it from the command line, or otherwise telling the operating system to run it. To truly automate the running of scripts and batch files, you can use an operating system utility such as Windows Task Scheduler.

Task Scheduler is one of those items hidden in Windows System Tools that you may not have paid any attention to before. It's a relatively simple program that allows you to schedule your scripts and batch files to run on a regular basis. This is helpful if the task needs to run often enough that it would be burdensome to launch the batch file manually; but it's even more helpful if the task takes some of your computing resources and you want to run it during the night or weekend to minimize impact on others who may be using the computer.

Here's a real-world scenario where Task Scheduler (or a comparable utility if you're running on a Mac, Linux, or UNIX) is very important: Fast Web maps tend to use a server-side cache of pregenerated map images, or tiles, so that the server doesn't have to draw the map each time someone navigates to an area. A Web map administrator who has ArcGIS Server can run the tool Manage Map Server Cache Tiles to make the tiles before he or she deploys the Web map. After deployment, the server quickly sends the appropriate tiles to people as they navigate the Web map. So far so good.

As the source GIS data for the map changes, however, the cache tiles become out of date. They are just images and do not know how to update themselves automatically. The cache needs to be updated periodically, but cache tile creation is a time consuming and CPU-intensive operation. For this reason, many server administrators use Task Scheduler to update the cache. This usually involves writing a script or batch file that runs Manage Map Server Cache Tiles and other caching tools, then scheduling that script to run on nights or weekends when it would be least disruptive to users of the Web map.

Inside Windows Task Scheduler

Let's take a quick look inside Windows Task Scheduler. The instructions below are for Windows Vista (and probably Windows 7). Other versions of Windows have a very similar Task Scheduler, and with some adaptation you can also use the instructions below to understand how to schedule a task.

  1. Open Task Scheduler by navigating the Windows Start menu to All Programs > Accessories > System Tools > Task Scheduler.
  2. Click Create Basic Task. This walks you through a simple wizard to set up the task. You can configure advanced options on the task later.
  3. Give your task a Name that will be easily remembered and optionally, a Description. Then click Next.
  4. Choose how often you want the task to run. For this example, choose Daily. Then click Next.
  5. Choose a Start time and a recurrence frequency. If you want, choose a time a few minutes ahead of the current time, so you can see what it looks like when a task runs. Then click Next.
  6. Choose Start a program, then click Next.
  7. Here's the moment of truth where you specify which script or batch file you want to run. Click Browse and navigate to one of the Python scripts you've written during this course. It's going to be easiest here if you pick a script that doesn't take any arguments, such as your project 1 script that makes contour lines from hard-coded datasets, but if you are feeling brave you can also add arguments in this panel of the wizard. Then click Next.
  8. Review the information about your task, then click Finish.
  9. Notice that your task now appears in the list in Task Scheduler. You can highlight the task to see its properties, or right-click the task and click Properties to actually set those properties. You can use the advanced properties to get your script to run even more frequently than daily, for example, every 15 minutes.
  10. Wait for your scheduled time to occur, or if you don't want to wait, right-click the task and click Run. Either way, you'll see a console window appear when the script begins and disappear once the script has finished. (If you're running a Python script and you don't want the console window to disappear at the end, you can put a line at the end of the script such as lastline = raw_input(">"). This stops the script until the user presses Enter. Once you're comfortable with the script running on a regular basis, you'll probably want to remove this line to keep open console windows from cluttering your screen. After all, the idea of a scheduled task is that it happens in the background without bothering you.)
    Figure 4.1 The Windows Task Scheduler.
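Because a scheduled script runs when nobody is watching the console, one common approach is to have it write its progress to a log file that you can check later. Here is a minimal sketch using the standard library's logging module. The log file location, the logger name, and the placeholder "work" are assumptions for this example; a real task would log to a fixed folder and run your geoprocessing code.

```python
import logging
import os
import tempfile

# Hypothetical log location; a real task would use a fixed, known folder
logPath = os.path.join(tempfile.mkdtemp(), "nightly_task.log")

# Create a logger that appends time-stamped messages to the log file
logger = logging.getLogger("nightlyTask")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(logPath)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("Task started")
try:
    # The real geoprocessing work would go here
    result = 10 / 2
    logger.info("Task finished")
except Exception:
    # logger.exception records the message plus the full traceback
    logger.exception("Task failed")

# Close the handler so the log file can be read back
handler.close()
logger.removeHandler(handler)

with open(logPath) as logFile:
    logText = logFile.read()
print(logText)
```

Each line in the log carries a timestamp, so you can confirm the task ran at the scheduled time.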


To make your scripts run automatically, you use Windows Task Scheduler to create a task that the operating system runs at regular intervals. The task can point at either a .py file (for a single script), or a .bat file (for multiple scripts). Using scheduled tasks, you can achieve full automation of your GIS processes.

4.6 Running any tool in the box

Sooner or later, you're going to have to include a geoprocessing tool in your script that you have never run before. It's possible that you've never even heard of the tool or run it from its GUI, let alone a script.

In other cases, you may know the tool very well, but your Python may be rusty, or you may not be sure how to construct all the necessary parameters.

The approach for both of these situations is the same. Here are some suggested steps for running any tool in the ArcGIS toolboxes using Python:

  1. Find the tool reference documentation. We've seen this already during the course. Each tool has its own topic in the Geoprocessing tool reference section of the ArcGIS Help. Open that topic and read it before you do anything else. Read the "Usage" section at the beginning to make sure that it's the right tool for you and that you are about to employ it correctly.
  2. Examine the parameters. Scroll down to the "Syntax" section of the topic and read which parameters the tool accepts. Note which parameters are required and which are optional, and decide which parameters your script is going to supply.
  3. In your Python script, create variables for each parameter. Note that each parameter in the "Syntax" section of the topic has a data type listed. If the data type for a certain parameter is listed as "String," you need to create a Python string variable for that parameter.

    Sometimes the translation from data type to Python variable is not direct. For example, sometimes the tool reference will say that the required variable is a "Feature Class." What this really means for your Python script is that you need to create a string variable containing the path to a feature class.

    Another example is if the tool reference says that the required data type is a "Long." What this means in Python is that you need to create a numerical variable (as opposed to a string) for that particular parameter.

    If you have doubts about how to create your variable to match the required data type, scroll down to the "Code Sample" in the tool reference topic. Try to find the place where the example script defines the variable you're having trouble with. Copy the patterns that you see in the example script and usually you'll be okay.

    Most of the commonly used tools have excellent example scripts, but others are hit or miss. If your tool of interest doesn't have a good example script, you may be able to find something on the Esri forums or a well-phrased Google search.

  4. Run the tool...with error handling. You can first run your script without try/except blocks so that any basic errors show up in the Interactive Window. If you're still not getting anything helpful, the next resort is to add the try/except blocks and put print arcpy.GetMessages() in the except block.

In Project 4 you'll get a chance to practice these skills to run a tool you previously haven't worked with in a script.

4.7 Working with map documents

To this point, we've talked about automating geoprocessing tools, updating GIS data, and reading text files. However, we've not covered anything about working with an Esri map document. There are many tasks that can be performed on a map document that are well-suited for automation. These include:

  • Finding and replacing text in a map or series of maps. For example, a copyright notice for 2015 becomes 2016.
  • Repairing layers that are referencing data sources using the wrong paths. For example, your map was sitting on a computer where all the data was in C:\data and now it is on a computer where all the data is in D:\myfolder\mydata.
  • Printing a series of maps or data frames.
  • Exporting a series of maps to PDF and joining them to create a "map book."
  • Making a series of maps available to others on ArcGIS Server.

Esri map documents are binary files, meaning they can't be easily read and parsed using the techniques we covered earlier in this lesson. Until very recently the only way to automate anything with a map document was to use ArcObjects, which is somewhat challenging for beginners and requires using a language other than Python. With the release of ArcGIS 10.0, Esri added a Python module for automating common tasks with map documents.

The arcpy.mapping module

arcpy.mapping is a module you can use in your scripts to work with map documents. Please take a detour at this point to read the Esri Introduction to arcpy.mapping.

The most important object in this module is MapDocument. This tells your script which map you'll be working with. You can get a MapDocument by referencing a path, like this:

mxd = arcpy.mapping.MapDocument(r"C:\data\Alabama\UtilityNetwork.mxd")

Notice the use of r in the line above to denote a raw string literal. In other words, if you include r right before you begin your string, it's safe to use reserved characters like the single backslash \. I've done it here because you'll see it in a lot of the Esri examples with arcpy.mapping.
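Here is a small sketch of what the r prefix changes. The first comparison uses the path from the line above; the short path C:\new is made up to show what can go wrong without the prefix.

```python
# The same Windows path written two ways
rawPath = r"C:\data\Alabama\UtilityNetwork.mxd"
escapedPath = "C:\\data\\Alabama\\UtilityNetwork.mxd"
print(rawPath == escapedPath)

# Without r, "\n" is read as one newline character instead of \ and n
plain = "C:\new"
raw = r"C:\new"
print(len(plain))
print(len(raw))
```

The two spellings of the first path are the same string, while the unprefixed "C:\new" silently loses a character because \n is treated as a newline.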

Instead of directly using a string path, you could alternatively put a variable holding the path. This would be useful if you were iterating through all the map documents in a folder using a loop, or if you previously obtained the path in your script using something like arcpy.GetParameterAsText().
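For the folder scenario, the standard library's glob and os modules can collect the paths for you. The sketch below is one way to do it under stated assumptions: the listMapDocuments helper is hypothetical, and the folder is a throwaway one filled with empty placeholder files so the example runs without any real map documents.

```python
import glob
import os
import tempfile

# Hypothetical helper: return the paths of all .mxd files in a folder
def listMapDocuments(folder):
    return sorted(glob.glob(os.path.join(folder, "*.mxd")))

# Build a throwaway folder with empty placeholder files for the demo
folder = tempfile.mkdtemp()
for name in ("Roads.mxd", "Parcels.mxd", "notes.txt"):
    open(os.path.join(folder, name), "w").close()

mxdPaths = listMapDocuments(folder)
for mxdPath in mxdPaths:
    # In a real script, each path could be passed to arcpy.mapping.MapDocument
    print(os.path.basename(mxdPath))
```

Only the two .mxd files are returned; the text file is skipped by the *.mxd pattern.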

It can be convenient to work with arcpy.mapping in the Python window in ArcMap. In this case, you do not have to put the path to the MXD. There's a special keyword "CURRENT" that you can use to get a reference to the currently-open MXD.

mxd = arcpy.mapping.MapDocument("CURRENT")

Once you get a MapDocument, then you do something with it. Most of the functions in arcpy.mapping take a MapDocument object as a parameter. Let's look at this first script from the Esri help topic linked above and scrutinize what is going on. I've added comments to each line.

# Create a MapDocument object referencing the MXD you want to update
mxd = arcpy.mapping.MapDocument(r"C:\GIS\TownCenter_2015.mxd")

# Loop through each text element in the map document
for textElement in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT"):
    # Check if the text element contains the out of date text
    if textElement.text == "GIS Services Division 2015":
        # If out of date text is found, replace it with the new text
        textElement.text = "GIS Services Division 2016"
# Export the updated map to a PDF
arcpy.mapping.ExportToPDF(mxd, r"C:\GIS\TownCenterUpdate_2016.pdf")

# Clean up the MapDocument object by deleting it
del mxd

The first line in the above example gets a MapDocument object referencing C:\GIS\TownCenter_2015.mxd. The example then employs two functions from arcpy.mapping. The first is ListLayoutElements. Notice that the parameters for this function are a MapDocument and the type of layout element you want to get back, in this case, "TEXT_ELEMENT". (Examine the documentation for List Layout Elements to understand the other types of elements you can get back.)

The function returns a Python list of TextElement objects representing all the text elements in the map document. You know what to do if you want to manipulate every item in a Python list. In this case, the example uses a for loop to check the TextElement.text property of each element. This property is readable and writeable, meaning if you want to set some new text, you can do so by simply using the assignment operator (the equals sign), as in textElement.text = "GIS Services Division 2016".

The ExportToPDF function is very simple in this script. It takes a MapDocument and the path of the output PDF as parameters. If you look at the documentation for ExportToPDF, you'll notice a lot of other optional parameters for exporting PDFs, such as whether to embed fonts, that are just left as defaults in this example.

Learning arcpy.mapping

The best way to learn arcpy.mapping is to try to use it. Because of its simple, "one-line-fix" nature, it's a good place to practice your Python. It's also a good way to get used to the Python window in ArcMap, because you can immediately see the results of your actions.

Although there is no arcpy.mapping component to this lesson's project, you're welcome to use it in your final project. If you've already submitted your final project proposal, you can amend it to use arcpy.mapping by e-mailing and obtaining approval from the instructors. If you use arcpy.mapping in your final project, you should attempt to incorporate several of the functions or mix it with other Python functionality you've learned, making something more complex than the "one line fix" type of script I mentioned above.

By now you'll probably have experienced the reality that your code does not always run as expected on the first try. Before you start running arcpy.mapping commands on your production MXDs, I suggest making backup copies.

Here are a few additional places where you can find excellent help on learning arcpy.mapping:

  • Zandbergen chapter 10. I recommend that you at least skim this chapter to see the types of examples that are included.
  • The Arcpy Mapping module book in the ArcGIS Desktop Help
  • Intro to arcpy.mapping technical workshop from the 2015 Esri User Conference


4.8 Limitations of Python scripting with ArcGIS

In this course you've learned the basics of programming and have seen how Python can automate any GIS function that can be performed with the ArcGIS toolboxes. There's a lot of power available to you through scripting, and hopefully you're starting to get ideas about how you can apply that in your work outside this course.

To conclude this lesson, however, it's important to talk about what's not available through Python scripting in ArcGIS.

Limits with fine-grained access to the "guts" of ArcGIS

Python interaction with ArcGIS is mainly limited to reading and writing data, editing the properties of map documents, and running the tools that are included with ArcGIS. Although the ArcGIS tools are useful, they are somewhat of a black box, meaning you put things in and get things out without knowing or being concerned about what is happening inside. If you want a greater degree of control over how ArcGIS is manipulating your data, you need to work with ArcObjects.

ArcObjects can be thought of as "the building blocks" of ArcGIS. In fact, an analogy with the children's Lego building bricks works well to describe ArcObjects: Programming with ArcObjects is akin to having an enormous selection of Legos of all shapes and sizes, whereas Python scripting is like working with a kit containing some large prefabricated pieces that make it much easier to construct a particular final product.

Because of the sheer amount of functionality and objects available to you, ArcObjects is more challenging to learn than simple Python scripting. Usually, an equivalent task takes many more lines of code to write in ArcObjects than in a Python script. However, when you use ArcObjects you have much greater control over what happens in your program. You can take a small piece of functionality and use it without the overhead of a tool or all the other parameters that come with a tool.

Limits with user interface customization prior to ArcGIS 10.1

In this course we have done nothing with customizing ArcMap to add special buttons, toolbars, and so on that trigger our programs. Our foray into user interface design has been limited to making a script tool and toolbox. Although script tools are useful, there are times when you want to take the functionality out of the toolbox and put it directly into ArcMap as a button on a toolbar. You may want that button to launch a new window with text boxes, labels, and buttons that you design yourself.

In ArcGIS 10.0, if you want to put custom functionality or programs directly into ArcMap, you need to use Visual Basic for Applications (VBA), C++, or a .NET language (VB.NET or C#) working with ArcObjects. The functionality may be as simple as putting some custom actions behind a button (zoom to a certain bookmark, for example), or you may open a full-blown program you develop with multiple forms, options, and menus. The aforementioned languages have IDEs in which you can design custom user interfaces with text boxes, labels, buttons, and so on.

Geog 489, another elective course in the GIS certificate program, covers GIS customization using ArcObjects.

New Python add-in functionality at ArcGIS 10.1

To allow a greater degree of interactivity between the ArcMap user interface and Python scripts, ArcGIS 10.1 introduces the concept of a Python add-in. These allow you to attach Python logic to a limited set of actions you perform in ArcMap, such as zooming the map, opening a new map document, or clicking a button on a custom toolbar. For example, you might create an add-in that automatically adds a particular set of layers any time someone pushes a certain button on your toolbar.

With Python add-ins, you get access to a number of user interface elements to use as a front end to your Python scripts, including toolbars, buttons, menus, combo boxes, and basic file browsing and Yes/No confirmation dialog boxes. There's also a set of common events that you can detect and respond to in your code, such as the map opening, the map extent changing, or the spatial reference changing. Although this is far from the full realm of ArcObjects and .NET customization possibilities, it gives a lot more possibilities than were available in previous versions of ArcGIS.

The nice thing about add-ins is that they are easily shareable. You download the Python Add-In Wizard from Esri, and it helps you prepare and package up your add-in into a .esriaddin file. Other people with ArcGIS can then install the add-in from the .esriaddin file.

Working with Python add-ins is currently not included in the scope of this course, but you can learn all about them in the help book ArcGIS Desktop Python add-ins. After reading this material and getting a basic understanding of what's required to create add-ins, you're welcome to incorporate them into your final project if you have ArcGIS 10.1 and you are confident that you can work somewhat independently to test and create the add-ins. If you have struggled in the course, I recommend that you wait until after completing Geog 485 to further explore add-ins, so that you can give them the necessary amount of time and testing.

Lesson 4 Practice Exercises

These practice exercises will give you some more experience applying the Lesson 4 concepts. They are designed to prepare you for some of the techniques you'll need to use in your Project 4 script.

Download the data for the practice exercises

Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll read some coordinate points and make a polygon from those points. In Practice Exercise B, you'll work with dictionaries to manage information that you parse from the text file.

Example solutions are provided for both practice exercises. You'll get the most value out of the exercises if you make your best attempt to complete them on your own before looking at the solutions. In any case, the patterns shown in the solution code can help you approach Project 4.

Lesson 4 Practice Exercise A

This practice exercise is designed to give you some experience writing geometries to a shapefile. You have been provided two things:

  • A text file MysteryStatePoints.txt containing the coordinates of a state boundary.
  • An empty polygon shapefile that uses a geographic coordinate system.

The objective

Your job is to write a script that reads the text file and creates a state boundary polygon out of the coordinates. When you successfully complete this exercise, you should be able to preview the shapefile in ArcCatalog and see the state boundary.


If you're up for the challenge of this script, go ahead and start coding. But if you're not sure how to get started, here are some tips:

  • In this script there is no header line to read. You are allowed to use the values of 0 and 1 directly in your code to refer to the longitude and latitude, respectively. This should actually make the file easier to process.
  • Before you start looping through the coordinates, create an Array object to hold all the points in your polygon.
  • Loop through each coordinate and create a Point object from the coordinate pair. Then add the Point object to your Array object.
  • Once you're done looping, create a Polygon object from your Array. Then create an insert cursor on your shapefile and insert a new row whose SHAPE@ value is that Polygon.
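If the file-reading step is the sticking point, the sketch below shows the csv pattern for a file with no header. It uses a few made-up coordinate pairs in place of MysteryStatePoints.txt (the values and the sampleLines name are assumptions for illustration only). csv.reader accepts any iterable of text lines, so a list of strings behaves like an open file here. The sketch stops short of the arcpy steps.

```python
import csv

# Made-up coordinate pairs standing in for the lines of the text file
sampleLines = ["-100.0,40.0", "-100.0,41.0", "-99.0,41.0", "-100.0,40.0"]

csvReader = csv.reader(sampleLines)

coordinatePairs = []
for coords in csvReader:
    # csv returns strings, so convert to numbers before using them
    lon = float(coords[0])
    lat = float(coords[1])
    coordinatePairs.append((lon, lat))

# The ring is closed when its first and last points match
ringIsClosed = coordinatePairs[0] == coordinatePairs[-1]
print(ringIsClosed)
```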

Lesson 4 Practice Exercise A Solution

Here's one way you could approach Lesson 4 Practice Exercise A with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# Reads coordinates from a text file and writes a polygon

import arcpy
import csv

shapefile = "C:\\data\\Geog485\\MysteryState.shp"
pointFilePath = "C:\\data\\Geog485\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file and create a CSV reader to step through its lines
pointFile = open(pointFilePath, "r")
csvReader = csv.reader(pointFile)

# This Array object will hold a clockwise "ring" of Point
#  objects, thereby making a polygon.
polygonArray = arcpy.Array()

# Loop through each coordinate pair and make a Point object
for coords in csvReader:

    # Create a point, assigning the X and Y values from your list
    currentPoint = arcpy.Point(float(coords[0]), float(coords[1]))

    # Add the newly-created Point to your Array
    polygonArray.add(currentPoint)

# Create a Polygon from your Array
polygon = arcpy.Polygon(polygonArray, spatialRef)

# Create an insert cursor and apply the Polygon to a new row
with arcpy.da.InsertCursor(shapefile, ("SHAPE@",)) as cursor:
    cursor.insertRow((polygon,))

Below is a video offering some line-by-line commentary on the structure of this solution:

Click for a transcript of "4A" video.

PRESENTER: This video walks through the solution to lesson 4, practice exercise A wherein you have a text file with some coordinates that looks like this. And you need to loop through those, and parse them out, and then create a polygon out of it to create the shape of a US state.

Notice that the first coordinate in this list and the last one are the same, indicating that it will be a polygon. This is a pretty basic list. There's no header here. So we're just going to blast right through this as if it were a csv file with two columns and no header.

In order to do this, we're going to use arcpy, which we import in line 3, and the csv module for Python, which we import in line 4. We're also-- in line 6 and 7, we set up some variables referencing the files that we're going to work with in this script. So we're working with the mystery state shapefile that was provided for you.

This shapefile has a coordinate system. I believe it's WGS84 geographic coordinates. However, there is no shape or polygon within this shapefile yet. You will add that. And then line 7 references the path to the text file containing all those coordinate points.

In line 8 we do a little bit of work to detect the spatial reference of MysteryState shapefile. We use that later when we start creating geometries. This pattern of using the describe method and then getting the spatial reference property should be familiar to you from your work in lesson 2.

In line 11 we open up that text file that has the points. So the first parameter with the open method is pointFilePath, which we created in line 7. And then the second parameter is r in quotes. And that means we're opening it in read mode. We're just going to read this file. We're not going to write anything into it.

And in line 12, we then pass that file into the csvReader constructor so that we can create this csvReader object that will allow us to iterate through all the rows in the file in an easy fashion. Before we start doing that though, in line 16 we create an empty arcpy array object.

We know in this case we're just going to create one geometry. It's going to be one polygon with one array. So it's OK to create that array before we start looping. And we actually don't want to create it inside the loop because we would be creating multiple arrays, and we just need one of them in this case.

In line 19 we actually start looping through the rows in the file. Each row is going to be represented here by a variable called coords. And so in line 22, we create an arcpy point object. And when you create a point, you need at least the longitude coordinate, which is the X value, and the latitude, which is Y.

Now, going back to our file, you'll see that in the first column or column 0, that's the longitude. So we have coords 0 in our code. And then the next piece is the latitude or the y-coordinate. That's in column with index 1. So we use coords 1 for the y.

Once we have that point, we can add it into the array. We just keep doing that, cycling through the file until we've added point after point after point inside that array and we've gotten to the end.

Now, in line 28 the loop has exited. Notice that because it's not indented. And we're going to take the array that we made and make a polygon object out of it. So we create arcpy.Polygon. And the things we pass in are the array itself, and then the spatial reference for this geometry, which we retrieved back in line 8.

So we've got the polygon all set up, but we haven't added it to the shapefile yet. And in order to do that, we're going to use an InsertCursor. In this case, we just use the very simple with statement to create the InsertCursor.

The InsertCursor needs a couple of parameters in order to get started. It needs the path to the file that we're going to modify, and that's represented by the variable shapefile. Remember we created that back in line six.

And then there's this tuple of fields that we're going to modify. In our case, we're only going to modify the geometry. We're not going to modify any attributes so the way that we supply this tuple is it just has one item called shape with the @ sign, and then a comma. A tuple of one just ends with a comma like this inside the parentheses.

And then our final line 32 actually inserts the row. And it inserts just the polygon geometry and nothing else as described. When this is done, we should be able to go into ArcMap and verify that we indeed have a polygon inside of that shapefile.

Lesson 4 Practice Exercise B

This practice exercise does not do any geoprocessing or GIS, but it will help you get some experience working with functions and dictionaries. The latter will be especially helpful as you work on Project 4.

The objective

You've been given a text file of (completely fabricated) soccer scores from some of the most popular teams in Buenos Aires. Write a script that reads through the scores and prints each team name, followed by the maximum number of goals that team scored in a game, for example:

River: 5
Racing: 4

Keep in mind that the maximum number of goals scored might have come during a loss.

You are encouraged to use dictionaries to complete this exercise. This is probably the most efficient way to solve the problem. You'll also be able to write at least one function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is an excellent exercise in using the debugger, especially to watch your dictionary as you step through each line of code.

This file is space-delimited; therefore, you must explicitly set up the CSV reader to use a space as the delimiter instead of the default comma. The syntax is as follows:

csvReader = csv.reader(scoresFile, delimiter=" ")
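To see what the delimiter setting changes, the sketch below parses two sample lines modeled on the scores file, using a list of strings in place of an open file (the sampleLines name is an assumption for illustration). Each row comes back as a list of strings split on the spaces.

```python
import csv

# Sample lines standing in for the scores file; the first is a header
sampleLines = ["Winner WG Loser LG", "Boca 2 Independiente 1"]

csvReader = csv.reader(sampleLines, delimiter=" ")
header = next(csvReader)
firstRow = next(csvReader)

print(header)
print(firstRow)
```

Note that the goal counts come back as strings such as "2", not as numbers. The built-in next() function used here works in Python 2.6 and later as well as in Python 3.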


If you want a challenge, go ahead and start coding. Otherwise, here are some tips that can help you get started:

  • Create variables for all your items of interest, including winner, winnerGoals, loser, and loserGoals and assign them appropriate values based on what you parsed out of the line of text. Because there are often ties in soccer, the term "winner" in this context means the first score given and the term "loser" means the second score given.
  • Review chapter 6.8 on dictionaries in the Zandbergen text. You want to make a dictionary that has a key for each team, and an associated value that represents the team's maximum number of goals. If you looked at the dictionary in the debugger it would look like {'River': '5', 'Racing': '4', etc.}
  • You can write a function that takes in three things: the key (team name), the number of goals, and the dictionary name. This function should then check if the key has an entry in the dictionary. If not, a key should be added and its value set to the current number of goals. If a key is found, you should perform a check to see if the current number of goals is higher than the value associated with that key. If so, you should set a new value. Notice how many "ifs" appear in the preceding sentences.

Lesson 4 Practice Exercise B Solution

This practice exercise is a little trickier than previous exercises. If you were not able to code a solution, study the following solution carefully and make sure you know the purpose of each line of code.

The code below refers to the "winner" and "loser" of each game. In the case of a tie, these terms simply refer to the first score given and the second score given.

# Reads through a text file of soccer (football)
#  scores and reports the highest number of goals
#  in one game for each team

# ***** DEFINE FUNCTIONS *****

# This function checks if the number of goals scored
#  is higher than the team's previous max.
def checkGoals(team, goals, dictionary):
    #Check if the team has a key in the dictionary
    if team in dictionary:
        # If a key was found, check goals (as integers) against team's current max
        if int(goals) > int(dictionary[team]):
            dictionary[team] = goals
        else:
            pass
    # If no key found, add one with current number of goals
    else:
        dictionary[team] = goals

# ***** BEGIN SCRIPT BODY *****

import csv

# Open the text file of scores
scoresFilePath = "C:\\Data\\Geog485\\Scores.txt"
scoresFile = open(scoresFilePath)

# Read the header line and get the important field indices
csvReader = csv.reader(scoresFile, delimiter=" ")
header = csvReader.next()

winnerIndex = header.index("Winner")
winnerGoalsIndex = header.index("WG")
loserIndex = header.index("Loser")
loserGoalsIndex = header.index("LG")

# Create an empty dictionary. Each key will be a team name.
#  Each value will be the maximum number of goals for that team.
maxGoalsDictionary = {}

for row in csvReader:

    # Create variables for all items of interest in the line of text
    winner = row[winnerIndex]
    winnerGoals = row[winnerGoalsIndex]
    loser = row[loserIndex]
    loserGoals = row[loserGoalsIndex]

    # Check the winning number of goals against the team's max
    checkGoals(winner, winnerGoals, maxGoalsDictionary)

    # Also check the losing number of goals against the team's max
    checkGoals(loser, loserGoals, maxGoalsDictionary)

# Print the results
for key in maxGoalsDictionary:
    print key + ": " + maxGoalsDictionary[key]
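As a variation on checkGoals, the sketch below stores the goals as integers and uses the dictionary's get method with a default value, which folds the "no key yet" and "new maximum" cases into a single comparison. The function name updateMaxGoals and the sample rows are hypothetical; this is one possible alternative under those assumptions, not a replacement for the solution above.

```python
def updateMaxGoals(team, goals, dictionary):
    # Convert to int so that 10 compares as greater than 9
    goals = int(goals)
    # get() returns -1 when the team has no key yet, so any score wins
    if goals > dictionary.get(team, -1):
        dictionary[team] = goals

maxGoals = {}
# Made-up rows in the order: first team, goals, second team, goals
sampleRows = [["Boca", "2", "Independiente", "1"],
              ["River", "3", "Boca", "10"],
              ["Boca", "9", "River", "0"]]
for row in sampleRows:
    updateMaxGoals(row[0], row[1], maxGoals)
    updateMaxGoals(row[2], row[3], maxGoals)

for team in sorted(maxGoals):
    print(team + ": " + str(maxGoals[team]))
```

Because the values are stored as integers here, they must be converted with str() before being joined to the team name for printing.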

Below is a video offering some line-by-line commentary on the structure of this solution. Please note two errors in the video: 1) At one point I said "line 30" when I meant "line 31". 2)  I used the word "tie" to refer to the scenario when both teams end up with the same number of goals, when I know you football fans will be quickly reminding me that I should have called this a "draw".  :-)

Click for a transcript of "4B" video.

This video is going to describe one possible solution for lesson 4 practice exercise B wherein you are reading the names of soccer teams in Buenos Aires, and looking at their scores, and then compiling a report about the top number of goals scored by each team over the course of this time covered by this file.

In order to maintain all this information, it's helpful to use a dictionary, which is a way of storing information in the computer's memory based on key value pairs. So the key in this case will be the name of the team, and the value will be the maximum number of goals found for that team as we read the file line by line. Now, this file is sort of like a comma separated value file, although the delimiter in our case is a space. The file does have a header, which we'll use to pull out information.

The header is organized in terms of winner, winning goals, loser, losing goals. Although really, there are some ties in here. So we might say first score and second score rather than winner and loser. For our purposes, it doesn't matter who won or lost, because the maximum number of goals might have come during a loss or a tie.

So this solution is a little more complex than some of the other practice exercises. It involves a function which you can see beginning in line 9, but I'm not going to describe this just yet. I'll wait until we get to that point where we need the logic that's in that function. So I'll start explaining this solution by going to line 23, where we import the Python CSV module.

Now, there's nothing in this script that uses ArcGIS or ArcPy geometries or anything like that, so I don't import ArcPy at all. But you will do it in Project 4 where you'll use a combination of the techniques used here along with ArcGIS geometries and really put everything together from both the practice exercises. In line 26, we set up a variable representing the path to that text file. And in line 27, we actually open the file. By default, it opens here in read mode. The read parameter is not specifically supplied here.

In line 30, we create the CSV reader object. You should be familiar with this from the other examples in the lesson and the other practice exercise. The one thing that's different here is, as a second parameter, we can specify the delimiter using this type of syntax-- delimiter equals, and then a space. This file does have a header. So in line 30, we'll read the header, and then we figure out the index positions of all of the columns in the file. That's what's going on in lines 33 through 36.

Now, we know that these are in the order 0, 1, 2, 3 in position. But writing it in this way where we use the header.index method makes the script a little more flexible in case the column order had been shifted around by somebody, which could easily happen if somebody had previously opened this file in a spreadsheet program and moved things around. In line 40, we're going to create a blank dictionary to keep track of each team and the maximum number of goals they've scored. We'll refer to that dictionary frequently as we read through the file.

In line 42, we begin a loop that actually starts reading the file row by row after the header. And so lines 45 through 48 are pulling out those four pieces of information-- basically, the two team names and the number of goals that each scored. Now, when we get a team name and a number of goals, we need to check it against our dictionary to see if the number of goals scored is greater than that team's max. And we need to do this check for both the winner and the loser-- or in other words, the first team and the second team listed in the row.

To avoid repeating code, this is a good case for a function. Because we're going to use the same logic for both pieces. Why not write the code just once in a function? So in lines 51 and 54, you'll see that I'm invoking a function called Check Goals. And I pass in three things. I pass in the team name, I pass in the number of goals, and the dictionary.

This function is defined up here in line 9. So I've created a function called Check Goals, and I create variables here for those three things that I know will be passed in-- the team, the number of goals, and the dictionary. Line 11 performs a check to see if the team already has a key in the dictionary. If it does, then we need to look at the number of goals or max goals that have been recorded for the team and check it against the current score that we are reading to see if that maximum number of goals needs to be updated.

So in line 13, that check is occurring. And if indeed the number of goals being read in the current line is greater than the maximum that we know about, then in line 14, we update the maximum and set it equal to what we read within the line. If the number of goals in the current line is not greater than our maximum, then we don't want to do anything. So that's what's in line 16 where it says pass. The pass is just a keyword that means don't do anything here.

Now, if the team has never been read before and it doesn't have an entry in the dictionary for a maximum number of goals, then we're going to actually go down to line 19 and just set the team. We're going to give the team a key in the dictionary, and we're going to set its max goals to whatever we have now. Because in this case, we've only read one record for that team, so the maximum is the one value that we've read.

Now, if this hasn't confused you yet, what we could do is put in a break point and look at this in the debugger. And this will hopefully help you to get a visual feel for what's going on in this dictionary. So I'm going to play this script up until the first time this function gets called. And over here, I've set little watches on some of these variables-- team, goals, and dictionary.

So the first line in this file, you will see that Boca scored two goals and Independiente scored one. So the first time we call this function, we're just going to pass in Boca as the team, two as the number of goals. And right now, there's nothing in our dictionary. So what would you expect to happen when you evaluated line 11?

In our case, we're going down to line 19, because the team does not have a key in the dictionary yet. So we're going to add a key for the team Boca and record two goals for them in the dictionary. And you'll see that our dictionary has changed over here to where Boca has two.

Now if we go ahead and play this again, the next one it's going to evaluate is Independiente with one. Again, because Independiente does not have a key in the dictionary, we'd expect this to go down to line 19. And indeed, that's what happens if we play this out. The next one is Racing with one goal.

And so as we iterate through this the first few times-- we're heading down to line 19 just to add these keys. But as we get into later runs of this, we'll start seeing the same teams over again. So here's a case as we add a few more where, in the current line, River has three goals, and in the dictionary, their maximum is two. So what would you expect to happen when we evaluate line 11?

In this case, because River already has a key, we're going to go to line 13, and we'll check to see if goals-- which is three-- is greater than their current maximum, which is two. And indeed it is, so we invoke line 14. And watch carefully. The dictionary here performs the update when we go to the next line. So as you're working on Project 4 and you're working with dictionaries in this manner, please keep the debugger open and watch what's happening, and you should be able to tell if your dictionary is being updated in the way that you expect.

So to finish out this script, once we have our dictionary all built, then we're going to loop through it one last time and print each key and each value. And that can be done using a simple for loop, like in line 57. In this case, the variable key represents a key in the dictionary. And in line 58, we print out that key, and we print a colon and a space, and then we print the associated value for that key. If you want to pull a value out of a dictionary, you use square brackets and you pass in the key name. And so that's what we're doing there. So running this all the way through should produce a printout in your interactive window of the different teams as well as the maximum number of goals found for each.

Project 4: Parsing rhinoceros sightings

In this project, you're working for a wildlife conservation group that is tracking rhinos in the African savannah. Your field workers' software resources and GIS expertise are limited, but you have managed to obtain a CSV spreadsheet showing the positions of several rhinos over time. Each record in the spreadsheet shows the latitude/longitude coordinate of a rhino along with the rhino's name (these rhinos are well known to your field workers).

Your task is to write a script that will turn the readings in the spreadsheet into a vector dataset that you can place on a map. This will be a polyline dataset showing the tracks the rhinos followed over the time the data was collected. You are required to use the Python csv module to parse the text and arcpy geometries to write the polylines.

Please carefully read all the following instructions before beginning the project. You are not required to use functions in this project but you can gain over & above points by breaking out repetitive code into functions.


This project has the following deliverables:

  1. Your plan of attack for this programming problem, written in pseudocode in any text editor. This should consist only of short, focused steps describing what you are going to do to solve the problem. This is a separate deliverable from your customary project writeup.
  2. A Python script that reads the data from the spreadsheet and creates, from scratch, a polyline shapefile with n polylines, n being the number of rhinos in the spreadsheet. Each polyline should represent a rhino's track chronologically from the beginning of the spreadsheet to the end of the spreadsheet. Each polyline should also have a text attribute containing the rhino's name. The shapefile should use the WGS 1984 geographic coordinate system.
  3. A short writeup (~300 words) explaining what you learned during this project and which requirements you met, or failed to meet. Also describe any "over and above" efforts here so that the graders can look for them.

Successful delivery of the above requirements is sufficient to earn 90% on the project. The remaining 10% is reserved for efforts that go "over and above" the minimum requirements. This could include (but is not limited to) a batch file that could be used to automate the script, creation of the feature class in a file geodatabase instead of a shapefile, or the breaking out of repetitive code into functions and/or modules.


You may already see several immediate challenges in this task:

  • The rhinos in the spreadsheet appear in no guaranteed order, and not all the rhinos appear at the beginning of the spreadsheet. As you parse each line, you must determine which rhino the reading belongs to and update that rhino's polyline track accordingly. You are not allowed to sort the Rhino column in a spreadsheet program before you write the script. Your script must be "smart" enough to work with an unsorted spreadsheet in the order that the records appear.
  • You do not immediately know how many rhinos are in the file or even what their names are. Although you could visually comb the spreadsheet for this information and hard-code each rhino's name, your script is required to handle all the rhino names programmatically. The idea is that you should be able to run this script on a different file, possibly containing more rhinos, without having to make many manual adjustments.
  • You have not previously created a feature class programmatically. You must find and run ArcGIS geoprocessing tools that will create an empty polyline shapefile with a text field for storing the rhino's name. You must also assign the WGS 1984 geographic coordinate system as the spatial reference for this shapefile.


  • Before you start writing code, write a plan of attack describing the logic your script will use to accomplish this task. Break up the original task into small, focused chunks. You can write this in Word or even Notepad. Your objective is not to write fancy prose, but rather short, terse statements of what your code will do: in other words, pseudocode. Here's an example of some pseudocode that might appear in your file:

    . . .
    Read the next line.
    Pull out the pieces of info needed from the line (lat, lon, name)
    Determine if the dictionary has a key for the rhino name.
    If no key exists, create a new array object.
    Create a new point object.
    Assign the lon reading to the X coordinate of the point.
    Assign the lat reading to the Y coordinate of the point.
    Add the point to the array.
    Add the array to the dictionary using the rhino name as the key.
    . . .

    If you do a good job writing your pseudocode, you'll find that each line translates into about one line of code. Writing your script then becomes a matter of translating from English to code. You may also find it helpful to sketch out a diagram of the workflow and logical branches in your script.
  • You will have a much easier time with this assignment if you first create the array objects representing each rhino track, then use insert cursors to add the arrays once they are completed. Not only is this easier to code, it's better for performance to open the insert cursor only once near the end of the script.
  • A Python dictionary is an excellent structure for storing a rhino name coupled with the rhino's array of observed locations. A dictionary is similar to a list, but it stores items in key-value pairs. For example, a key could be a string representing the rhino name, and that key's corresponding value could be an ArcGIS Array object containing all the points where the rhino was observed. You can retrieve any value based on its key, and you can also check whether a key exists using a simple if key in dictionary: check.

    We have not worked with dictionaries much in this course, but your Zandbergen text has an excellent section about them and there are abundant Python dictionary examples on the Internet.

    You can alternatively use lists to keep track of the information, but this will probably take more code. Using dictionaries I was able to write this script in under 65 lines (including comments and whitespace). If you find yourself getting confused or writing a lot of code with lists, you may want to switch to dictionaries.
  • To create your shapefile programmatically, use the Create Feature Class tool (arcpy.CreateFeatureclass_management in a script). The ArcGIS Desktop Help has several examples of how to use this tool. If you can't figure this part out, I suggest you create the feature class manually and work on writing the rest of the script. You can then return to this part at the end if you have time.
  • In order to get the shapefile in WGS 1984, you'll need to create a spatial reference object that you can assign to the shapefile at the time you create it. I recommend creating this object with arcpy.SpatialReference(). Be warned that if you do not correctly apply the spatial reference, the vertices of your polylines could lose precision.
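
To make the dictionary tip more concrete, here is a minimal sketch of the grouping pattern using plain Python only. It is not the project solution: the sample rows, the column names "Rhino", "X", and "Y", and the function name group_points_by_name are all made up for illustration, and the coordinates are stored as simple (x, y) tuples in lists rather than as ArcGIS Point and Array objects. Your real spreadsheet will have different columns, so treat this only as an illustration of how an unsorted file can be handled in a single pass.

```python
import csv

# Hypothetical sample data. The column names "Rhino", "X", and "Y" are
# assumptions for this sketch and may not match the project spreadsheet.
SAMPLE = """Rhino,X,Y
Bo,26.90,-19.10
Tulip,26.95,-19.20
Bo,26.91,-19.11
Tulip,26.96,-19.22
Bo,26.93,-19.12
"""

def group_points_by_name(lines, name_field="Rhino", x_field="X", y_field="Y"):
    """Return a dictionary mapping each name to its (x, y) readings in file order."""
    tracks = {}
    for row in csv.DictReader(lines):
        name = row[name_field]
        # First time this name is seen: start an empty track for it
        if name not in tracks:
            tracks[name] = []
        # Longitude is the X coordinate, latitude is the Y coordinate
        tracks[name].append((float(row[x_field]), float(row[y_field])))
    return tracks

# csv.DictReader accepts any iterable of lines, so a list of strings works
tracks = group_points_by_name(SAMPLE.splitlines())

for name in tracks:
    print("{0}: {1} points".format(name, len(tracks[name])))
```

Notice that no name is hard-coded and the rows never need to be sorted: each reading is simply appended to whichever track its key points to. In the project you would build an ArcGIS Array of Point objects in place of each list, then loop through the finished dictionary once with an insert cursor.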

If you do things right, your polylines should look like this (points are included only for reference):

Final rhino tracks


Although I have placed the data in an African context (who ever heard of rhinos wandering New York City?), it is completely fabricated and does not resemble the path of any actual rhino, living or dead. If you exhibit a stellar performance on this project, you may choose the option of having a rhino named after you in a future offering of this course!