GEOG 485:
GIS Programming and Software Development

Project 3: Data extraction for a pro hockey team's scouting department

In this project, you'll use your new skills with selections and cursors to process data from a "raw" format into a more specialized dataset for a specific mapping purpose. The data for this exercise were retrieved from the National Hockey League's undocumented statistics API.

Download the data for this project

Background

In this exercise, suppose you are a data analyst for an NHL franchise. In preparation for the league draft, the team's general manager has asked you to make it possible for him to retrieve all current players born in a particular country (say, Sweden), broken down by position. Your predecessor passed along a shapefile of players, but unfortunately each player's birth country is not included as one of the attributes. You do have a second shapefile of world country boundaries, though...

Task

Write a script that makes a separate shapefile for each of the three forward positions (center, right wing, and left wing), each containing only the players located within the boundary of Sweden. Write this script so that the user can change the country or the list of positions simply by editing a couple of lines of code at the top of the script.
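One possible way to lay out the editable lines at the top of the script is sketched below. The folder path, the variable names, and the output naming scheme are all assumptions made for illustration; only the position codes and the example country come from this assignment.

```python
import os

# --- Parameters a tester would edit (names and path are hypothetical) ---
target_folder = r"C:\Data\Project3"
target_country = "Sweden"
target_positions = ["C", "RW", "LW"]


def output_name(country, position):
    """Build a shapefile name such as Sweden_C.shp (naming scheme is assumed)."""
    # Replace spaces so multi-word country names still give a tidy file name.
    return f"{country.replace(' ', '_')}_{position}.shp"


# Everything below derives from the parameters above.
for position in target_positions:
    print(os.path.join(target_folder, output_name(target_country, position)))
```

With this layout, switching to another country or adding "D" to the list requires no changes further down in the script.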

In browsing the attribute table, you'll note that player heights and weights are stored in imperial units (feet & inches and pounds).  As part of this extraction task, to simplify comparisons against scouting reports written using metric units, you should also add two new numeric fields to the attribute table -- to the new shapefiles only, not the original nhlrosters.shp -- to hold height and weight values in centimeters and kilograms.  For every record, populate these fields with values based on the following formulas:

height_cm = height (in inches) * 2.54

weight_kg = weight (in pounds) * 0.453592
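As a quick arithmetic check of these two formulas, here is a minimal Python sketch. The function names are illustrative, and the sample player (74 inches, 200 pounds) is made up rather than taken from the roster data.

```python
CM_PER_INCH = 2.54
KG_PER_POUND = 0.453592


def inches_to_cm(inches):
    """Convert a height in inches to centimeters."""
    return inches * CM_PER_INCH


def pounds_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * KG_PER_POUND


# A made-up player who is 74 inches tall and weighs 200 pounds.
print(round(inches_to_cm(74), 2))   # 187.96
print(round(pounds_to_kg(200), 2))  # 90.72
```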

Your result should look something like the figure below if viewed in Pro.  The custom symbolization and labels are not required to be part of your script.

Figure 3.5  Example output from Project 3, viewed in Pro. 

The above requirements are sufficient for receiving 90% of the credit on this assignment. The remaining 10% is reserved for "Over and above" efforts, such as making a script tool, or extending the script to handle multiple target countries, other combinations of fields and queries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.  

Deliverables

Deliverables for this project are as follows:

  • The source .py file containing your script
  • A short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out, so the grader can look for them.

You do not have to create a script tool for this assignment; you can hard-code the initial parameters. Nevertheless, put all the parameters at the top so they can be easily manipulated by whoever tests the script.

Once you get everything working, creating a script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.

Notes about the data

Take a look at the provided datasets in Pro, particularly the attribute tables.  The nhlrosters shapefile contains a field called "position" that provides the values you need to examine (RW = Right Wing, LW = Left Wing, C = Center, D = Defenseman, G = Goaltender).  As mentioned, you've been asked to extract the forward positions (RW, LW, and C) to new shapefiles, but you want to write a script that's capable of handling some other combination of positions, too.  There are several other fields that offer the potential for interesting queries as well, if you're looking for over and above ideas.

The Countries_WGS84 shapefile has a field called "CNTRY_NAME". You can make an attribute selection on this field to select Sweden, then follow that up with a spatial selection to grab all the players that fall within this country. Finally, narrow down those players to just the ones that play the desired position.

Tips

Once you've selected Swedish players at the desired position, use the Copy Features tool to save the selected features into a new feature class, in the same fashion as in the practice exercises.
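The selection steps from "Notes about the data" and the Copy Features tip above could be chained together roughly as follows. This is a sketch, not the expected solution: the function names, layer names, and the choice of the "WITHIN" relationship are assumptions, and arcpy is imported inside the function only so that the small where-clause helper can be tried on a machine without ArcGIS.

```python
def equals_clause(field_name, value):
    """Build a SQL where clause such as CNTRY_NAME = 'Sweden'."""
    escaped = value.replace("'", "''")  # double any single quotes in the value
    return f"{field_name} = '{escaped}'"


def extract_position(players_fc, countries_fc, country, position, out_fc):
    """Copy the players of one position inside one country to out_fc."""
    import arcpy  # requires ArcGIS Pro's Python environment

    arcpy.env.overwriteOutput = True  # lets the layer names be reused per call

    # 1. Attribute selection: a layer holding only the target country.
    arcpy.management.MakeFeatureLayer(
        countries_fc, "country_lyr", equals_clause("CNTRY_NAME", country))

    # 2. Spatial selection: players that fall within that country.
    arcpy.management.MakeFeatureLayer(players_fc, "players_lyr")
    arcpy.management.SelectLayerByLocation(
        "players_lyr", "WITHIN", "country_lyr")

    # 3. Narrow the current selection to the desired position.
    arcpy.management.SelectLayerByAttribute(
        "players_lyr", "SUBSET_SELECTION", equals_clause("position", position))

    # 4. Save the selected features to a new feature class.
    arcpy.management.CopyFeatures("players_lyr", out_fc)

    # Clean up the temporary layers.
    arcpy.management.Delete("country_lyr")
    arcpy.management.Delete("players_lyr")
```

Calling extract_position once per position inside a loop would produce one shapefile per position.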

Take this project one step at a time. It's probably easiest to tackle the extraction-into-shapefile portion first. Once you have created all the new position shapefiles, go through them one by one: use the "Add Field" tool to add the "height_cm" and "weight_kg" fields, then use an UpdateCursor to loop through all rows and populate these new fields with the appropriate values.
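A rough sketch of the Add Field and UpdateCursor step is shown below. Several things here are assumptions: the source field names "height" and "weight" (check the attribute table for the real names), the DOUBLE field type, and the height_to_inches argument, which stands for whatever function you write to turn the text height into inches. arcpy is imported inside the function so the small conversion helper can be run on its own.

```python
def metric_values(height_in, weight_lb):
    """Return (height_cm, weight_kg) using the formulas from the Task section."""
    return height_in * 2.54, weight_lb * 0.453592


def add_metric_fields(feature_class, height_to_inches):
    """Add height_cm and weight_kg to feature_class and populate every row.

    height_to_inches is a caller-supplied function (hypothetical) that
    converts the text height value to a number of inches.
    """
    import arcpy  # requires ArcGIS Pro's Python environment

    arcpy.management.AddField(feature_class, "height_cm", "DOUBLE")
    arcpy.management.AddField(feature_class, "weight_kg", "DOUBLE")

    # "height" and "weight" are assumed names for the existing fields.
    fields = ["height", "weight", "height_cm", "weight_kg"]
    with arcpy.da.UpdateCursor(feature_class, fields) as cursor:
        for row in cursor:
            height_in = height_to_inches(row[0])
            weight_lb = float(row[1])  # works for numeric or numeric-text values
            row[2], row[3] = metric_values(height_in, weight_lb)
            cursor.updateRow(row)
```

Keeping the arithmetic in its own small function makes it easy to check the numbers before touching any shapefile.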

It might be easiest to get the whole process working with a single position, then add the loop for all the positions later after you have finalized all the other script logic.

The height field is of type Text to allow for the ' and " characters. The string slicing notation covered earlier in the course can be used to obtain the feet and inches components of the player height. Convert these to a single value in inches (feet * 12 + inches), then use the inches to centimeters formula shown above to compute the height in metric units.
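As an illustration of the slicing idea, the sketch below assumes the height text looks like 6' 2" (feet, a ' mark, inches, a " mark). Inspect the actual values in the attribute table before relying on that format; the function name is hypothetical.

```python
def height_text_to_inches(height_text):
    """Convert a height string such as 6' 2" to total inches.

    Assumes the text contains a ' after the feet and a " after the inches.
    """
    foot_mark = height_text.find("'")
    inch_mark = height_text.find('"')
    feet = int(height_text[:foot_mark])                  # text before the ' mark
    inches = int(height_text[foot_mark + 1:inch_mark])   # text between ' and "
    return feet * 12 + inches


total_inches = height_text_to_inches("6' 2\"")
print(total_inches)                   # 74
print(round(total_inches * 2.54, 2))  # 187.96
```

Because int() ignores surrounding spaces, the same code handles both 6' 2" and 6'2".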

For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.

You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.