
Lesson 1 Assignment
Part 1 – Multiprocessing Script
We are going to use the arcpy vector data processing code from Section 1.6.6.2 (download Lesson1_Assignment_initial_code) as the basis for our Lesson 1 programming project. The code is already in multiprocessing mode, so you will not have to write multiprocessing code from scratch, but you will still need a good understanding of how the script works. If you are unclear about anything the script does, please ask on the course forums. This part of the assignment is intended to get you back into the rhythm of writing arcpy-based Python code and to give you practice creating a script tool with ArcGIS Pro. Your task is to extend our vector data clipping script by doing the following:
- Modify the code to handle a parameterized output folder path (still using unique output filenames for each shapefile) defined in a third input variable at the beginning of the main script file. One way to achieve this is to add another (5th) parameter to the worker() function that passes the output folder information along with the other data.
To realize the modified code version in this part, all main modifications should be made to the input variables and within the code of the worker() and mp_handler() functions. Of course, we will also look at code quality, so make sure the code is readable and well documented. A few hints that may be helpful are provided below, after the description of Part 2.
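One way the parameter change could look is sketched below. All names (worker's signature, outFolder, the "OID@" field) are illustrative stand-ins for the assignment's actual variables, and the arcpy clipping call is replaced by simple path construction so only the structure of the change is shown:

```python
import os

# Hypothetical stand-ins for the input variables at the top of the main
# script; the third variable (outFolder) is the new parameterized output folder.
clipper = "C:/489/data/USA/States.shp"
tobeclipped = "C:/489/data/USA/Roads.shp"
outFolder = "C:/489/output"

def worker(clipper, tobeclipped, field, oid, outFolder):
    """Clip one input file to one clipper feature. The added 5th parameter
    passes the output folder through to the output path; the real arcpy
    selection and Clip_analysis(...) calls would go where the comment is."""
    outFC = os.path.join(outFolder, "clip_" + str(oid) + ".shp")
    # ... arcpy.Clip_analysis(...) would write its result to outFC here ...
    return outFC

print(worker(clipper, tobeclipped, "OID@", 0, outFolder))
```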
Part 2 – Single File Multiprocessing Script Tool
Combining the mp_handler code and the worker function into a single script file, expand the code so that it can handle multiple input feature classes to be clipped (still using a single polygon clipping feature class).
- The input variable tobeclipped should now take a list of feature class names rather than a single name.
- The worker function should, as before, perform the operation of clipping a single input file (not all of them!) to one of the features in the clipper feature class.
- The main change you will have to make here will be in the main code where the jobs are created.
- The names of the output files produced should have the format:
clip_<oid>_<name of the input feature class>.shp
For instance, clip_0_Roads.shp would be produced by clipping the Roads feature class (found in the USA.gdb file geodatabase) to the state with oid '0'.
- Ensure that the multiprocessing method obtains its own exclusive worker function.
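The main change to the job-creation code can be sketched as follows. All names and the OID values are hypothetical; in the real script the OIDs would come from reading the clipper feature class (e.g., with a SearchCursor), and each job tuple would be handed to the worker function:

```python
import os

clipper = "C:/489/data/USA.gdb/States"
tobeclipped = ["C:/489/data/USA.gdb/Roads",       # now a list of feature
               "C:/489/data/USA.gdb/Cities"]      # classes, not a single name
outFolder = "C:/489/output"
oids = [0, 1, 2]   # placeholder; read from the clipper feature class

jobs = []
for toClipFC in tobeclipped:                      # new outer loop over inputs
    name = os.path.splitext(os.path.basename(toClipFC))[0]
    for oid in oids:                              # one job per (input, feature)
        outFC = os.path.join(outFolder, "clip_{0}_{1}.shp".format(oid, name))
        jobs.append((clipper, toClipFC, oid, outFC))

print(len(jobs))        # 6 jobs: 2 inputs x 3 clipper features
print(jobs[0][3])
```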
To realize the modified code version in this part, it is important to remember how to avoid infinite recursion, the purpose of the if __name__ == '__main__': conditional, how namespaces and module imports work, and the module.function() syntax.
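The key property of the guard can be demonstrated without arcpy: code under if __name__ == '__main__': runs when a file is executed as a script, but not when the file is imported as a module, which is exactly what keeps a self-importing script from recursing forever. A small illustration (the module name clip_script and its contents are made up for this demonstration):

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module to disk so we can import it and observe the guard.
module_src = '''
ran_as_main = False

def worker(x):
    return x * 2   # stand-in for the real clipping worker

if __name__ == "__main__":
    ran_as_main = True   # only runs when executed as a script
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "clip_script.py"), "w") as f:
    f.write(module_src)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
mod = importlib.import_module("clip_script")

print(mod.ran_as_main)    # False: the guarded block was skipped on import
print(mod.worker(21))     # 42: but the worker function is still available
```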
Successful delivery of the above requirements is sufficient to earn 95% on the project. The remaining 5% is reserved for efforts that go "over and above" the minimum requirements. Over-and-above points may be earned by adding further geoprocessing operations (e.g., reprojection) to the worker() function, or by other enhancements as you see fit, such as returning a dictionary of results from the workers and parsing it to print success/failure messages, or trying a different multiprocessing method from the table in Section 1.6.5.3.
You will have to submit several versions of the modified script for this assignment:
- (A) The modified single-input-file script version from Part 1.
- (B) The single-file, multiple-input-files script tool version from Part 2 (together with the .atbx toolbox file)
- (C) Potentially a third version if you made substantial modifications to the code for "over and above" points. If you created a new script tool for this, make sure to include the .atbx file as well.
Hint 1:
When you adapt the worker() function, I strongly recommend that you do some tests with individual calls of that function before you run the full multiprocessing version. For this you can, for instance, use what we learned about the if __name__ == '__main__': conditional for the multicode script, or comment out the pool code and instead call worker() directly from the loop that produces the job list, so that all calls are made sequentially rather than in parallel. This makes it easier to detect errors than running everything in multiprocessing mode right away. Similarly, it can be a good idea to view the variables in the debugger, or to add print statements in the loop that builds the job list, to make sure that the correct values will be passed to the worker function.
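For example, the pool call can be swapped for a plain loop over the job list. Everything below is a hypothetical stub (the real worker would contain the arcpy clipping code, and the pool call in your script may differ from the starmap shown in the comment):

```python
def worker(clipper, tobeclipped, oid, outFC):
    # Stub standing in for the real clipping code; returning the arguments
    # makes it easy to check what each call received.
    return "clipping {0} to feature {1} of {2} -> {3}".format(
        tobeclipped, oid, clipper, outFC)

jobs = [("States.shp", "Roads.shp", oid, "clip_{0}_Roads.shp".format(oid))
        for oid in range(3)]

# with multiprocessing.Pool() as pool:              # parallel version,
#     results = pool.starmap(worker, jobs)          # commented out for testing
results = [worker(*job) for job in jobs]            # sequential test calls

for r in results:
    print(r)
```

Because the sequential loop runs in a single process, any exception raised inside worker() surfaces with a normal traceback instead of being swallowed or reported indirectly by the pool.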
Hint 2 (concerns Part 2):
When changing to the multiple-input-files version, you will not only have to change the code that produces the names of the output files in the variable outFC by incorporating the name of the input feature class, you will also have to do the same for the name of the temporary layer created by MakeFeatureLayer_management() to make sure that the layer names remain unique. Otherwise, some worker calls will fail because they try to create a layer with a name that is already in use.
To get the basename of a feature class without file extension, you can use a combination of the os.path.basename() and os.path.splitext() functions defined in the os module of the Python standard library. The basename() function will remove the leading path (so e.g., turn "C:\489\data\Roads.shp" into just "Roads.shp"). The expression os.path.splitext(filename)[0] will give you the filename without file extension. So for instance "Roads.shp" will become just "Roads". (Using [1] instead of [0] will give you just the file extension but you won't need this here.)
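For instance (forward slashes are used in the path so the snippet behaves the same outside of Windows, where os.path does not treat backslashes as separators):

```python
import os

path = "C:/489/data/Roads.shp"

filename = os.path.basename(path)        # strips the leading folders
name = os.path.splitext(filename)[0]     # filename without the extension
ext = os.path.splitext(filename)[1]      # just the extension (unused here)

print(filename)   # Roads.shp
print(name)       # Roads
print(ext)        # .shp
```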
Hint 3 (concerns Part 2):
Once you have the script working in the IDE, it is time to move it into a script tool. You will have to make the script import itself in order to ensure that each process in the pool can find the worker function when the script is executed as a script tool. Refer to Sections 1.3.1 and 1.6.6.3 for this requirement, and take care to prevent infinite recursion.
Hint 4 (concerns Part 2):
You will also have to use the "Multiple value" option for the input parameter you create for the to-be-clipped feature class list in the script tool interface. If you then use GetParameterAsText(...) for this parameter in your code, you will get a single string(!) with the names/paths of the feature classes the user picked separated by semicolons, not a list of name/path strings. You can use the string methods .split(...) and .strip(...) to turn this string into a usable list of paths. Alternatively, you can use GetParameter(...), which will provide you with a list of geoprocessing value objects that you can then cast to strings (str(...)) for pickling. It can save you a lot of time to add some arcpy.AddMessage(...) statements that print out these parameters so you can see what your variables contain. Be sure to verify the output results!
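A quick sketch of the splitting approach. The example string mimics what a multivalue GetParameterAsText(...) call might return (an assumption for illustration); note that paths containing spaces can come back wrapped in single quotes, which is why the extra strip is needed:

```python
# Hypothetical value as returned by GetParameterAsText(...) for a
# multivalue parameter: one semicolon-separated string, not a list.
raw = "C:/489/USA.gdb/Roads;'C:/489/My Data/USA.gdb/Cities'"

# Split on the semicolons, then strip whitespace and any quoting
# around the individual entries.
paths = [p.strip().strip("'") for p in raw.split(";")]

print(paths)
```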
Deliverable
Submit a single .zip file to the corresponding drop box on Canvas; the zip file should contain:
- Your modified code files and ArcGIS Pro toolbox files (up to three different versions as described above). Please organize the files cleanly, e.g., using a separate subfolder for each version.
- A 400-word write-up of what you have learned during this exercise. This write-up should also include:
- Think back to the beginning of Section 1.6.6 and include a brief discussion of any changes to the processing workflow and/or the code that might be necessary if we wanted to write our output data to geodatabases, and briefly comment on possible issues (using pseudocode or a simple flowchart if you wish).
- A description of what you did for "over and above" points (if anything).