GEOG 489
Advanced Python Programming for GIS

3.2 Installing the required packages for this lesson


This lesson will require quite a few different Python packages. We will take care of this task right away so that you then won't have to stop for installations when working through the lesson content. We will use our Anaconda installation from Lesson 2 and create a fresh Python environment within it. In principle, you could perform all the installations with a number of conda installation commands from the command line. However, there are a lot of dependencies between the packages and it is relatively easy to run into some conflicts that are difficult to resolve. Therefore, we instead provide a YAML .yml file that lists all the packages we want in the environment with the exact version and build numbers we need. We create the new environment by importing this .yml file using conda in the command line interface ("Anaconda Prompt"). For reference, we also provide the conda commands used to create this environment at the end of this section. Also important to note is that one of the packages we will be working with in this lesson is the ESRI ArcGIS for Python API, which will require a special approach to authenticate with your PSU login. You will already see this approach further down below and it will then be explained further in Section 3.10. 

Creating the Anaconda Python environment 

Please follow the steps below. If you run into problems, an alternative approach is described further down.

Step 1 below links to two YAML downloads, ac37 and AC38. If you need to use the AC38 file, replace "37" with "38" wherever it appears below. Use AC38 even if you have Python 3.9, since there is no technical difference between 3.8 and 3.9 for this lesson. In practice, the ac37 file should work no matter which version you are using; the AC38 file is only a fallback. If you have trouble creating the environment from the YAML file, there are specific instructions further below.

1) Download the .zip file containing the .yml file from this link: ac37_Fall2023.zip (AC38_SP24.zip only if required), then extract the .yml file it contains. You may want to have a quick look at the content of this text file to see how, among other things, it lists the names of all packages for this environment with version and build numbers. Using a YAML file greatly speeds up the creation of the environment as the files are downloaded and dependencies don't need to be resolved on the fly by conda.
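To make the "version and build numbers" more concrete, here is a minimal sketch that splits dependency entries of the form name=version=build, which is the format conda uses in exported environment files. The sample entries and the helper names parse_conda_spec and list_packages are made up for illustration; they are not taken from the course's .yml file, and nested entries such as a pip sub-list are not handled.

```python
# Illustrative only: these entries are hypothetical, not copied from the
# course's .yml file.
SAMPLE_ENV_TEXT = """\
name: AC37
channels:
  - conda-forge
dependencies:
  - python=3.7.11=h6244533_0
  - gdal=3.0.2=py37hdf43c64_0
  - nodejs
"""


def parse_conda_spec(line):
    """Split a '- name=version=build' entry into (name, version, build)."""
    spec = line.strip().lstrip("-").strip()
    parts = spec.split("=")
    # Pad with None so entries without a version or build still yield 3 items.
    parts += [None] * (3 - len(parts))
    return tuple(parts[:3])


def list_packages(env_text):
    """Collect the entries listed under the top-level 'dependencies:' key."""
    specs = []
    in_dependencies = False
    for line in env_text.splitlines():
        is_top_level = bool(line) and not line[0].isspace() and not line.startswith("-")
        if is_top_level:
            # A new top-level key starts; only 'dependencies:' is of interest.
            in_dependencies = line.strip() == "dependencies:"
        elif in_dependencies and line.strip().startswith("- "):
            specs.append(parse_conda_spec(line))
    return specs


if __name__ == "__main__":
    for name, version, build in list_packages(SAMPLE_ENV_TEXT):
        print(name, version, build)
```

Pinning both version and build in this way is what lets conda skip most of the dependency resolution described above.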

2) Open the program called "Anaconda Prompt" which is part of the Anaconda installation from Lesson 2. 

3) Make sure you have at least 5GB of free space on your C: drive (the environment will require around 3.5-4GB). Then type in and run the following conda command to create a new environment called AC37 (for Anaconda Python 3.7; use AC38 for Python 3.8) from the downloaded .yml file. You will have to replace the ... to match the name of the .yml file, and you may also need to adapt the path depending on where you stored the file on your hard disk.

conda env create --name AC37 -f "C:\489\ac37_....yml"

Conda will now create the environment called AC37 (AC38 if you're using that other file above for Python v3.8) according to the package list in the YAML file. This can take quite a lot of time; in particular, it will just say "Solving environment" for quite a while before anything starts to happen. If you want, you can work through the next few sections of the lesson while the installation is running. The first section that will require this new Python environment is Section 3.6. Everything before that can still be done in the ArcGIS environment you used for the first two lessons. When the installation is done, the AC37 (AC38 for Python v3.8) environment will show up in the environments list in the Anaconda Navigator and will be located at C:\Users\<user name>\Anaconda3\envs\AC37 .
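If you are unsure whether the 5GB requirement from step 3 is met, the free space can also be checked from Python with the standard library. This is only a sketch: the function names and the REQUIRED_GB constant are our own, and the threshold simply repeats the figure given in step 3.

```python
import shutil

REQUIRED_GB = 5  # minimum free space recommended in step 3


def free_space_gb(path):
    """Return the free space (in GB) on the drive that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024 ** 3


def has_enough_space(path, required_gb=REQUIRED_GB):
    """True if the drive containing `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb
```

On the Windows setup assumed in this lesson, calling has_enough_space("C:\\") checks the C: drive; pass a different path if your Anaconda installation lives on another drive.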

4) Let's now do a quick test to see if the new environment works as intended. In the Anaconda Prompt, activate the new environment with the following command (you'll need to activate your environment every time you want to use it):

activate AC37

Then type in python and in Python run the following commands; all the modules should import without any error messages:

import bs4
import pandas
import cartopy
import matplotlib
from osgeo import gdal
import geopandas
import rpy2
import shapely
import arcgis
from arcgis.gis import GIS
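As an alternative to typing the imports one by one, the same check can be sketched as a small loop that reports which modules fail. The function name check_imports and the MODULES list are our own; the list simply mirrors the imports above.

```python
import importlib

# Same modules as in the manual import test above.
MODULES = ["bs4", "pandas", "cartopy", "matplotlib", "osgeo.gdal",
           "geopandas", "rpy2", "shapely", "arcgis", "arcgis.gis"]


def check_imports(module_names):
    """Try to import each module; map its name to None or an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # mostly ImportError, but DLL problems vary
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Any module that does not print OK points to a package that did not install correctly in the environment.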

As a last step, let's test connecting to ArcGIS Online with the ArcGIS for Python API mentioned at the beginning. Run the following Python command: 

gis = GIS('https://pennstate.maps.arcgis.com', client_id='lDSJ3yfux2gkFBYc')

Now a browser window should open up where you have to authenticate with your PSU login credentials (unless you are already logged in to Penn State). After authenticating successfully, you will get a window saying "OAuth2 Approval" and a box with a very long code at the bottom. In the Anaconda Prompt window, you will see a prompt saying "Enter code obtained on signing in using SAML:". Use CTRL+A and CTRL+C to copy the entire code, and then do a right-click with the mouse to paste the code into the Anaconda Prompt window. The code won't show up, so just continue by pressing Enter. 

If you are having trouble with this step, Figure 3.18 in Section 3.10 illustrates the steps. You may get a short warning message (InsecureRequestWarning) but as long as you don't get a long error message, everything should be fine. You can test this by running this final command:

print(gis.users.me)

This should produce an output string that includes your pennstate ArcGIS Online user name, e.g., <User username:xyz12_pennstate>. More details on this way of connecting with ArcGIS Online will be provided in Section 3.10.

If creating the environment from the .yml file did NOT work:
Creating the AC37 environment from scratch with Conda

As we wrote above, importing the .yml file with the complete package and version number list is probably the most reliable method to set up the Python environment for this lesson but there have been cases in the past where using this approach failed on some systems. Or maybe you are interested in the steps that were taken to create the environment from scratch. We therefore list the conda commands used from the Anaconda Prompt for reference below.

1) Create a new conda Python 3.7 environment called AC37 with some of the most critical packages:

conda create -n AC37 -c conda-forge -c esri python=3.7 nodejs arcgis=2 gdal=3 jupyter ipywidgets=7.6.0

2) As we did in Lesson 2, we activate the new environment using: 

activate AC37

3) Then we add the remaining packages: 

conda install -c conda-forge rpy2=3.4.1
conda install -c conda-forge r-raster=3.4_5
conda install -c conda-forge r-dismo=1.3_3
conda install -c conda-forge r-maptools
conda install -c conda-forge geopandas
conda install -c conda-forge cartopy

4) Once we have made sure that everything is working correctly in this new environment, we can export a YAML file similar to the one we have been using in the first part above using the command:

conda env export > AC37.yml

If creating the environment from the AC38.yml file did NOT work:
Creating the AC38 environment from scratch with Conda

As we wrote above, importing the .yml file with the complete package and version number list is probably the most reliable method to set up the Python environment for this lesson but there have been cases in the past where using this approach failed on some systems. Or maybe you are interested in the steps that were taken to create the environment from scratch. We therefore list the conda commands used from the Anaconda Prompt for reference below.

1) Create a new conda Python 3.8 environment called AC38 with some of the most critical packages (you'll notice that some additional package version numbers are specified to handle inconsistencies between v3.8 and v3.9):

conda create -n AC38 -c conda-forge -c esri python=3.8 nodejs arcgis=2 gdal=3 jupyter ipywidgets=7.6.0 requests=2.29.0 urllib3=1.26.18

2) As we did in Lesson 2, we activate the new environment using: 

activate AC38

3) Then we add the remaining packages: 

conda install -c conda-forge rpy2=3.4.1
conda install -c conda-forge r-raster=3.4_5
conda install -c conda-forge r-dismo=1.3_3
conda install -c conda-forge r-maptools
conda install -c conda-forge geopandas
conda install -c conda-forge cartopy matplotlib=3.5.3 pillow=9.2.0 shapely=1.8.5 fiona=1.8.22

4) Once we have made sure that everything is working correctly in this new environment, we can export a YAML file similar to the one we have been using in the first part above using the command:

conda env export > AC38.yml

Potential issues

There is a small chance that the statement from osgeo import gdal will throw an error about DLLs not being found on the path, which looks like the following:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jao160\anaconda3\envs\AC38_SP24\lib\site-packages\osgeo\__init__.py", line 46, in <module>
    _gdal = swig_import_helper()
  File "C:\Users\jao160\anaconda3\envs\AC38_SP24\lib\site-packages\osgeo\__init__.py", line 42, in swig_import_helper
    raise ImportError(traceback_string + '\n' + msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\jao160\anaconda3\envs\AC38_SP24\lib\site-packages\osgeo\__init__.py", line 30, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\jao160\anaconda3\envs\AC38_SP24\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed while importing _gdal: The specified module could not be found.

On Windows, with Python >= 3.8, DLLs are no longer imported from the PATH.
If gdalXXX.dll is in the PATH, then set the USE_PATH_FOR_GDAL_PYTHON=YES environment variable
to feed the PATH into os.add_dll_directory().

If this happens, the fix is to set the environment variable before importing gdal, as shown here (you will need to do this every time you want to import gdal):

import os
os.environ["USE_PATH_FOR_GDAL_PYTHON"]="YES"
from osgeo import gdal
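Since the workaround has to be repeated in every session, it can be wrapped in a small helper. This is only a sketch under the assumption that setting the variable before the import is sufficient on your system; the function name import_gdal is our own.

```python
import importlib
import os


def import_gdal():
    """Set USE_PATH_FOR_GDAL_PYTHON, then try to import osgeo.gdal.

    Returns the gdal module, or None if the import still fails.
    """
    # The variable has to be set before osgeo is imported.
    os.environ["USE_PATH_FOR_GDAL_PYTHON"] = "YES"
    try:
        return importlib.import_module("osgeo.gdal")
    except ImportError as exc:
        print(f"GDAL import failed: {exc}")
        return None
```

A script would then start with gdal = import_gdal() and stop early if the result is None.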

It's possible that the above fix doesn't work and the error is still thrown. In that case, check the PATH environment variable in the Anaconda Prompt by typing "path" and verify that c:\osgeo4w\bin or osgeo4w64\bin is in the list. If it is not, add it using set path=%PATH%;c:\osgeo4w\bin
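The PATH check can also be done from within Python instead of reading the output of "path" by eye. The sketch below searches the PATH entries for a folder name; the function name path_entries_matching and its parameters are our own, and the search term "osgeo4w" simply follows the folder names mentioned above.

```python
import os


def path_entries_matching(fragment, path_value=None, sep=os.pathsep):
    """Return the PATH entries that contain `fragment` (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    fragment = fragment.lower()
    return [entry for entry in path_value.split(sep)
            if entry and fragment in entry.lower()]


if __name__ == "__main__":
    matches = path_entries_matching("osgeo4w")
    print(matches if matches else "No osgeo4w folder found on PATH")
```

An empty result suggests that the folder still needs to be added with the set path command.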