GEOG 489
Advanced Python Programming for GIS

3.5.1 Python packages for (spatial) Data Science


It would be impossible to introduce or even just list all the packages available for conducting spatial data analysis projects in Python here, so the following is just a small selection of those that we consider most important.

numpy

numpy (Python numpy page, Wikipedia numpy page) stands for “Numerical Python” and is a library that adds support for efficiently handling large, multi-dimensional arrays and matrices in Python. It also provides a large number of mathematical operations that can be applied to these arrays, including many matrix and linear algebra operations. Many other Python packages are built on top of the functionality provided by numpy.
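To give a feel for what this looks like in practice, here is a minimal sketch with made-up values. It creates a small 2D array, applies vectorized arithmetic and aggregation to it, and solves a small linear system with np.linalg.solve:

```python
import numpy as np

# A 2D array (3 rows x 4 columns), e.g. a tiny grid of made-up elevation values in meters
elev = np.array([[10.0, 12.5, 13.0, 11.0],
                 [ 9.5, 14.0, 15.5, 12.0],
                 [ 8.0, 11.0, 13.5, 10.5]])

# Vectorized operations apply to every cell without an explicit Python loop
elev_ft = elev * 3.28084           # convert meters to feet, cell by cell
print(elev.shape)                  # (3, 4)
print(elev.mean(axis=0))           # mean of each column
print(elev.max())                  # 15.5

# Linear algebra: solve the system A x = b (3x + y = 9, x + 2y = 8)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)                           # approximately [2. 3.]
```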

matplotlib

matplotlib (Python matplotlib page, Wikipedia matplotlib page) is an example of a Python library that builds on numpy. Its main focus is on producing plots and embedding them into Python applications. Take a quick look at its Wikipedia page to see a few examples of plots that can be generated with matplotlib. We will be using matplotlib a few times in this lesson’s walkthrough to quickly create simple map plots of spatial data.
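The following minimal sketch shows the kind of simple map plot we mean. It draws a few hypothetical point locations (made-up longitude/latitude values) on top of a rectangular study-area outline and saves the figure to a PNG file. The file name simple_map.png is just an example; in an interactive session you could call plt.show() instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: longitude/latitude pairs of a few point locations
lons = np.array([-77.86, -77.60, -78.10, -77.95])
lats = np.array([40.79, 40.95, 40.70, 41.05])

# Outline of a made-up rectangular study area (closed ring: first vertex == last vertex)
area_x = [-78.2, -77.5, -77.5, -78.2, -78.2]
area_y = [40.6, 40.6, 41.1, 41.1, 40.6]

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(area_x, area_y, color="gray")            # study-area outline drawn as a line
ax.scatter(lons, lats, color="red", zorder=2)    # points drawn on top of the outline
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_aspect("equal")                           # one x unit has the same length as one y unit
ax.set_title("Simple map plot")
fig.savefig("simple_map.png", dpi=100)           # write the figure to a PNG file
```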

SciPy

SciPy (Python SciPy page, Wikipedia SciPy page) is a large Python library for applications in mathematics, science, and engineering. It is built on top of numpy and provides methods for optimization, integration, interpolation, signal processing, and image processing. Together, numpy, matplotlib, and SciPy provide functionality roughly similar to that of the well-known software MATLAB. While we won’t be using SciPy in this lesson, it is definitely worth checking out if you're interested in advanced mathematical methods.
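As a small illustration of two of these areas, the sketch below uses scipy.interpolate.griddata to interpolate a few scattered measurements onto a regular grid and scipy.integrate.quad to numerically integrate a simple function. The sample points and values are invented for the example.

```python
import numpy as np
from scipy import integrate, interpolate

# Interpolation: estimate values on a regular grid from scattered sample points
# (hypothetical measurements at the four corners and the center of a unit square)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
values = np.array([1.0, 2.0, 3.0, 4.0, 2.5])
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))

# Linear interpolation over a triangulation of the sample points;
# grid points outside the convex hull of the samples would be set to NaN
grid = interpolate.griddata(points, values, (gx, gy), method="linear")
print(grid.shape)   # (5, 5)

# Integration: numerically integrate x**2 from 0 to 3 (the exact value is 9)
result, abs_err = integrate.quad(lambda x: x**2, 0, 3)
print(result, abs_err)
```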

pandas

pandas (Python pandas page, Wikipedia pandas software page) provides functionality for working efficiently with tabular data in so-called data frames, much as is possible in R. Key functionality provided by the library includes reading and writing tabular data (e.g., from and to .csv files), manipulating and subsetting data frames, merging and joining multiple data frames, and time series support. A more detailed overview of pandas will be given in Section 3.8.
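Here is a minimal sketch of the operations listed above, using a small temperature table in which the station names, dates, and values are all hypothetical. We read the data from a string with io.StringIO so the example does not depend on an existing file; normally you would pass a file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a .csv file on disk
csv_text = """station,date,temp_c
A,2023-01-01,-2.5
A,2023-01-02,-1.0
B,2023-01-01,0.5
B,2023-01-02,1.5
"""
# Reading: parse the "date" column as datetimes to enable time series operations
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Subsetting: keep only rows where the temperature is above freezing
warm = df[df["temp_c"] > 0]

# Merging: join a second data frame with station attributes on a shared column
stations = pd.DataFrame({"station": ["A", "B"], "elev_m": [350, 120]})
merged = df.merge(stations, on="station", how="left")

# Aggregation: mean temperature per station
means = merged.groupby("station")["temp_c"].mean()
print(means)

# Writing: save the merged table to a .csv file without the row index
merged.to_csv("merged.csv", index=False)
```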

Shapely

Shapely (Python Shapely page, Shapely User Manual) adds functionality for working with planar geometric features in Python, including the creation and manipulation of geometries such as points, polylines, and polygons, as well as set-theoretic analysis capabilities (intersection, union, …). It is based on the widely used GEOS library, the geometry engine also used in PostGIS. GEOS in turn is a C++ port of the Java Topology Suite (JTS) and largely follows the OGC’s Simple Features Access Specification.
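The following sketch creates a few geometries with made-up, unitless planar coordinates. It then applies some of the measurement, predicate, and set-theoretic operations mentioned above:

```python
from shapely.geometry import LineString, Point, Polygon

# Create geometries from coordinate tuples (planar coordinates, no CRS attached)
square = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
triangle = Polygon([(2, 2), (6, 2), (2, 6)])
pt = Point(1, 1)
line = LineString([(0, 0), (3, 4)])

# Measurements and spatial predicates
print(square.area)            # 16.0
print(line.length)            # 5.0
print(square.contains(pt))    # True

# Set-theoretic operations return new geometry objects
inter = square.intersection(triangle)
union = square.union(triangle)
print(inter.area)             # 4.0 (the 2 x 2 overlap between the shapes)
print(union.area)             # 20.0 (16 + 8 - 4)
```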

geopandas

geopandas (Python geopandas page, GeoPandas page) combines pandas and Shapely to facilitate working with geospatial vector data sets in Python. While we will mainly use it to create a shapefile from Python, the functionality it provides goes well beyond that and includes geoprocessing operations, spatial joins, map projections, and map visualizations.
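Here is a minimal sketch, assuming a recent geopandas installation (which also brings in Shapely and pyproj). It builds a GeoDataFrame from three hypothetical point locations and reprojects it from WGS 84 to UTM zone 18N (EPSG:32618) so that distances are in meters. It then buffers each point by 1 km. Writing a shapefile is shown only as a comment because it additionally requires a file I/O engine such as pyogrio or fiona.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point data with an attribute column, in geographic coordinates (WGS 84)
gdf = gpd.GeoDataFrame(
    {"name": ["site_1", "site_2", "site_3"]},
    geometry=[Point(-77.86, 40.79), Point(-77.60, 40.95), Point(-77.95, 41.05)],
    crs="EPSG:4326",
)

# Reproject to a projected CRS (WGS 84 / UTM zone 18N) so that units are meters
gdf_utm = gdf.to_crs(epsg=32618)

# Geoprocessing: 1 km buffer polygons around each point
buffers = gdf_utm.buffer(1000)
print(gdf_utm.crs)
print(buffers.area.round())   # each close to pi * 1000**2 (buffers are polygon approximations)

# Writing a shapefile (requires pyogrio or fiona to be installed):
# gdf.to_file("sites.shp")
```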

GDAL/OGR

GDAL/OGR (Python GDAL page, GDAL/OGR Python) is a powerful library for working with GIS data in many different formats, and it is widely used from many different programming languages. Originally, it consisted of two separate libraries: GDAL (‘Geospatial Data Abstraction Library’) for working with raster data and OGR (which used to stand for ‘OpenGIS Simple Features Reference Implementation’) for working with vector data. These have since been merged. The gdal Python package provides an interface to the GDAL/OGR library, which is written in C++. In Section 3.9 and the lesson’s walkthrough, you will see some examples of applying GDAL/OGR.

ArcGIS API for Python

As we already mentioned at the beginning, Esri provides its own Python API (ArcGIS for Python page) for working with maps and GIS data via its ArcGIS Online and Portal for ArcGIS web platforms. The API allows for conducting administrative tasks, performing vector and raster analyses, running geocoding tasks, creating map visualizations, and more. While some services can be used independently, many are tightly coupled to Esri’s web platforms, and you will need at least a free ArcGIS Online account. The Esri API for Python will be discussed further in Section 3.10.