GEOG 858
Spatial Data Science for Emergency Management

Emerging Theme: Spatial Data Science


For this week’s Emerging Theme topic, we are going to take a step back from emergency management and focus on spatial data science (SDS) in general. I want to emphasize that SDS (and terms like Big Data or Machine Learning) can mean several different things.

On the one hand, it is how we talk about GIS and geospatial science in the age of large data sets (e.g., imagery and otherwise), enhanced computing power, and networked data and services. A lot of traditional GIS workflows are now described in (spatial) data science terms. For example, variants of regression analysis and hotspot analysis are referred to as machine learning and cluster detection, respectively. This is all fine, but SDS is also the integration of big data, high performance computing, and programming of machine learning/AI algorithms to conduct analysis in ways that are fundamentally different from traditional GIS/geospatial analysis. You will explore and discuss some of this complexity in this Emerging Theme Discussion.

To set the stage, I'd like you to have a look at a few perspectives on spatial data science, and where it is heading, from two geospatial industry leaders, Esri and Carto, and from university researchers at the Center for Spatial Data Science at the University of Chicago.

What is Spatial Data Science? 

What is Spatial Data Science (6:08)

 

When considering SDS as a set of activities, we can identify several interrelated parts. These are listed here with some examples of common associated tasks (not exhaustive):

  • Data ingestion, cleaning and management
    • Obtaining, formatting, and cleaning data, and managing it in a database system
  • Exploratory data analysis
    • Statistical methods for data reduction, data visualization  
  • Data enrichment
    • Data linkage, spatially enabling data, calculate new variables
  • Spatial analysis 
    • Mapping, hot spot analysis, space-time analysis, overlays and spatial queries 
  • Machine Learning
    • Cluster analysis, regression, predictive analytics, object recognition
  • Big Data Analytics 
    • Large-volume data processing, (near) real-time analysis
  • Visualization and Communication 
    • Maps, graphics, interactive, web, Tableau/Qlik/BA
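
To make the first few steps in this list more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the incident records are made up, the shelter location and the 1 km threshold are hypothetical, and a real project would more likely use a spatial database or a package such as GeoPandas. It shows ingestion (parsing CSV), cleaning (dropping records with missing coordinates), enrichment (calculating a new distance variable), and a simple spatial query with a summary.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical raw incident reports (made-up values, not real data).
RAW_CSV = """id,lat,lon,type
1,40.7934,-77.8600,flood
2,40.7950,-77.8610,flood
3,,-77.8500,fire
4,40.8100,-77.9000,flood
5,40.7940,-77.8590,fire
"""


def ingest(text):
    """Data ingestion: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: drop rows with missing coordinates and cast field types."""
    cleaned = []
    for row in rows:
        if not row["lat"] or not row["lon"]:
            continue  # skip records that cannot be placed on a map
        cleaned.append({
            "id": int(row["id"]),
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "type": row["type"],
        })
    return cleaned


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def enrich(rows, shelter):
    """Enrichment: add a new variable, the distance to a shelter."""
    for row in rows:
        row["km_to_shelter"] = haversine_km(row["lat"], row["lon"], *shelter)
    return rows


def spatial_query(rows, max_km):
    """Spatial analysis: keep incidents within max_km of the shelter."""
    return [row for row in rows if row["km_to_shelter"] <= max_km]


SHELTER = (40.7940, -77.8600)  # hypothetical shelter location (lat, lon)
incidents = enrich(clean(ingest(RAW_CSV)), SHELTER)
nearby = spatial_query(incidents, max_km=1.0)
summary = Counter(row["type"] for row in nearby)
print(summary)
```

Each function corresponds to one item in the list, which is the point of the sketch: in practice the steps are chained together into a repeatable pipeline rather than carried out by hand.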

Visit Carto's Technology Stack Overview page to see a similar list. Take note of the Data ingestion and Management & Analysis steps. Are you familiar with the technologies listed there? Pick a couple that you are not familiar with (e.g., PostGIS, Python SDK, ELT, PostgreSQL) and look them up. Gaining a general familiarity with the various parts of SDS is a good first step.

Finally, Carto have produced a useful free e-book on Becoming a Spatial Data Scientist (download the PDF here). Read the first chapter and have a quick look at the rest of the book. This may be a good resource for you going forward, as it lists many of the tools you can use for analytics projects.

You are probably aware that the dominant player in the GIS space is Esri, the developer of ArcGIS Pro amongst many other offerings. In addition to desktop software, they offer server and cloud-based services that allow for big data analytics at scale.

Visit the Esri Spatial Analysis and Data Science page. Note the components of SDS they outline and a few of the tools on offer. I'd like you to take a closer look at two of them: Machine Learning and AI, and Big Data Analytics.

Machine Learning and AI

Artificial Intelligence is a somewhat generic term for a class of techniques including machine learning and deep learning. On a basic level, AI is all about developing algorithms that can "learn", or can be "trained", to recognize patterns in datasets and then predict likely behavior. For example, algorithms have been written to identify and differentiate sharks from swimmers in real-time UAV camera feeds over beaches in Australia. Post-hurricane damage assessment is also commonly done by AI these days, often with the help of volunteers who train the algorithms, e.g., by looking at single buildings and deciding on a damage class.

Artificial intelligence, machine learning and deep learning. Source: Esri 
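
The idea of "training" an algorithm and then asking it to "predict" can be illustrated with a very small sketch. This is not how Esri's tools or operational damage-assessment systems work (those typically apply deep learning to imagery). It is a nearest-centroid classifier, one of the simplest learning methods, and the two features (fraction of roof missing, debris score) and all values are hypothetical. The labeled samples stand in for the buildings that volunteers have assigned to a damage class.

```python
import math
from collections import defaultdict


def train(samples):
    """Learn one centroid (mean feature vector) per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical volunteer-labeled buildings:
# (fraction of roof missing, debris score) -> damage class
labeled = [
    ([0.05, 0.10], "minor"),
    ([0.10, 0.20], "minor"),
    ([0.70, 0.80], "major"),
    ([0.90, 0.90], "major"),
]

model = train(labeled)
print(predict(model, [0.80, 0.70]))  # a building the model has not seen
print(predict(model, [0.10, 0.10]))
```

The pattern is the same at any scale: labeled examples go in, a model comes out, and the model is then applied to new data. What changes with deep learning is the complexity of the model and the amount of data and computing power needed to train it.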

Read this short article on Machine Learning in ArcGIS by Esri Spatial Analyst Lauren Bennett. What are some of the key issues she cites about using ML and GIS? What stands out as being different from what you can do with Desktop GIS alone? Do you think you can get started with ML using ArcGIS Pro? What constraints might you run up against?

Big data analytics

One way SDS is different from traditional GIS workflows is the ability to deal with large volumes of data including collection and cleaning, storage, analysis and visualization. Analysis of real-time (or near real-time) data is a rapidly growing area for geospatial science and emergency management applications in particular. Have a look at the following video and website to see a geo-analytics workflow using Esri.   


Real-Time GIS and Analytics (5:42)

Center for Spatial Data Science

The geospatial industry is making great advances in SDS and delivering data and tools to a wide audience. However, research groups at universities have been at the cutting edge of developments in (spatial) data science for many years. This includes work in computer science, high performance computing, mathematics, statistics, geography, and human-computer interaction, amongst others.

One research group that has been very influential across these areas is Professor Luc Anselin's Center for Spatial Data Science at the University of Chicago. Have a look at a few of the research projects this center has undertaken in recent years. What similarities or differences do you see compared to the problems described in the Carto or Esri sites, or that you have usually thought about in the context of GIS problems?

Screenshot of multiple linked displays from analysis with GeoDa

One of this group's most widely used products is the GeoDa software. This program has a lot of basic GIS functionality but is also loaded with easy-to-use advanced spatial analysis tools. It is a desktop application, but many of the tools can also be used by coding with Python and R, which makes them scalable with data and hardware needs.

Look at the GeoDa pages and also visit their GitHub site, which hosts software and training materials. Be sure to scroll down this page to view the desktop spatial analysis program GeoDa. As mentioned, they are also actively developing R libraries. Why would they focus on both?

Note that you can download and use GeoDa for free (and it works on multiple platforms). It might be worth considering as part of your projects.

Data Science Computing

As mentioned previously, SDS goes beyond desktop GIS and requires the use of a range of computing resources and programming tools to manage different analysis steps.

What about the hardware required for Spatial Data Science? In many ways it is all about scalability. You may be able to accomplish many tasks with desktop software like ArcGIS Pro, but for bigger and more complex analysis you may need to rely on enterprise solutions or high performance computing.

NVIDIA A100 Tensor Core GPU (Source: NVIDIA)

Have a very quick look at this fact sheet for the NVIDIA A100 Tensor Core GPU. This type of hardware is designed for AI, data analytics and high performance computing in server/cloud applications.

Hardware like this is used in distributed computing, where tasks are split up and conquered by a stack or cluster of processors. The figure below is from Riga Technical University and shows how a central computer (head node) orchestrates analysis jobs undertaken by computing nodes.

Distributed computing example (Source: Riga Technical University)

Distributed computing is controlled by software systems such as Hadoop. Here is a description from the developers' website of what Hadoop does:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. - Source Hadoop 
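
The split-up-and-conquer pattern described above can be sketched on a single machine. The example below is an illustration only and does not use Hadoop or its API: a thread pool stands in for the computing nodes, the main program plays the role of the head node, and the point coordinates and grid cell size are hypothetical. The head node splits the data into chunks, each worker counts points per grid cell for its chunk (the "map" step), and the head node merges the partial results (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(points, n_chunks):
    """Head node: divide the point list into roughly equal chunks."""
    return [points[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk, cell_size=1.0):
    """Worker: count the points that fall in each grid cell of one chunk."""
    counts = Counter()
    for x, y in chunk:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def reduce_counts(partials):
    """Head node: merge the partial counts returned by all workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical point locations in arbitrary map units.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.1, 1.9), (0.5, 0.5), (1.9, 1.2)]

# Three threads stand in for three computing nodes.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_chunk, split(points, 3)))

cell_counts = reduce_counts(partials)
print(cell_counts)
```

Because each chunk is processed independently, the same logic keeps working whether there are three workers or three thousand, which is what makes this style of analysis scalable. A real cluster framework adds what this sketch leaves out: moving data between machines and recovering when a node fails.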

Data Science Software and Programming

I'd like to end this section by showing you a useful diagram produced by Carto (again!). It is meant to show the relationships amongst data science tools and geospatial analysis. 

Python and R are the main languages used for data manipulation and analysis in much of SDS. The two languages overlap in functionality but also offer different capabilities (R is good for some things / Python excels at others). This highlights that you need to be somewhat pragmatic and use whatever tool will work best. The tools hanging off the R and Python circles refer to specific packages, e.g., ArcPy is the Python site package used by Esri for accessing ArcGIS functionality. SQL is the main language for querying and managing databases. Finally, the platforms area refers to the many ways you can interact with the data and run analyses. Are you familiar with any of these? The Carto book recommended above provides some practical help on how to set some of these up for your own analysis.

Data Science tools for spatial analysis (Source: Carto - What is Spatial Data Science?)
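
To give a feel for how these tools fit together, here is a minimal sketch of Python issuing a SQL query. It uses SQLite (which ships with Python) as a stand-in for a server database such as PostgreSQL/PostGIS, and the table, shelter names, coordinates and capacities are all hypothetical. The query selects shelters inside a bounding box. Note that a true spatial database would store geometries and use spatial functions for this, rather than comparing plain latitude and longitude columns as done here.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shelters (name TEXT, lat REAL, lon REAL, capacity INTEGER)"
)

# Hypothetical shelter records (made-up values).
conn.executemany(
    "INSERT INTO shelters VALUES (?, ?, ?, ?)",
    [
        ("North Gym", 40.82, -77.86, 150),
        ("River Hall", 40.79, -77.85, 80),
        ("West School", 40.80, -77.95, 200),
    ],
)

# Bounding-box query: which shelters fall inside the area of interest?
# Parameters are (min lat, max lat, min lon, max lon).
rows = conn.execute(
    "SELECT name, capacity FROM shelters "
    "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
    "ORDER BY capacity DESC",
    (40.78, 40.83, -77.90, -77.80),
).fetchall()
conn.close()

for name, capacity in rows:
    print(name, capacity)
```

The division of labor is typical: SQL does the filtering and sorting inside the database, and Python handles everything around it, such as loading data, passing parameters, and working with the results.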

Additional Resources

We will come back to the topics of GeoAI and real-time analytics later in the course, but in the meantime Esri and Carto offer many free resources on SDS (some listed above) and this includes free seminars and training materials. Have a look at this page listing current resources and upcoming events - Spatial Data Science Events, Videos, Webinars and Courses.

The growing interest in spatial data science has spawned several conferences that bring together scientists and analysts in the public and private sectors. I encourage you to take a look at the Spatial Data Science Conference website. You can register and attend online for free this year. 

Deliverable

  1. Post a comment in the Emerging Theme Discussion (L4) forum that describes similarities and differences between traditional desktop GIS and Spatial Data Science. How do you think spatial data science is changing or will change crisis and emergency management approaches?
  2. Provide a link and short description to a VGI effort ‘in the news’ or that you have otherwise come across.
  3. NOTE: Respond to this assignment in the Emerging Theme Discussion (L4) forum by the date indicated on the course calendar.

Grading Criteria

This discussion will be graded out of 15 points.

Please see the Discussion Expectations and Grading page under the Orientation and Course Resources module for details.