GEOG 487
Environmental Challenges in Spatial Data Science

Background Information

Customizing Data For Your Project

One of the amazing aspects of GIS is the ability to combine information about multiple topics from multiple time periods and from multiple sources into one place and then analyze them spatially. There are tradeoffs to consider for this convenience. As we saw in Lesson 2, before you can use data that you did not create yourself, you need to invest a great deal of time to acquire and understand each dataset. The more datasets you include, the more time you need to spend on these tasks. After you have acquired and understood your input datasets, you still need to customize them for your project. Other time-consuming tasks include interpreting the results of your analysis and figuring out how best to communicate them to your target audience. The analysis itself can be the quickest part – you typically just need to click a few buttons and wait for a GIS tool to run.

Customizing data for your project involves two main tasks: 1) modifying your input datasets so that they are consistent enough to combine them in spatial analysis and 2) modifying them so that they can answer your specific questions in your study area. The specific sub-tasks can be grouped into two main categories: spatial tasks and attribute tasks. It is better to address spatial issues first since you will likely add or remove records from your attribute tables in the process. Examples of each type are described below:

Table 1: Overview of Common Data Preparation Tasks

Spatial Tasks                               Attribute Tasks
------------------------------------------  ------------------------
Convert Data Format                         Understand Coded Values
Resolve Projection / Misalignment Issues    Recode Missing Data
Customize Data Organization                 Recode Typos
Correct Topology Errors                     Reclassify Attributes
Modify Extent                               Create New Attributes
Confirm Scale                               Convert Units

Spatial Tasks:

  • Convert Data Format: You may find the original data format unsuitable for your project or analysis. For example, you may need to convert vector data to raster or vice versa to use a particular tool. You may also want to convert all of your input data into the same format (e.g., shapefile to a geodatabase, shapefile to GeoJSON, convert all rasters to vectors, or convert all vectors to rasters).
  • Resolve Projection/Misalignment Issues: Sometimes, your datasets will not overlay properly in ArcGIS due to missing or incorrect spatial references. You may need to assign the projection to a file if none is listed. This comes up more often than you may think: the spatial reference information is listed in the metadata or on a website, but the shapefile itself is missing the .prj file that ArcGIS needs to recognize its projection. Once all of your datasets have the correct spatial reference information defined, you need to reproject them all into the same system (coordinate system, zone, units, projection, and datum). ArcGIS provides geoprocessing tools for this, which we will use later in the lesson: the “Project” tool for vector data and the “Project Raster” tool for raster data.

    Keep in mind that ArcGIS Pro has the ability to project data on the fly, which can be misleading. According to Esri, the project on-the-fly capabilities within ArcGIS are for “display and query purposes only.” This means your actual data are NOT reprojected. Don’t let ArcGIS fool you. Just because your data sets may appear to align in a Map, they may not actually align if you try to combine them using geoprocessing tools (e.g., Tools group on the Analysis tab). For data sets to align for analysis purposes, you should actually reproject all of your input data into the same spatial reference.


    According to Esri, project on-the-fly in ArcGIS works better for vector data since the process of projecting rasters on the fly is so much more complex than projecting vector data. Projecting data on the fly, regardless of whether the data is vector or raster, does not always produce consistent results. Sometimes, it works perfectly; other times, it does not. You can read more about it in this Esri Blog Post, "Projection on the fly and geographic transformations."

  • Customize Data Organization: One of the major drawbacks to working with raw files is the data are not seamless like online data services. Murphy’s Law almost guarantees that your study site will fall right on the boundary of two or more geographic units from which your input data is aggregated. You may have to merge individual files before starting to work with the data. Alternatively, the input dataset may contain more detail than you need. You may want to dissolve it into larger units based on an attribute value.
  • Correct Topology Errors: Topology errors such as gaps, overlaps, or empty geometry can cause errors during geoprocessing. Geodatabases have many tools to easily correct these types of errors. In this lesson, we will use a quick way to fill in gaps within a polygon shapefile.
  • Modify Extent: Your input datasets may cover many different extents. You will need to clip them all to your study area. There are different tools to clip raster and vector datasets. In this lesson, we will clip vector datasets; in later lessons, we will clip raster data sets.
  • Confirm Scale: The scale of your input datasets is not necessarily something you will modify but rather something you should be aware of. One common mistake is to assume a dataset is more accurate than it really is. This is easy to do if you overlay two datasets, one with a fine scale and one with a coarse scale. ArcGIS allows you to zoom in and out infinitely. This is especially dangerous with vector datasets since they never appear pixelated as rasters do when you zoom in very close. This means at some point, you will zoom in past the scale at which the data were meant to be used. For example, National Wetland Inventory data is designed for use at 1:24,000 or coarser scales. It is tempting to zoom in closer than 1:24,000 or to interpret the data at a finer level of detail than it actually contains.
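
The projection and extent checks described above can be sketched as a small "pre-flight" script. This is only an illustration in plain Python, not an ArcGIS workflow: the `Layer` class, the `preflight` function, the layer names, and all coordinate values are hypothetical, and in practice you would read each dataset's spatial reference and extent from the files themselves. The EPSG codes are used only as example labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (xmin, ymin, xmax, ymax) in the units of the layer's coordinate system
Extent = Tuple[float, float, float, float]


@dataclass
class Layer:
    """Hypothetical summary of one input dataset (not an ArcGIS object)."""
    name: str
    crs: Optional[str]  # e.g. "EPSG:26917"; None if no spatial reference is defined
    extent: Extent


def extents_overlap(a: Extent, b: Extent) -> bool:
    """Return True if two bounding boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def preflight(layers: List[Layer], target_crs: str,
              study_area: Extent) -> Dict[str, List[str]]:
    """Return {layer name: [issues]} for layers that need work before analysis."""
    report: Dict[str, List[str]] = {}
    for layer in layers:
        issues = []
        if layer.crs is None:
            issues.append("define spatial reference")
        elif layer.crs != target_crs:
            issues.append("reproject to " + target_crs)
        elif not extents_overlap(layer.extent, study_area):
            # Extents are only comparable once the layer is in the target CRS.
            issues.append("does not overlap study area")
        if issues:
            report[layer.name] = issues
    return report


# Made-up example values for illustration only
study_area = (300000.0, 4600000.0, 320000.0, 4620000.0)
layers = [
    Layer("wetlands", "EPSG:26917", (295000.0, 4595000.0, 325000.0, 4625000.0)),
    Layer("roads", None, (0.0, 0.0, 1.0, 1.0)),
    Layer("counties", "EPSG:4326", (-84.0, 41.0, -83.0, 42.0)),
    Layer("soils", "EPSG:26917", (400000.0, 4600000.0, 420000.0, 4620000.0)),
]
for name, issues in preflight(layers, "EPSG:26917", study_area).items():
    print(name + ": " + ", ".join(issues))
```

The sketch compares extents only after a layer is already in the target coordinate system, because bounding boxes in different systems (for example, degrees versus meters) cannot be compared meaningfully. This is the same reason data that merely appear aligned on the fly should still be reprojected before analysis.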

Attribute Tasks:

  • Understand Coded Values: Attribute tables often contain mysterious coded or abbreviated values and cryptic field names. If metadata files are not packaged with the raw data, you can usually find the information you need somewhere on the source website, by doing a general Internet search, or by contacting the agency or organization. Conversely, even when metadata is easily accessible, it can take a substantial amount of time to read through all of the documentation. Either way, you need to budget at least a few hours for this task, especially if you are working with data products or providers you are not familiar with.
  • Recode Missing Data/Typos: Missing values such as 0, <NULL>, or blanks can skew your results. Depending on how the data were created, you may also find typos in the attributes, such as extra spaces or inconsistent capitalization (e.g., “Wetlands” vs. “wetlands”). Although we can tell both of these values should be “Wetlands,” the computer interprets them as separate values. You should look for these types of errors before starting an analysis, so you can exclude or recode values if necessary.
  • Reclassify or Create New Attributes: Not all of the attributes included with a particular dataset will be relevant or useful for your particular application. You may want to remove attributes you don’t need to make your files easier to work with. You may also need to add new attributes to your input data to use them for your particular project. For example, you may calculate or derive new information from the existing attributes or join your data to tables with additional information.
  • Convert Units: Measured or calculated values such as lengths or areas may be in different units, so convert them to a common unit before combining them. Multiplying inches by feet is not going to give you the answer you want.
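
The recoding and unit-conversion tasks above can be sketched in a few lines of plain Python. This is a minimal illustration rather than an ArcGIS procedure: the function names, the list of missing-data markers, and the sample values are assumptions made for the example. Note that a value of 0 is deliberately not treated as missing here, since whether 0 means "no data" depends on the field and should be confirmed in the metadata.

```python
from collections import Counter

# Assumed missing-data markers (compared in lowercase); confirm against the metadata.
MISSING_MARKERS = {"", "<null>", "null", "n/a"}

FT_TO_M = 0.3048  # international foot in meters (exact by definition)


def clean_label(value):
    """Return a normalized text label, or None if the value marks missing data."""
    if value is None:
        return None
    text = " ".join(str(value).split())  # trim and collapse extra spaces
    if text.lower() in MISSING_MARKERS:
        return None
    return text.capitalize()  # "wetlands" and "WETLANDS" both become "Wetlands"


def feet_to_meters(length_ft):
    """Convert a length from feet to meters."""
    return length_ft * FT_TO_M


def sq_feet_to_sq_meters(area_sq_ft):
    """Convert an area from square feet to square meters."""
    return area_sq_ft * FT_TO_M ** 2


# Made-up attribute values for illustration only
raw = ["Wetlands", "wetlands ", " WETLANDS", "<NULL>", "", "Upland"]
print(Counter(raw))                          # every spelling counted separately
print(Counter(clean_label(v) for v in raw))  # variants merged; missing values become None
print(feet_to_meters(100.0), sq_feet_to_sq_meters(100.0))
```

Counting values before and after cleaning is a quick way to spot categories that differ only by spacing or capitalization. Keeping missing values as `None` rather than dropping them leaves the decision to exclude or recode them explicit.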

Introduction to Wetlands and Invasive Species

What are wetlands? Wetlands can be broadly described as transition zones between water and land. They are notoriously hard to define because their characteristics vary greatly depending on their location and the environment in which they are located. One trait all wetland varieties share is that they have properties of both upland and aquatic environments that create unique ecosystems.

Kayak going through wetlands with lilypads
Figure 2: Wetlands, such as the one pictured here at Ludington State Park in Michigan, provide many recreational opportunities, such as kayaking.
© Rachel Kornak. Used with permission.

Wetlands are important for several reasons. First, they support a vast array of life with biodiversity and population counts comparable to tropical rainforests and coral reefs. For example, they are used as nesting and feeding grounds by many species of migratory birds, and most fish and shellfish are dependent on wetlands for some portion of their lifecycle. Second, wetlands help absorb and regulate the flow of water over large regions. During extremely wet periods, wetlands absorb and store excess water, preventing floods and associated damage. They are a natural disaster management system. Third, wetlands help to recharge groundwater aquifers, a source of drinking and irrigation water, during times of drought when rain is scarce. Fourth, wetlands help to filter and purify water. As water enters a wetland, its speed is drastically reduced, mitigating possible erosion of valuable soils. Reducing water speed also causes suspended and dissolved particles, such as pollutants and nutrients, to drop out of the water when they enter a wetland. Plants and microorganisms living in the wetland then absorb and break down these particles. Artificial and natural wetlands are often used to treat stormwater and wastewater for this reason. Lastly, the combination of water and wildlife found in wetlands supports several types of livelihood and recreation, such as fishing, boating, hiking, and bird watching.

Unfortunately, wetlands are often threatened by human activities. Wetlands can either be completely eliminated or degraded so much so that their ecosystem cannot function. For example, wetlands are often drained to expose new land for agriculture or development or are flooded to create lakes. Over 96% of the original wetlands along western Lake Erie have been lost in this manner since the 1860s. In addition, runoff from lawns and impervious surfaces can add excessive amounts of pollutants such as fertilizers, pesticides, and sediment, which degrade the wetlands that absorb the material. A common land management technique is to build earthen dikes around wetlands, causing them to be hydraulically separated from surrounding areas. This artificial process eliminates the natural cycle of high and low water levels necessary for vegetation regulation. It also limits the movement of small biota in and out of wetlands, which is critical for the reproduction of many species.

Purple loosestrife plant growing in the water of a wetland
Figure 3: Purple loosestrife is a common invasive plant in many Midwestern wetlands. You can see how it dominates over native species like arrowheads and lily pads.
Pinckney State Recreation Area, Michigan © Rachel Kornak. Used with permission.

Wetlands are also threatened by the spread of invasive species, also known as non-native or exotic species. Both plants and animals can be considered invasive. These species are naturally very adaptable and aggressive and have a high reproductive capacity. They are considered invasive only when they spread outside of their natural range, where they out-compete native species due to their vigor and lack of natural enemies. Once established, they are extremely difficult to eliminate. Their presence in an ecosystem often causes economic, human health, and environmental damage. Some examples of invasive species in the Great Lakes Region are purple loosestrife, common reed, reed canary grass, narrow-leaved cattails, hybrid cattails (narrow/broad-leafed), emerald ash borer, common carp, sea lamprey, zebra mussels, and West Nile virus.

Recognizing the importance of wetlands is a relatively recent initiative. For example, the Ramsar Convention, an international treaty for the conservation of wetlands, wasn’t adopted until 1971 and did not enter into force until 1975. The U.S. North American Wetlands Conservation Act, which provides funding to protect and manage wetland habitat, wasn’t enacted until 1989. Since then, government agencies have created a set of laws regulating the use and management of wetlands. They have also established a network of protected wetland areas managed by various state and federal agencies. Within these areas, wetland managers try to restore degraded wetlands while balancing the competing interests of recreation, habitat for particular species, and control of invasive species.

We are going to explore several of the data customization concepts described above in the context of a historical wetland restoration project within a federally protected area. The case study site is in the Ottawa National Wildlife Refuge, about 20 km east of Toledo, Ohio.