Some common open formats for spatial data

This section of the lesson describes in greater detail some of the spatial data formats that have open specifications or are created by open source software. Note that these refer to files or databases that can stand alone on your hardware. We will cover open formats of web services streamed in from other computers in future lessons.

File-based data

File-based data includes shapefiles, KML files, GeoJSON, and many other types of files, some binary (like the shapefile) and some plain text (like KML and GeoJSON). Each of the vector formats has some mechanism for storing the geometry (i.e., vertex coordinates) and attributes of each feature. Some of the formats, such as KML, may also store styling information.

Below are some of the file-based data formats you're most likely to encounter.

Shapefiles

The Esri shapefile is one of the most common formats for exchanging vector data. It actually consists of several files with the same root name but different suffixes. At a minimum, you must include the .shp (geometry), .shx (index of the geometry records), and .dbf (attribute table) files. Additional files may accompany these three when the dataset carries extra information such as a spatial index or a projection definition (the .prj file). This ArcGIS Resources article gives a quick overview of the different files that can be included.

Because a shapefile requires multiple files, it is customary to zip them all together into a single file when downloading, uploading, or emailing them.
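As a minimal sketch of that packaging step, the Python functions below gather the parts that share a shapefile's root name and write them into one .zip archive, refusing to continue if any of the three required parts is missing. The sketch assumes lowercase extensions, and its list of optional extensions is illustrative rather than complete; the file names in the example comment are hypothetical.

```python
import zipfile
from pathlib import Path

# The three parts every shapefile needs.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
# A few common optional parts (illustrative, not an exhaustive list).
OPTIONAL_EXTENSIONS = (".prj", ".cpg", ".sbn", ".sbx", ".qix", ".shp.xml")


def shapefile_parts(shp_path):
    """Return the existing files that share the shapefile's root name."""
    root = Path(shp_path).with_suffix("")  # e.g. data/roads.shp -> data/roads
    missing = [ext for ext in REQUIRED_EXTENSIONS
               if not root.with_name(root.name + ext).exists()]
    if missing:
        raise FileNotFoundError(f"{root.name}: missing required part(s) {missing}")
    return [root.with_name(root.name + ext)
            for ext in REQUIRED_EXTENSIONS + OPTIONAL_EXTENSIONS
            if root.with_name(root.name + ext).exists()]


def zip_shapefile(shp_path, zip_path):
    """Write all parts of one shapefile into a single .zip archive."""
    parts = shapefile_parts(shp_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for part in parts:
            # Store each part at the archive root so the files stay side by side.
            zf.write(part, arcname=part.name)
    return [part.name for part in parts]


# Example (hypothetical paths): zip_shapefile("data/roads.shp", "roads.zip")
```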

If you want to make a shapefile from scratch, you can refer to the specification from Esri. This is not for the novice programmer, and browsing this spec will hopefully increase your appreciation for those who donate their time and skills to coding FOSS GIS programs.
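To give a taste of what the spec involves, the sketch below decodes the fixed 100-byte header at the start of a .shp file using Python's struct module. Note the mixed byte order the spec calls for: the file code and file length are big-endian, while the version, shape type, and bounding box are little-endian. The file name in the example comment is hypothetical.

```python
import struct

# Shape type codes defined in the Esri shapefile specification.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}


def read_shp_header(header: bytes) -> dict:
    """Decode the 100-byte main file header of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])             # big-endian
    (length_words,) = struct.unpack(">i", header[24:28])        # big-endian, in 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])   # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError(f"not a shapefile (file code {file_code})")
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"unknown ({shape_type})"),
        "file_length_bytes": length_words * 2,
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Example (hypothetical file):
# with open("roads.shp", "rb") as f:
#     print(read_shp_header(f.read(100)))
```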

GeoPackages

The GeoPackage is a relatively new format for storing and transferring vector features, tables, and raster tiles across a variety of devices, including laptops, mobile devices, and so forth. It was defined by the Open Geospatial Consortium (OGC), a group of industry representatives, academics, practitioners, and others with an interest in open geospatial data formats; you will learn more about the OGC in Lesson 4. A GeoPackage is actually a SQLite database stored in a single file (SQLite is described in more detail below in the databases section). I list the GeoPackage here in the file-based formats section because some have advocated for its adoption as a more modern alternative to the shapefile.
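Because a GeoPackage is a SQLite database, Python's built-in sqlite3 module can open one directly. The sketch below lists the layers registered in gpkg_contents, a table the OGC specification requires in every GeoPackage. It reads only three of that table's columns, and the file name in the example comment is hypothetical.

```python
import sqlite3


def list_gpkg_layers(conn: sqlite3.Connection):
    """Return (table_name, data_type, srs_id) for every layer in a GeoPackage.

    data_type is 'features' for vector layers, 'tiles' for raster tile
    pyramids, or 'attributes' for non-spatial tables.
    """
    rows = conn.execute(
        "SELECT table_name, data_type, srs_id FROM gpkg_contents ORDER BY table_name"
    )
    return rows.fetchall()


# Example (hypothetical file):
# conn = sqlite3.connect("example.gpkg")
# for name, kind, srs in list_gpkg_layers(conn):
#     print(name, kind, srs)
```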

KML

KML gained widespread use as the simple spatial data format used to place geographic data on top of Google Earth. It is also supported in Google Maps and various non-Google products.

KML stands for Keyhole Markup Language; it was developed by Keyhole, Inc., before Google acquired the company. KML became an Open Geospatial Consortium (OGC) standard data format in 2008 after Google voluntarily submitted it.

KML is a form of XML, wherein data is maintained in a series of structured tags. At the time of this writing, the Wikipedia article for KML contains a simple example of this XML structure. KML is versatile in that it can contain styling information, and it can hold either vector data or rasters ("overlays", in KML-speak). The rasters themselves are not written into the KML; they are either linked by URL or packaged with it in a zipped file called a KMZ. Large vector datasets are also commonly compressed into KMZs.
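To show that tag structure in practice, here is a simplified Python sketch that reads placemark names and coordinates with the standard xml.etree module, plus a helper that pulls the KML out of a KMZ (which is an ordinary zip archive). It assumes the OGC KML 2.2 namespace and reads only the first coordinates element in each placemark, so multi-part geometries are not fully handled.

```python
import xml.etree.ElementTree as ET
import zipfile

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemarks_from_kml(kml_text):
    """Return (name, coordinates) for each Placemark in a KML document."""
    root = ET.fromstring(kml_text)
    results = []
    for placemark in root.iterfind(".//kml:Placemark", namespaces=KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        # Only the first <coordinates> element is read (a simplification).
        raw = placemark.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
        # KML writes whitespace-separated "lon,lat[,alt]" tuples.
        coords = [tuple(float(v) for v in item.split(",")) for item in raw.split()]
        results.append((name, coords))
    return results


def placemarks_from_kmz(kmz_file):
    """Read the first .kml entry (conventionally doc.kml) inside a KMZ archive."""
    with zipfile.ZipFile(kmz_file) as zf:
        kml_names = [n for n in zf.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("no .kml file found in the KMZ archive")
        return placemarks_from_kml(zf.read(kml_names[0]))
```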

GeoJSON and TopoJSON

JavaScript Object Notation (JSON) is a structured means of arranging data in a hierarchical series of key-value pairs and lists that a program can read. (Despite the name, the program does not need to be written in JavaScript.) JSON is less verbose than XML, so the same data generally produces a smaller "payload," or data size, when transferred across the wire in web applications.

Following this pattern, GeoJSON is a form of JSON developed for representing vector features. The GeoJSON spec gives some basic examples of how different entities such as points, lines, and polygons are structured.
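For illustration, the Python sketch below builds a small FeatureCollection containing one point, one line, and one polygon as plain dictionaries, then serializes it with the standard json module. The coordinates and attribute names are made up. Note that GeoJSON lists longitude before latitude, and that a polygon's ring repeats its first position at the end.

```python
import json


def feature(geometry_type, coordinates, **properties):
    """Wrap a geometry and its attributes in a GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        # Positions are [longitude, latitude], in that order.
        feature("Point", [-77.86, 40.80], name="Trailhead"),
        # A LineString is a list of positions.
        feature("LineString", [[-77.86, 40.80], [-77.85, 40.81]], name="Trail"),
        # A Polygon is a list of linear rings; each ring ends where it starts.
        # This exterior ring runs counterclockwise, as RFC 7946 specifies.
        feature("Polygon",
                [[[-77.87, 40.79], [-77.84, 40.79], [-77.84, 40.82],
                  [-77.87, 40.82], [-77.87, 40.79]]],
                name="Park"),
    ],
}

print(json.dumps(collection, indent=2))
```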

You might choose to save GeoJSON features into a .js (JavaScript) file that can be referenced by your web map. Other times, you may encounter web services that return GeoJSON.
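One common way to do this is to wrap the GeoJSON in a JavaScript variable assignment, so that a page can load the file with a script tag and then read the variable. The sketch below is a minimal Python helper for producing such a file; the variable and file names in the example comment are hypothetical, and the name check uses Python's identifier rules as a close but not exact stand-in for JavaScript's.

```python
import json


def geojson_to_js(geojson, variable_name):
    """Return JavaScript source that assigns a GeoJSON object to a global variable."""
    # isidentifier() applies Python's naming rules, an approximation of JavaScript's.
    if not variable_name.isidentifier():
        raise ValueError(f"{variable_name!r} is not a usable variable name")
    return f"var {variable_name} = {json.dumps(geojson)};\n"


# Example (hypothetical names): write parks.js for a page that loads it with
# <script src="parks.js"></script> and then reads the global variable parks.
# with open("parks.js", "w", encoding="utf-8") as f:
#     f.write(geojson_to_js(parks_geojson, "parks"))
```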

A variation on GeoJSON is TopoJSON, which stores each shared boundary as a single arc (a sequence of connected line segments) that can be referenced multiple times by different polygons. In other words, when two features share a border, the vertices along that border are stored only once. This results in a more compact file, which can pay performance dividends when the data needs to be transferred from server to client.
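To make the arc idea concrete, here is a small hand-written topology (two unit squares sharing one edge) and a Python function that rebuilds each polygon's ring from its arc indexes. In TopoJSON, a negative index i means arc ~i (that is, -i - 1) traversed backwards. The sketch assumes unquantized coordinates; when a topology includes a "transform", its arc positions are delta-encoded and must be decoded first. The object name "squares" is hypothetical.

```python
# A hand-written, unquantized topology: two unit squares that share the edge x = 1.
topology = {
    "type": "Topology",
    "arcs": [
        [[1, 0], [1, 1]],                  # arc 0: the shared border, stored once
        [[1, 1], [0, 1], [0, 0], [1, 0]],  # arc 1: the rest of the left square
        [[1, 0], [2, 0], [2, 1], [1, 1]],  # arc 2: the rest of the right square
    ],
    "objects": {
        "squares": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0, 1]]},
                # -1 is ~0: arc 0 walked in reverse.
                {"type": "Polygon", "arcs": [[2, -1]]},
            ],
        }
    },
}


def stitch_ring(arc_indexes, arcs):
    """Rebuild one polygon ring from TopoJSON arc indexes (no transform applied)."""
    ring = []
    for index in arc_indexes:
        # A negative index refers to arc ~index (that is, -index - 1), reversed.
        arc = arcs[~index][::-1] if index < 0 else arcs[index]
        # Consecutive arcs share an endpoint, so drop the duplicate first position.
        ring.extend(arc if not ring else arc[1:])
    return ring


for geometry in topology["objects"]["squares"]["geometries"]:
    print(stitch_ring(geometry["arcs"][0], topology["arcs"]))
```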

Other text files

Many GIS programs can read vector data out of other types of text files, such as .gpx (a popular format for GPS tracks) and various types of .csv (comma-separated value files, often used with Microsoft Excel) that include longitude (X) and latitude (Y) columns. You can engineer your web map to parse and read these files, or you may want to use your scripting skills to convert the data into another standard format before deploying it in your web map. This is where Python skills and helper libraries can be handy.
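As an example of that kind of scripting, the sketch below converts CSV rows with longitude and latitude columns into a GeoJSON FeatureCollection using only Python's standard library. The column names "longitude" and "latitude" are assumptions you can override, all other columns become string-valued properties, and the file names in the example comment are hypothetical.

```python
import csv


def csv_to_geojson(lines, lon_field="longitude", lat_field="latitude"):
    """Convert CSV rows with longitude/latitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(lines):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        # Basic sanity check: projected coordinates or swapped columns often fail it.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"coordinates out of range: {lon}, {lat}")
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Remaining columns become properties (values stay as strings).
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Example (hypothetical files):
# import json
# with open("stops.csv", newline="", encoding="utf-8") as src, \
#         open("stops.geojson", "w", encoding="utf-8") as dst:
#     json.dump(csv_to_geojson(src), dst)
```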

Various raster formats

Many common raster formats are openly documented and do not require royalties or attribution. These include JPEG, PNG, TIFF, BMP, and others. The GIF format previously used a patented compression algorithm (LZW), but those patents have expired.

Web service maps such as WMS return their results in raster formats, as do many tiled maps. A KML/KMZ file can also reference a series of rasters called overlays.
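As an illustration of that raster hand-off, the sketch below assembles a WMS 1.3.0 GetMap request in Python; its FORMAT parameter names the image type (such as PNG or JPEG) the server should return. The endpoint and layer names are placeholders, and the formats actually on offer depend on what the server advertises in its capabilities document.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layers, bbox, width, height, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request; FORMAT selects the raster type returned."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty means each layer's default style
        # CRS:84 uses longitude,latitude axis order; EPSG:4326 in WMS 1.3.0 expects latitude first.
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,  # e.g. image/png or image/jpeg
    }
    return f"{endpoint}?{urlencode(params)}"


# Example (placeholder endpoint and layers):
# print(wms_getmap_url("https://example.com/wms", ["roads"], (-80.5, 39.7, -74.7, 42.3), 800, 400))
```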

Spatially-enabled databases

When your datasets get large or complex, it makes sense to move them into a database. This often makes it easier to run advanced queries, set up relationships between datasets, and manage edits to the data. It can also improve performance, boost security, and introduce tools for performing spatial operations.

Several popular approaches for putting spatial data into FOSS databases are described below. Examples of proprietary equivalents include Microsoft SQL Server, Oracle Spatial, and the Esri ArcSDE middleware (packaged as an option with ArcGIS Enterprise), which can connect to various flavors of databases, including FOSS ones.

PostGIS

PostGIS is an extension that allows spatial data management and processing within PostgreSQL (often pronounced "Postgress" or "Postgress SQL"). PostgreSQL is perhaps the most fully featured FOSS relational database management system (RDBMS). If a traditional RDBMS with relational tables is your bread and butter, then PostgreSQL and PostGIS are a natural fit if you are moving to FOSS. Installation is relatively straightforward: with the PostgreSQL setup programs for Windows, you just check a box after installation indicating that you want to add PostGIS. An importer wizard allows you to load your shapefiles into PostGIS to get started. Most of the remaining administration can be done from pgAdmin, the GUI program used to administer PostgreSQL.

Most FOSS GIS programs give you an interface for connecting to your PostGIS data. For example, in QGIS you might have noticed the Add PostGIS Layers button. The elephant in its icon comes from the PostgreSQL logo. GeoServer also supports layers from PostGIS.
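To give a flavor of the spatial processing PostGIS adds, the Python sketch below prepares a proximity query ("which features lie within N meters of this point?") using the PostGIS functions ST_DWithin, ST_MakePoint, ST_SetSRID, and ST_AsGeoJSON. It only builds the SQL and its parameters. Running it assumes a DB-API driver that uses %s placeholders, such as psycopg2, and a hypothetical table with a name column and a geom column stored in EPSG:4326.

```python
def nearby_query(table, lon, lat, meters):
    """Return (sql, params) for features within `meters` of a lon/lat point.

    Assumes a table with a text column `name` and a geometry column `geom`
    stored in EPSG:4326; both names are hypothetical.
    """
    # Table names cannot be passed as query parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    sql = (
        f"SELECT name, ST_AsGeoJSON(geom) FROM {table} "
        # Casting to geography makes ST_DWithin measure the distance in meters.
        "WHERE ST_DWithin(geom::geography, "
        "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)"
    )
    return sql, (lon, lat, meters)


# Example with a DB-API driver such as psycopg2 (connection details are placeholders):
# cur = conn.cursor()
# cur.execute(*nearby_query("parks", -77.86, 40.80, 500))
# rows = cur.fetchall()
```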

This course, Geog 585, does not provide walkthroughs for PostGIS; however, there are a couple of excellent open courseware lessons in Geog 868: Spatial Databases that describe how to install and work with PostGIS. I encourage you to make time to study these on your own (or take the instructor-led offering) if you feel that learning PostGIS will be helpful in your career.

You are welcome to use PostGIS in your term project if you feel comfortable with the other course material and want to take on an additional challenge. You can always fall back on file-based data if an excessive amount of troubleshooting is required.

SpatiaLite

SpatiaLite is an extension supporting spatial data in the SQLite database engine. As its name indicates, SQLite is lightweight: the engine is a library embedded in the application that uses it, and each database is stored in a single file, so you can store and use data in a database paradigm without installing or running separate RDBMS server software on the client machine. This makes SQLite databases easy to copy around and allows them to run on many kinds of devices. If you are familiar with Esri products, a SpatiaLite database might be thought of as similar to a file geodatabase.

SpatiaLite is not as mature as PostGIS, but it is growing in popularity, and you will see a button in QGIS called Add SpatiaLite Layer. If you feel it would be helpful in your career, you are welcome to use SpatiaLite in your term project. If you choose to do this, I ask you to first get the project working with file-based data. Then feel free to experiment with swapping out the data source to SpatiaLite.
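Because SpatiaLite is a loadable SQLite extension, Python's built-in sqlite3 module can use it, assuming the mod_spatialite library is installed separately on your machine and your Python build allows loading extensions. The sketch below tries to load the extension and falls back gracefully if it cannot, then creates a small point table with the SpatiaLite functions InitSpatialMetadata, AddGeometryColumn, and MakePoint. The table, column, and feature names are hypothetical.

```python
import sqlite3


def open_spatialite(path):
    """Open an SQLite database with SpatiaLite loaded, or return None if unavailable."""
    conn = sqlite3.connect(path)
    try:
        conn.enable_load_extension(True)
        conn.load_extension("mod_spatialite")  # located via the system library path
        conn.enable_load_extension(False)
    except (AttributeError, sqlite3.Error):
        # AttributeError: this Python build cannot load extensions at all.
        # sqlite3.Error: the mod_spatialite library could not be loaded.
        conn.close()
        return None
    return conn


def create_points_table(conn):
    """Create a hypothetical 'poi' table with a WGS 84 point geometry column."""
    conn.execute("SELECT InitSpatialMetadata(1)")  # creates the spatial metadata tables
    conn.execute("CREATE TABLE poi (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("SELECT AddGeometryColumn('poi', 'geom', 4326, 'POINT', 'XY')")
    conn.execute(
        "INSERT INTO poi (name, geom) VALUES (?, MakePoint(?, ?, 4326))",
        ("Trailhead", -77.86, 40.80),
    )
    conn.commit()
    return conn.execute("SELECT name, AsText(geom) FROM poi").fetchall()


conn = open_spatialite(":memory:")
if conn is None:
    print("SpatiaLite is not available on this machine")
else:
    print(create_points_table(conn))
```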

You will encounter a SpatiaLite database in the Lesson 9 walkthrough when you use QGIS to import data from OpenStreetMap. In that scenario, you are dealing with a large amount of data with potentially many fields of complex attributes. SpatiaLite is a more self-contained and flexible choice than shapefiles, KML, etc., for this type of task.