GEOG 583
Geospatial System Analysis and Design

Why Choose Open Data Standards?


Most open source GIS projects are heavily invested in implementing open standards. In general, this means Open Geospatial Consortium (OGC) standards. The Open Geospatial Consortium was founded in 1994 in response to pent-up demand in government and industry to solve the problems of spatial data sharing and interoperability. Back then, spatial data tended to be stored in proprietary formats, which often gave specific GIS vendors a competitive advantage. In the 1980s and early 1990s, reformatting or translating spatial data required time-consuming, expensive custom add-ons, typically from the original vendor of the system. See the OGC website for more information.

Some of the key OGC standards are briefly outlined here:

  • Simple Features - this is one of the OGC's earliest standards. It defines what a geographic feature is (at a minimum a point, line or polygon) and then sets out common text and binary representations of geographic features. "Simple" refers to the lack of topology in the data structure, which is often called spaghetti data. The standard promotes interoperability: if one program exports its data as Well Known Text (WKT) or Well Known Binary (WKB), another program can easily read in the same data and know what it means.
  • Geography Markup Language (GML) is an XML grammar, defined using XML Schema, for expressing geographical features. It serves as an interoperability format for features that are too complex to express using the Simple Features standard, and it is used particularly by the Web Feature Service (WFS).
  • KML - was developed as a competitor to the OGC's GML, but it is now one of the better-known OGC standards. It was originally developed by Keyhole (that's the K) and then popularized by Google Earth. It was donated to the OGC in 2007 to be developed as an open standard for 2D and 3D map annotation.
  • UML (Unified Modeling Language) - This is an open, standardized way of representing programming and modeling entities, their properties and their relationships, and of formulating their parameters and actions. Strictly speaking, UML is maintained by the Object Management Group (OMG) rather than the OGC, but OGC specifications use it extensively. It can be used to diagram a programming task, and some software can generate part of the code from the diagrams. It is related to Esri's ModelBuilder and to Esri's model diagrams, e.g. Esri Biodiversity Conservation. We will be looking at UML more in the next lesson, so take time to look at these links.
  • Web Map Service (WMS) - WMS was one of the first OGC standards and set the basis for all web mapping for many years. More recently, Google, Yahoo! and Microsoft have produced their own proprietary web mapping systems which, while popular, carry the same risk of vendor lock-in and arbitrary changes as any other proprietary system.
  • Web Feature Service (WFS) - this is the standard a service has to conform to if you want to serve geographic features themselves (the vector data, rather than rendered map images) over the web.
  • Web Coverage Service (WCS) - the WCS standard defines an interface and operations to access geographic coverages (such as rasters) over the web.
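To make the WKT/WKB idea from the Simple Features entry concrete, here is a minimal Python sketch that handles only a 2D point. It parses a WKT string, encodes the point as WKB (a one-byte byte-order flag, a 32-bit geometry type code where 1 means Point, then X and Y as 64-bit doubles), and decodes it again. The function names are hypothetical helpers written for illustration, not part of any GIS library; real software would rely on a complete geometry library that also covers lines, polygons, Z/M values and the other geometry types.

```python
import re
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian (NDR), 0 = big-endian (XDR)
WKB_POINT = 1          # Simple Features geometry type code for Point

def parse_wkt_point(wkt: str) -> tuple[float, float]:
    """Parse a 2D WKT point such as 'POINT (30 10)' into (x, y). Hypothetical helper."""
    match = re.fullmatch(r"\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*",
                         wkt, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # '<' = little-endian, no padding; B = order flag, I = uint32 type, d = double
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)

def wkb_to_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point written in either byte order."""
    order = "<" if wkb[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(order + "Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected Point (type 1), got type {geom_type}")
    return x, y

def point_to_wkt(x: float, y: float) -> str:
    """Write a 2D point back out as WKT."""
    return f"POINT ({x!r} {y!r})"

# Round trip: WKT -> WKB -> WKT
wkb = point_to_wkb(*parse_wkt_point("POINT (30 10)"))
print(len(wkb), wkb.hex())
print(point_to_wkt(*wkb_to_point(wkb)))  # POINT (30.0 10.0)
```

Because the reader checks the byte-order flag before decoding, a program on a big-endian system can still read WKB written on a little-endian one. This is the kind of unambiguous, documented layout that makes the format interoperable.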

Another very important open(-ish) format is the shapefile! One of the reasons for the wild popularity of shapefiles is that Esri released the specification as an open document, the technical description of the shapefile. Esri places no restrictions on other organizations implementing shapefile readers or writers. I implemented a *.shx writer in Java using only the spec (although I didn't really enjoy it). I think that if Esri had kept it a closed format, it would never have become the de facto standard for vector geospatial data.
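To give a feel for what writing a shapefile component from the published spec involves, here is a minimal Python sketch of a .shx (index) writer, a sketch rather than a production writer. It follows the layout in Esri's technical description: a 100-byte header that mixes big-endian and little-endian fields, then one 8-byte record per feature giving that feature's offset and content length in the .shp file, both measured in 16-bit words. `build_shx` is a hypothetical helper; it assumes the caller already knows each .shp record's content length, and it writes the Z and M ranges as 0.0, as the spec asks for shapes without Z or M values.

```python
import struct

HEADER_BYTES = 100          # .shp and .shx both start with a 100-byte header
SHP_RECORD_HEADER_WORDS = 4 # each .shp record header is 8 bytes = 4 16-bit words
SHX_RECORD_WORDS = 4        # each .shx index record is 8 bytes = 4 16-bit words
SHAPE_TYPE_POINT = 1

def build_shx(content_lengths_words, shape_type, bbox):
    """Return .shx bytes for .shp records with the given content lengths (in words).

    bbox is (xmin, ymin, xmax, ymax); the Z and M ranges are written as 0.0.
    Hypothetical helper for illustration.
    """
    n = len(content_lengths_words)
    file_length_words = HEADER_BYTES // 2 + SHX_RECORD_WORDS * n

    # Bytes 0-27, big-endian: file code 9994, five unused ints, file length
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_length_words)
    # Bytes 28-35, little-endian: version 1000 and shape type
    header += struct.pack("<2i", 1000, shape_type)
    # Bytes 36-99, little-endian doubles: Xmin, Ymin, Xmax, Ymax, Zmin, Zmax, Mmin, Mmax
    header += struct.pack("<8d", *bbox, 0.0, 0.0, 0.0, 0.0)

    records = bytearray()
    offset_words = HEADER_BYTES // 2  # first .shp record starts right after its header
    for length in content_lengths_words:
        # Each index record: offset and content length, both big-endian, in words
        records += struct.pack(">2i", offset_words, length)
        # The next .shp record follows this record's 8-byte header plus its content
        offset_words += SHP_RECORD_HEADER_WORDS + length
    return header + bytes(records)

# Three point records: 4-byte shape type + two 8-byte doubles = 20 bytes = 10 words each
shx = build_shx([10, 10, 10], SHAPE_TYPE_POINT, (0.0, 0.0, 2.0, 2.0))
print(len(shx))                            # 124
print(struct.unpack(">2i", shx[100:108]))  # (50, 10)
print(struct.unpack(">2i", shx[108:116]))  # (64, 10)
```

The awkward parts the spec forces on you are visible here: byte order switches partway through the header, and offsets are counted in 16-bit words rather than bytes. With the published description in hand, though, none of it requires guesswork.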

Avoiding Vendor Lock-In

Vendor lock-in occurs when a proprietary data structure is no longer supported. This could be the result of a database type no longer being recognized or a language customization module no longer being supported, and it can happen with both open source and licensed software. The use of open standards minimizes this problem, although it will not eliminate it completely. Migrating data to different applications is easier when the data structures are open and fully understood, because translation tools can be built, and often already exist; programs like FME by Safe Software can facilitate this migration. However, when the data structure is neither understood nor documented, the data is locked in and cannot be translated without paying the original vendor for custom programming, and, in the case of open source, the original developers might no longer be available. To avoid lock-in, look for translation tools before committing to a system; using open standards helps ensure those tools will keep working in the future. Open source software can suffer from lock-in too, especially if the work is not fully documented or the documentation is not maintained after the language ceases to be supported. This is more likely in smaller, more hastily developed open source projects.