A GIS or web map is only as useful as the data you put into it. Just as the GIS landscape offers proprietary software and open software, you will see sources of proprietary data and open data. This lesson explores the different meanings of "open data" and provides an introduction to OpenStreetMap, a growing repository of open data that is useful in a variety of projects.
Lately, it seems that "open data" is everywhere. Reaching buzzword status, this term is often seen in tandem with phrases such as "open government," "crowdsourcing," "government transparency," and "free and open source software." But what makes data "open"? In Lesson 1, you learned that different organizations, even proprietary software companies, employ the term "open source" to their advantage. In the same way, there are various nuances to the term "open data" that you should consider whenever you hear someone touting this phrase.
Consider the following means of data access and how they might be placed on a continuum of more or less "open":
Consider how these levels of data access play into the following scenarios:
The most open types of data are those that allow complete download, re-use, and modification of the data in open formats. However, other levels of data openness may be more useful than not seeing the data at all. If you expose a useful dataset through a web map or a web service, you should prepare an answer for the question, "Can I download this data?" It won't be long before somebody asks.
Even when data is freely available for download in open formats at no cost, it may still be subject to licensing restrictions. There are numerous types of open data licenses that stipulate what types of applications can use the data (personal, noncommercial, commercial, etc.) and what kind of attribution must be given. The license may also state the types of modifications that are allowed on the data, especially if the modified dataset is to be redistributed.
To get a feel for some of these licenses such as Creative Commons, Open Database License, Open Government License, and Public Domain, please take a few minutes to read pages 4 - 8 of Licensing Open Data: A Practical Guide [1] by Korn and Oppenheim, 2011. Focus especially on the chart on page 6.
FOSS typically excels at working with open data formats; however, FOSS is certainly not the only option for creating, exposing, or using open data. For example, Esri has invested in building open data discovery and download mechanisms into its ArcGIS Online and Portal for ArcGIS products. The idea is that government customers will be more likely to maintain their data in the proprietary software repository if the repository is easily engineered to allow free and open downloads by the public in popular formats such as KML and CSV. Here is the video of an interview [2] with Esri's Andrew Turner from the Esri DevSummit 2017 on the topic of Open Data that you may find interesting.
If you don't have the money or means to purchase your required GIS data, or if the data doesn't exist, then you may need to collect the data yourself. If your goal is to openly share the resulting data with the public, then you may consider enlisting the public in your data collection efforts. VGI and crowdsourcing are two concepts that come into play when enlisting the public or non-domain experts in the collection of GIS data.
In 2007 Michael Goodchild published a paper in which he elaborated on the idea of volunteered geographic information (VGI). This kind of data is collected by citizens acting as sensors to gather information about the world around them. The citizens then feed this information into a centralized GIS database, often employing a user interface that has been simplified to the degree that specialized training is not required.
VGI has since become a hot term in geographic information science as thousands of people contribute to the OpenStreetMap digital map of the world (discussed later). Governments evaluate the possibilities of creating "citizen reporting" apps that allow anyone to upload information about potholes, graffiti, etc., with the objective of bringing them to the attention of local authorities.
Crowdsourcing is the idea of using the power of a crowd to collect data that is too vast, heterogeneous, or expensive to be collected by other types of sensors. Consider how many people you would have to hire in order to write an encyclopedia with 30 million articles in 250 languages. The crowdsourced website Wikipedia [3] has been able to create a project of this scope solely through crowdsourcing. Other applications of crowdsourcing include combing remotely sensed imagery [4] to find lost people or vehicles, recording old weather measurements from ship logs [5] in order to create climate databases, and transcribing census records [6] to create searchable genealogical indexes.
Crowdsourcing is a particularly good fit for tasks that require an element of human cognition not easily performed by machines. Amazon has even made a business out of crowdsourcing through its Mechanical Turk [7] service. This allows you to hire a crowd of unknown individuals to perform tasks for a particular fee, often pennies for each task. Using an architecture that is conceptually similar to cloud computing, you can scale the task up to as many volunteers as you need.
The concept of crowdsourcing is a good fit for VGI, particularly when a vast amount of data must be collected under time pressure; however, not all VGI projects use crowdsourcing. Some of them are focused on gathering information from a small sample of people or a focused group of domain experts. Cinnamon and Schuurman (2012), for example, enlisted a set of emergency medical professionals at a single hospital to submit information about the locations of local auto accidents. Using tablet computers, the paramedics tapped the screen or typed an address to record the locations of the accidents. The researchers called this type of guided process facilitated VGI (f-VGI), after Seeger (2008). These readings are available in the Lesson 9 module on the course Canvas site if you're interested in learning more about them.
The introduction of humans into the sensory element of data collection presents some interesting advantages and challenges. One advantage is that there are a lot of humans potentially available. Some of them even appear to have a lot of time on their hands! This means that tasks can be scaled up quickly and the data can be collected (or corrected) in a hurry. Humans also have the ability to care about projects and become passionate about them, increasing the amount and quality of data collected and creating an endless source of free organization and labor. It's not always necessary to hire the Mechanical Turk when you're enlisting people in a project they really believe in.
However, humans, by nature, make mistakes in some ways that computers may not. They get tired, they commit typos, they make subjective judgments, and so forth. Furthermore, the technical skills and physical infrastructure (e.g., Internet access) required for VGI participation may not be uniformly distributed throughout your study area. Finally, humans carry particular biases and interests that may skew the types of data collected.
Anyone employing VGI in scientific research or mission-critical applications should be aware of these limitations. The next section of this lesson provides some examples of how these advantages and limitations of VGI have affected OpenStreetMap.
OpenStreetMap (OSM) is a digital map database of the world built through crowdsourced volunteered geographic information (VGI). OSM is supported by the nonprofit OpenStreetMap Foundation [8]. The data from OSM is freely available for visualization, query, download, and modification under open licenses [9].
OSM works in a style similar to Wikipedia, in which virtually all features are open to editing by any member of the user community. OSM was conceived in 2004 and reached two million registered users [10] roughly 10 years later. Although only a fraction of these are frequent map editors, the map has matured enough in some locations that its detail and precision rival "authoritative" datasets from governments and commercial entities. This is particularly true in Western Europe and some parts of the US. The image below of the Penn State campus provides an idea of the intricate features that can be submitted to OSM.
OSM originally gained popularity in places where government data was not freely available, but a thriving GIS community existed. For example, in the mid-2000s, the UK Ordnance Survey data was available only for purchase, and OSM grew rapidly as an attractive free alternative. In places where governments were willing to freely share their data, bulk upload negotiations were sometimes arranged. For example, the US has fairly thorough road coverage due to a US Census TIGER street data bulk upload.
OSM volunteer efforts constitute a social event and hobby for many, who gather for group data collection events known as "mapping parties." These activities organized armies of volunteers to walk, bike, and drive through sectors of a city with GPS units and notepads, returning later to a central lab to enter the data (Perkins and Dodge 2008). Although this is still useful in some cases, many OSM beginners nowadays can get pretty far just by tracing aerial photographs in simple browser-based editors. In addition to a physical exploration of the city, mapping parties now offer training, awareness, and renewed enthusiasm for OSM (Hristova et al. 2013).
To contribute a feature to OSM, you typically digitize a geometry (a point, line, or area) and then add descriptive attributes, or tags. For example, to tag a grocery store, you trace its building footprint and tag it with shop=supermarket. There's no restriction on the tags you can use, but the data is only useful to the degree that you tag things consistently with the way other OSM users have applied the tags.
To promote consistency in tagging, the OSM community has an informal tag voting and approval process organized on the OpenStreetMap wiki [11] site. Approved tags are added to the online documentation so that others can easily find and apply them. For example, the tag shop=supermarket [12] denotes a grocery store. Before you add a tag, check the wiki to make sure that you're using the established tag and syntax. If you create your own tag and start using it on many features, consider putting it through the OSM proposal process [13].
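To make the geometry-plus-tags idea concrete, here is a minimal Python sketch of how a tagged feature could be represented and checked against documented tags. The DOCUMENTED_TAGS set is a tiny hand-picked sample for illustration, not the wiki's full list, and the function name, the store_type tag, and the coordinates are all made up for this example.

```python
# A tiny, hand-picked sample of key=value pairs documented on the OSM wiki.
# (Assumption: for illustration only; the real list is far longer.)
DOCUMENTED_TAGS = {
    ("shop", "supermarket"),
    ("amenity", "cafe"),
    ("building", "yes"),
}


def undocumented_tags(tags):
    """Return the key=value pairs in `tags` that are not in the sample list."""
    return sorted(
        f"{key}={value}"
        for key, value in tags.items()
        if (key, value) not in DOCUMENTED_TAGS
    )


# A feature is a geometry plus free-form tags (coordinates are made up).
grocery_store = {
    "geometry": {"type": "Point", "coordinates": [-77.86, 40.79]},
    "tags": {"shop": "supermarket", "store_type": "grocery"},
}

# The invented store_type tag is flagged; shop=supermarket is not.
print(undocumented_tags(grocery_store["tags"]))
```

A check like this only hints at the problem: OSM itself accepts any tag, so consistency depends on contributors consulting the wiki before they invent new ones.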
OSM is not the only crowdsourced VGI project, but it is one of the most well known. As such, it provides a useful exemplification of the pros and cons of crowdsourcing and VGI.
Some of the main benefits of OSM include:
Some of the main challenges of OSM include:
The most basic use of OSM is to retrieve its map tiles as a background for other thematic layers. High-profile sites using OSM in this way include Foursquare, Craigslist, and Wikipedia. Some web developers switched to OSM as a basemap [17] after the Google Maps API introduced potential fees into its terms of service.
From a technical perspective, anyone can use a rendering engine like QGIS or Mapnik to draw tiles of OSM data. In fact, this is what you did with QGIS in the Lesson 5 walkthrough. The image below shows how you can select various basemap renderings on OpenStreetMap.org. Other companies such as Mapbox [18] have made their own OSM renderings that can be consumed as web services. In fact, Mapbox's business model has come to rely so heavily on OSM that the company has invested in near-real-time quality monitoring of incoming OSM edits by hosting OSMCha [19], an OSM Changeset Analyzer [20] tool originally written by Wille Marcel in 2015 (you need to sign in with an OSM account to be able to use the tool).
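If you are curious how a web map decides which basemap tile to request, the following Python sketch computes the tile that contains a given latitude and longitude. It assumes the common zoom/x/y tile numbering used with Web Mercator tiles and the standard tile.openstreetmap.org server (which is subject to a usage policy); other tile providers use different hosts and may use different URL patterns. The function name and the sample coordinates are only illustrative.

```python
import math


def tile_url(lat, lon, zoom):
    """Return the URL of the standard OSM tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"


# Approximate coordinates near the Penn State campus, at zoom level 15.
print(tile_url(40.7934, -77.8600, 15))
```

In practice, a web mapping library performs this calculation for you. The sketch does not clamp values at the edges of the map (for example, a longitude of exactly 180).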
Let's now take a look at some of the ways OSM can be used "beyond the basemap."
OSM gained publicity as a disaster response aid in 2010 after the Haiti earthquake, as described in this PSU Geospatial Revolution video [21]. Prior to this disaster, publicly available digital data for Haiti was sparse, and OSM was limited to major roads and a handful of other features. In the weeks following the earthquake, Internet volunteers worldwide traced imagery and referenced out-of-copyright maps to create a detailed geographic database of the country in OSM. This provided helpful basemaps for humanitarian aid workers who were flocking to the country and needed maps to get around. It also served as an inventory of hospitals, churches, civic facilities, and other resources that could be used by responders.
The growth of OSM during this period was nothing short of dramatic, and a number of animations such as this video: OpenStreetMap - Project Haiti [22] have depicted the expansion of the map in Haiti during this time period. Zook et al. (2010) offer an analysis of various methods of VGI and crowdsourcing used in the earthquake response, including OSM and the crisis mapping site Ushahidi.
Crowdsourced volunteer efforts work most efficiently when there is an organizing force behind the work. Using lessons from the Haiti experience, the Humanitarian OpenStreetMap Team [23] (HOT) now provides this function. After Typhoon Haiyan hit the Philippines in 2013, HOT provided tools to explain and partition the volunteer mapping work on OSM so that the most needed features and geographic areas were given priority. Volunteers visiting the HOT site could click a map sector to work on, and were given instruction about which features to trace and how to tag them. The image below shows the OSM Tasking Manager, an application used by HOT to catalog sectors completed and sectors that need work.
The efforts to rapidly assemble crisis mappers in Haiti and the Philippines are admirable, but the ideal situation would be to already have the OSM data on hand. These regions only needed the mapping because sufficient information hadn't been contributed in the first place. Lack of technical infrastructure, a shortage of human and monetary capital, civil restrictions, and other factors can cause places to remain unmapped. Graham (2010) calls these places "virtual black holes" in VGI. Unfortunately, commercial Internet maps may also neglect these places if it is believed the search and advertising functions related to the map will not produce sufficient revenue to justify the investment.
OSM has been used as a way to give a presence to communities that have previously remained unmapped. Hagen (2010) describes a project in Nairobi wherein local youth volunteers were enlisted and trained to map the sprawling slum of Kibera. Home to hundreds of thousands of people, this settlement was little more than a name on previous maps. The Map Kibera [24] effort used OSM tools to record water points, toilets, clinics, schools, pharmacies, places of worship, and NGO offices. The result is a map that the residents can use to find local services and lobby the government for infrastructure support. The features added through this project are immediately apparent when you navigate to Kibera using even the default OSM map.
Similar stories can be found elsewhere in the world. When participants in a Buenos Aires hackathon wanted to map social services in a local slum, they found the area empty in commercial maps and decided to use OSM as a basemap [25]. Even when a street network exists, other layers such as bus routes may be helpful for individuals without automobiles, opening possibilities for local travel outside of daily routines. Motivated individuals have headed up an OSM project with bus routes in India [26], noting that a detailed local map can also help with tourism promotion efforts.
One of the advantages of OSM is its flexibility to store any type of feature, given the many tags that already exist and the community-based tag proposal and voting process. In some cases, specialized thematic maps have been created around a subset of feature types. Examples of these include:
In these maps, OSM acts as a freely accessible repository for local knowledge of useful things. Some of these mapped features provide great value to a community, but are not monetarily lucrative and may be excluded from proprietary commercial maps. Even the default OSM tiles do not show all the above types of features because to do so would cause the map to be cluttered. There is a great need for developers who can retrieve custom subsets of data from OSM and display it in thematic maps.
Remember that thematic maps are only possible because OSM allows free download and re-use of the data. Sites that use OSM for thematic mapping often rely on one of the various query APIs available for OSM, such as the Overpass API that allows the submission of custom tag queries through a web service. Asking a web service to give you all features matching a certain tag is often more manageable than downloading the entire OSM dataset for a region. You will get a taste of the Overpass API in the lesson walkthrough.
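As a rough illustration of a custom tag query, the Python sketch below builds a request in Overpass QL, the query language of the Overpass API, for all nodes and ways carrying one tag inside a bounding box. The function name is hypothetical, the bounding box values are just an example around Philadelphia, and the endpoint shown is one public Overpass instance among several. The code only constructs the request URL; it does not send it.

```python
from urllib.parse import urlencode

# One public Overpass endpoint (assumption: other instances also exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_tag_query(key, value, south, west, north, east):
    """Build an Overpass QL query for nodes and ways with key=value in a bbox."""
    bbox = f"{south},{west},{north},{east}"  # Overpass QL order: S, W, N, E
    return (
        "[out:json][timeout:25];"
        "("
        f'node["{key}"="{value}"]({bbox});'
        f'way["{key}"="{value}"]({bbox});'
        ");"
        "out center;"
    )


query = build_tag_query("shop", "supermarket", 39.86, -75.29, 40.15, -74.95)
request_url = OVERPASS_URL + "?" + urlencode({"data": query})
print(request_url)
```

Note that Overpass QL lists the bounding box as south, west, north, east, which differs from the west, south, east, north order used in some other OSM request formats.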
The maps and queries depend heavily on users maintaining consistency with established tag syntax. For example, the Philly Fresh Food Map relies on tags described in the Food Security [31] page of the OSM wiki.
Getting data out of OpenStreetMap (OSM) presents more technical challenges than putting data into OSM. When you put data into OSM, you can use your choice of a number of different types of editors. You can use any tags that you want, attempting to stick to tagging conventions of course.
In contrast, when you get data out of OSM, you have to deal with the following:
Complicating matters is the fact that OSM returns data in its own structure of XML, which is not immediately readable by many GIS applications. Therefore, getting data from OSM often involves converting from this XML into some other format.
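To give a sense of what that conversion involves, here is a minimal Python sketch that reads a small fragment of OSM XML and turns its tagged nodes into GeoJSON points. The sample XML is hand-written for this example (the ids, coordinates, and names are made up), and the function name is hypothetical. Real extracts also contain ways and relations, which need more work to assemble into lines and polygons, so treat this as an illustration rather than a complete converter.

```python
import json
import xml.etree.ElementTree as ET

# A hand-written fragment in the OSM XML layout (ids and coordinates made up).
SAMPLE_OSM_XML = """<osm version="0.6">
  <node id="1" lat="39.95" lon="-75.16">
    <tag k="shop" v="farm"/>
    <tag k="name" v="Example Market"/>
  </node>
  <node id="2" lat="39.96" lon="-75.17"/>
</osm>"""


def tagged_nodes_to_geojson(osm_xml):
    """Convert tagged OSM nodes to a GeoJSON FeatureCollection of points."""
    root = ET.fromstring(osm_xml)
    features = []
    for node in root.iter("node"):
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if not tags:
            continue  # untagged nodes are usually just vertices of ways
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(node.get("lon")), float(node.get("lat"))],
            },
            "properties": tags,
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(tagged_nodes_to_geojson(SAMPLE_OSM_XML), indent=2))
```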
There are a variety of mechanisms for downloading OSM data. The easiest ones address the challenges by providing a way to filter the tags you want, allowing you to specify the output format, and allowing you to specify a geographic bounding box for the requested data, so you don't retrieve too much.
One of the most user-friendly GUI-oriented ways that I have found for retrieving OSM data is a server at BBBike.org [32]. This little web-based tool allows you to draw a bounding box interactively and specify the output format you want. After a while, you receive an e-mail with a link to download your data.
In the walkthrough, however, we'll use the OSM download mechanism that is available in QGIS via the OSMDownloader plugin. Although this way is a little more advanced than the BBBike extract service, it is more immediate and allows greater flexibility for the amount of data and tags selected.
Examine the image below of Cayenne, French Guiana. You'll notice that the city has detailed building footprint polygons available. Let's suppose that we want to get a shapefile of these building footprints using QGIS.
Note that we have defined our three pieces of essential information to filter the OSM data we want:
In a previous version of QGIS, downloading OSM data was integrated into the main program and available under Vector > OpenStreetMap > Download data. This functionality has been removed and now requires a plugin to be installed. On the positive side, these plugins require far fewer steps to obtain the data. There are several plugins available that can be used for downloading OSM data. Here we show you the steps using the OSMDownloader plugin. Go ahead and install this plugin from the QGIS plugin manager. You should be able to find it under "Not installed". When activated, the plugin will add a toolbar with a single button to QGIS. When you click this button, it will become highlighted, and you can then select an area in the map canvas for which you want to download the OSM data.
Now perform the following steps:
Under Save Location, press the Save File button and navigate to the folder you created in step 1. Then use cayenne.osm for the file name.
Click OK, which will start the download process. A progress bar will appear and finally a window informing you that the download has finished.
Figure 9.9
We are done using the download plugin now. Next, let's add the downloaded .osm file with the OSM data to our QGIS project. For this, you can simply drag the cayenne.osm file from the Windows File Explorer onto the map canvas in QGIS. The .osm file contains entities of different geometry types. However, the layer that will be added to QGIS can only contain features of a single geometry type. Therefore, you will be shown a dialog window in which you have to pick that geometry type. Since we are interested in building polygons, you should select "multipolygons" here and then click the button called either OK or Add Layers.
Figure 9.10
Behind any data retrieval mechanism from OSM is a web service request. You can send these requests directly from your web browser or an automated program using an OSM query API. One of the most powerful of these APIs is called Overpass [33]. Try the following:
http://www.overpass-api.de/api/xapi_meta?*[building=yes][bbox=-52.35,4.88,-52.25,4.98]
Notice what this is requesting... It should look familiar.
You can use Python or other scripting languages to make these requests automatically. For example, here's how you could use Python to query OSM for all the farmers' markets in Philadelphia and save them to a .osm file. (You're not required to run this code).
import urllib.request

workspace = "C:\\data\\OSMdev\\"

# Query the Overpass XAPI-compatible endpoint
url = "http://www.overpass-api.de/api/xapi_meta?*%5Bshop=farm%5D%5Bbbox=-75.29,39.86,-74.95,40.15%5D"
marketsXml = urllib.request.urlopen(url).read()

# Make farmers markets file (the response is bytes, so write in binary mode)
marketsPath = workspace + "markets.osm"
with open(marketsPath, "wb") as marketsFile:
    marketsFile.write(marketsXml)
For Python junkies: The above code uses a library called urllib, which can make web requests and read the responses. You just have to provide the URL for the request. The "[" and "]" characters are percent-encoded as %5B and %5D, respectively, because square brackets are reserved characters in URLs; otherwise, the query has the same syntax as the one you issued above for Cayenne buildings. The resulting XML is then written to a file using the standard Python write method.
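Rather than escaping the brackets by hand, you could let Python do it. The following sketch builds the same style of request URL for any tag and bounding box using the quote function from the standard urllib.parse module. The xapi_url function name is hypothetical. The base URL and query syntax are the ones used in the requests above.

```python
from urllib.parse import quote

XAPI_BASE = "http://www.overpass-api.de/api/xapi_meta?"


def xapi_url(key, value, west, south, east, north):
    """Build an XAPI-style request URL for features tagged key=value in a bbox."""
    predicate = f"*[{key}={value}][bbox={west},{south},{east},{north}]"
    # quote() percent-encodes "[" and "]"; "*", "=" and "," are kept readable.
    return XAPI_BASE + quote(predicate, safe="*=,")


# Farmers' markets in Philadelphia, then buildings in Cayenne.
print(xapi_url("shop", "farm", -75.29, 39.86, -74.95, 40.15))
print(xapi_url("building", "yes", -52.35, 4.88, -52.25, 4.98))
```

The first line printed matches the escaped URL used in the script above, which makes it easy to reuse the same code for other tags and places.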
A script like this might be useful if you wanted to update one or more datasets on a periodic basis. The script could be combined with GDAL processing to get the data into a format suitable for your web map. Recent versions of GDAL (1.10 and later) can read OSM XML and convert it to different formats, such as GeoJSON or shapefiles. (Be careful with shapefiles though, because GDAL plops most of the less common "other tags" into one field that gets cut off at 254 characters, a limitation of the shapefile format.)
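As one possible way to script that GDAL step, the sketch below assembles a call to GDAL's ogr2ogr command-line utility that converts the multipolygons layer of an .osm file to GeoJSON. The helper function name and the output file name are hypothetical, and the conversion only runs if ogr2ogr is installed and the input file exists. Otherwise, the script just prints the command it would have run.

```python
import os
import shutil
import subprocess


def ogr2ogr_command(osm_path, out_path, layer="multipolygons", fmt="GeoJSON"):
    """Build an ogr2ogr command that converts one OSM layer to another format."""
    # ogr2ogr argument order: -f <format> <destination> <source> [<layer>]
    return ["ogr2ogr", "-f", fmt, out_path, osm_path, layer]


cmd = ogr2ogr_command("cayenne.osm", "cayenne_buildings.geojson")
print(" ".join(cmd))

# Only run the conversion if GDAL's ogr2ogr is installed and the file exists.
if shutil.which("ogr2ogr") and os.path.exists("cayenne.osm"):
    subprocess.run(cmd, check=True)
```

Choosing GeoJSON as the output format here sidesteps the shapefile field-length limitation mentioned above.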
As an exclamation point at the end of all this geekiness, play around with the graphical tool overpass turbo [34] for a few minutes. This gives you an interactive environment for querying OSM and seeing the results on the map. You can save any interesting result in popular formats, such as KML. This is helpful if you just want to make a one-off query to OSM for some particular feature type.
There are many circumstances and needs that can affect the way you retrieve data from OSM. Hopefully, this walkthrough has provided enough options that you can make an informed decision about how to best get the scope and scale of data you need. Now let's go to the lesson assignment, where you'll get some experience with the other side of things: putting data into OSM.
The Lesson 9 assignment has two parts: reporting on a web map that uses OpenStreetMap (OSM), and actually editing OSM yourself. You will produce a single document describing these efforts.
Find an Internet map that uses some element of OSM. Produce a write up of several paragraphs describing the following:
What is the purpose and URL of the site, and who built it?
How is OSM being used? (i.e., Is the site simply pulling the OSM tiles, or is the source data used for creating thematic layers, etc.?)
Include at least one screenshot showing the OSM data.
What advantages and disadvantages are introduced into this map by using OSM data?
Do you see any other appropriate ways that OSM data could be used in this site?
In this part of the assignment, you'll get some practice with adding data to OSM in your town or some other place that you know well. You'll take some "before" and "after" screenshots to demonstrate the things you added to the map.
The easiest way to get started with editing OSM is using the in-browser editor at OpenStreetMap.org [35], which is called iD.
Links
[1] http://www.discovery.ac.uk/files/pdf/Licensing_Open_Data_A_Practical_Guide.pdf
[2] https://www.youtube.com/watch?v=d3tbRE9RsSc
[3] http://wikipedia.org
[4] http://www.cnn.com/2014/03/11/us/malaysia-airlines-plane-crowdsourcing-search/
[5] http://www.oldweather.org/
[6] https://familysearch.org/indexing/
[7] http://aws.amazon.com/mturk/
[8] http://wiki.osmfoundation.org/wiki/Main_Page
[9] http://www.openstreetmap.org/copyright
[10] https://blog.openstreetmap.org/2015/03/12/two-million-contributors/
[11] http://wiki.openstreetmap.org/wiki/Main_Page
[12] http://wiki.openstreetmap.org/wiki/Tag:shop=supermarket
[13] http://wiki.openstreetmap.org/wiki/Proposal_process
[14] https://www.missingmaps.org/
[15] http://sterlingquinn.net/apps/crowdlens
[16] https://www.youtube.com/watch?v=01mktydWmUM
[17] https://www.techdirt.com/articles/20120405/17321218398/google-maps-exodus-continues-as-wikipedia-mobile-apps-switch-to-openstreetmap.shtml
[18] https://www.mapbox.com/data-platform/
[19] https://osmcha.org/
[20] https://labs.mapbox.com/mapping/validating-osm/
[21] https://www.youtube.com/watch?v=gxCEb5Cv4Nk
[22] https://www.youtube.com/watch?v=BwMM_vsA3aY
[23] http://hot.openstreetmap.org/
[24] http://mapkibera.org/
[25] http://blog.ilabamericalatina.org/2013_06_01_archive.html
[26] http://bitterscotch.wordpress.com/2010/04/29/mapping-a-new-way-forward-for-openstreetmap-in-india/
[27] http://www.opencyclemap.org/
[28] http://openskimap.org/
[29] http://www.wheelmap.org
[30] http://www.gis.cwu.edu/phillyfood/
[31] http://wiki.openstreetmap.org/wiki/Food_security
[32] http://extract.bbbike.org
[33] http://wiki.openstreetmap.org/wiki/Overpass_API
[34] http://overpass-turbo.eu/
[35] http://www.openstreetmap.org
[36] https://wiki.openstreetmap.org/wiki/Map_features