GEOG 585
Open Web Mapping

Walkthrough: Getting source data from OpenStreetMap


Getting data out of OpenStreetMap (OSM) presents more technical challenges than putting data into OSM. When you put data into OSM, you can choose from several different types of editors. You can use any tags you want, although you should try to stick to tagging conventions.

In contrast, when you get data out of OSM, you have to deal with the following:

  • Retrieving only the tags you need
  • Retrieving the data format you need
  • Not overwhelming yourself or the server by requesting too much data

Complicating matters is the fact that OSM returns data in its own XML structure, which many GIS applications cannot read directly. Therefore, getting data from OSM often involves converting from this XML into some other format.
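To make that conversion concrete, here is a minimal sketch using only Python's standard library. It reads a tiny OSM XML document (hypothetical sample data) and turns its tagged nodes into GeoJSON points. The function name nodes_to_geojson is made up for illustration, and the sketch deliberately ignores ways and relations, which need their node coordinates looked up and assembled; tools such as QGIS or GDAL handle that for you.

```python
import json
import xml.etree.ElementTree as ET

# A tiny hand-written OSM XML document (hypothetical data, for illustration only)
SAMPLE_OSM = """<osm version="0.6" generator="example">
  <node id="1" lat="4.9372" lon="-52.3260">
    <tag k="amenity" v="marketplace"/>
    <tag k="name" v="Example Market"/>
  </node>
  <node id="2" lat="4.9380" lon="-52.3270"/>
</osm>"""

def nodes_to_geojson(osm_xml):
    """Convert tagged OSM nodes into a GeoJSON FeatureCollection of points."""
    root = ET.fromstring(osm_xml)
    features = []
    for node in root.iter("node"):
        # Each <tag> child holds one key/value pair
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if not tags:
            continue  # untagged nodes are usually just vertices of ways
        features.append({
            "type": "Feature",
            "id": int(node.get("id")),
            # GeoJSON uses [longitude, latitude] order
            "geometry": {"type": "Point",
                         "coordinates": [float(node.get("lon")), float(node.get("lat"))]},
            "properties": tags,
        })
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(nodes_to_geojson(SAMPLE_OSM), indent=2))
```

Note the swap from OSM's lat/lon attributes to GeoJSON's longitude-first coordinate order, a common source of mistakes when converting.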

There are a variety of mechanisms for downloading OSM data. The easiest ones address the challenges by providing a way to filter the tags you want, allowing you to specify the output format, and allowing you to specify a geographic bounding box for the requested data, so you don't retrieve too much.

One of the most user-friendly GUI-oriented ways that I have found for retrieving OSM data is a server at BBBike.org (http://extract.bbbike.org). This little web-based tool allows you to draw a bounding box interactively and specify the output format you want. After a while, you receive an e-mail with a link to download your data.

 OSM downloads from BBBike extract service
Figure 9.6

In the walkthrough, however, we'll use the OSM download mechanism that is built directly into QGIS. Although this way is a little more advanced than the BBBike extract service, it is more immediate and allows greater flexibility in the amount of data and the tags selected.

Downloading OSM data using QGIS

Examine the image below of Cayenne, French Guiana. You'll notice that the city has detailed building footprint polygons available. Let's suppose that we want to get a shapefile of these building footprints using QGIS.

Map of buildings in Cayenne
Figure 9.7

Note that we have defined our three pieces of essential information to filter the OSM data we want:

  • The tags we want: any polygon with the building tag populated as anything other than building=no (a somewhat rare value but one that is occasionally used)
  • The format we want: a shapefile
  • The bounding box of data we want: just the city of Cayenne
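Before downloading, it can help to sanity-check the third piece of information, the bounding box. The sketch below records the three pieces in a plain dictionary (a hypothetical layout, not a QGIS or OSM format) and estimates the box's area with the spherical-Earth formula A = R² · Δλ · (sin φ_north − sin φ_south). The Cayenne coordinates are the ones used in the Overpass example later in this walkthrough; the 500 km² threshold is an arbitrary, hypothetical limit you would tune for yourself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a sphere is close enough here
MAX_AREA_KM2 = 500.0      # hypothetical personal limit, not a QGIS or OSM rule

# The three pieces of filter information, as a hypothetical plain dictionary
request = {
    "tags": "building is set and building != 'no'",
    "format": "ESRI Shapefile",
    # (west, south, east, north) in WGS 1984 decimal degrees around Cayenne
    "bbox": (-52.35, 4.88, -52.25, 4.98),
}

def bbox_area_km2(west, south, east, north):
    """Approximate area of a lat/lon bounding box, treating Earth as a sphere."""
    # Does not handle boxes that cross the 180th meridian
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError("expected west < east and south < north, in degrees")
    d_lon = math.radians(east - west)
    return EARTH_RADIUS_KM ** 2 * d_lon * (
        math.sin(math.radians(north)) - math.sin(math.radians(south)))

area = bbox_area_km2(*request["bbox"])
if area > MAX_AREA_KM2:
    print(f"Warning: about {area:.0f} km^2 requested; consider a smaller box")
else:
    print(f"About {area:.0f} km^2 requested")
```

For the Cayenne box this comes out to roughly 120 km², and the ValueError catches the common mistake of swapping the west/east or south/north values.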

Follow these steps to get the data using QGIS:

  1. Create a new data folder such as c:\data\Cayenne.
  2. Launch QGIS and click Vector > OpenStreetMap > Download data.
  3. Choose Manual and specify the bounding coordinates of the area you want to download. In this case, use the bounding coordinates of Cayenne, which are shown below.
     Download OpenStreetMap data dialog box in QGIS
    Figure 9.8

    When doing this, be careful that you don't specify a bounding box larger than you need, or you could end up with an extraordinary amount of data.

    The bounding coordinates must be supplied in WGS 1984 lat/lon format or the tool will not work. It may take a bit of detective work to figure out these coordinates before you launch QGIS.

  4. Specify the Output file with a name such as cayenne.osm, as shown above, and click OK. Wait while your data is downloaded. At the time of this writing, the size was around 23 MB.
    If the download fails in the middle, delete your .osm file and try it again.
  5. Click Close to close the download window.
    You currently have a .osm file containing XML. You will now convert this into a SpatiaLite database that can be used in QGIS and other programs.
  6. Click Vector > OpenStreetMap > Import topology from XML and fill out the dialog box as shown below.
     OpenStreetMap Import dialog box
    Figure 9.9
  7. Click OK and wait for the import to occur. Then click Close to close this dialog box.

    You've now brought the data into a SpatiaLite database, but you still need to create a useful layer from it that contains just the geometries and tags of interest.
     
  8. Click Vector > OpenStreetMap > Export topology to SpatiaLite.
  9. Complete the dialog box as follows:

    - For Input DB file, browse to your cayenne.osm.db SpatiaLite file. Be aware that the .db extension may not be visible in Windows Explorer, but if the file shows up in the file browser dialog, then you are okay.
    - For Export type, choose Polygons (closed ways).
    - For Output layer name, use cayenne_polygons.

    - For Exported tags, click Load from DB and then check some tags pertinent to buildings. In our scenario, check source, building, amenity, addr:housenumber, and addr:street.
     Export OpenStreetMap topology dialog box
    Figure 9.10
  10. Click OK and then click Close to close the dialog box. You should now see a layer in QGIS containing all the polygons. If the layer is not added automatically, you can add it manually by using the Add SpatiaLite Layer button, connecting to cayenne.osm.db, selecting the cayenne_polygons table, and then clicking Add.
     Cayenne OSM polygons in QGIS
    Figure 9.11
    Now you need to select only the building polygons.
  11. In the map table of contents, right-click cayenne_polygons and click Open Attribute Table.
  12. At the top of the attribute table, click the Select features using an expression button.
  13. Paste the following query in the Expression box including all quote marks: "building" != 'NULL' AND "building" != 'no'
     Select buildings using an expression
    Figure 9.12
    This expression filters out everything that's not a building. When you do this with your own data of interest, you'll need to create some expression that selects only the tag combinations that you want.
  14. Click Select. You should see the building features selected in the map.
  15. In the map table of contents, right-click the cayenne_polygons layer and click Save As...
  16. Choose Esri shapefile as the format and specify an output location. Select the Save only selected features option. Then click OK.
     Save selection as shapefile
    Figure 9.13
  17. Use QGIS to verify that your exported shapefile contains only the buildings.
     Final view of Cayenne buildings in QGIS
    Figure 9.14
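The selection rule from step 13 can be sketched in plain Python to show its logic outside QGIS. This assumes that polygons without a building tag come through with an empty (NULL) value, represented here as None; the attribute rows and the is_building helper are hypothetical, and this is not how QGIS evaluates expressions internally.

```python
# Hypothetical attribute rows, as they might look in cayenne_polygons
# (None stands in for a NULL/empty tag value)
polygons = [
    {"osm_id": 1, "building": "yes", "amenity": None},
    {"osm_id": 2, "building": None, "amenity": "parking"},
    {"osm_id": 3, "building": "no", "amenity": None},
    {"osm_id": 4, "building": "house", "amenity": None},
]

def is_building(row):
    """Keep rows whose building tag is set to anything other than 'no'."""
    value = row.get("building")
    return value is not None and value != "no"

buildings = [row for row in polygons if is_building(row)]
print([row["osm_id"] for row in buildings])  # [1, 4]
```

Note that building=house passes the filter: OSM uses many building values besides yes, which is why the expression excludes only NULL and no instead of requiring building=yes.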

Downloading data using the Overpass OpenStreetMap query API

Behind any data retrieval mechanism from OSM is a web service request. You can send these requests directly from your web browser or an automated program using an OSM query API. One of the most powerful of these APIs is called Overpass. Try the following:

  1. Paste the following URL in a web browser and wait for a minute until prompted to save a file:
    http://www.overpass-api.de/api/xapi_meta?*[building=yes][bbox=-52.35,4.88,-52.25,4.98]
    Notice what this is requesting. It should look familiar.
  2. When prompted to save the file, save it as buildings.osm.
  3. Open buildings.osm in a text editor and see what all the buildings in Cayenne look like when expressed as OSM-formatted XML.
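In the XML you'll see that each building is a way: a list of nd elements referencing node IDs, plus tag elements. A way whose last node reference repeats its first is closed, which is what the Polygons (closed ways) export in QGIS relies on. The sketch below counts building ways and closed ones using Python's standard library; the function name summarize_building_ways and the inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def summarize_building_ways(root):
    """Return (number of building ways, number of those that are closed rings)."""
    building_ways = 0
    closed_ways = 0
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "building" not in tags:
            continue
        building_ways += 1
        refs = [nd.get("ref") for nd in way.findall("nd")]
        # A closed ring needs at least 3 distinct nodes plus the repeated first node
        if len(refs) >= 4 and refs[0] == refs[-1]:
            closed_ways += 1
    return building_ways, closed_ways

# Hypothetical sample: one closed building way and one non-building way
sample = ('<osm><way id="10"><nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>'
          '<tag k="building" v="yes"/></way>'
          '<way id="11"><nd ref="4"/><nd ref="5"/><tag k="highway" v="service"/></way></osm>')
print(summarize_building_ways(ET.fromstring(sample)))  # (1, 1)
```

To run it on your download instead of the sample, pass ET.parse("buildings.osm").getroot() to the function.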

You can use Python or other scripting languages to make these requests automatically. For example, here's how you could use Python to query OSM for all the farmers' markets in Philadelphia and save them to a .osm file. (You're not required to run this code.)

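import os
import urllib.request

workspace = "C:\\data\\OSMdev\\"

# Query the Overpass API (XAPI-compatible endpoint) for farmers' markets
# (shop=farm) in a Philadelphia bounding box: west, south, east, north
marketsUrl = "http://www.overpass-api.de/api/xapi_meta?*%5Bshop=farm%5D%5Bbbox=-75.29,39.86,-74.95,40.15%5D"
with urllib.request.urlopen(marketsUrl) as response:
    marketsXml = response.read()  # raw OSM XML, as bytes

# Make farmers markets file (binary mode, because the response is bytes)
marketsPath = os.path.join(workspace, "markets.osm")
with open(marketsPath, "wb") as marketsFile:
    marketsFile.write(marketsXml)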

For Python junkies: The above code uses a library called urllib, which can make web requests and read the responses. You just have to provide the URL for the request. Because square brackets are reserved characters in URLs, the "[" and "]" characters are percent-encoded as %5B and %5D, respectively, but otherwise the query has the same syntax as the one you issued above for Cayenne buildings. The resulting XML is then written to a file using the standard Python write method.
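Rather than encoding the brackets by hand, you could let Python's urllib.parse.quote do it. In this sketch the helper name xapi_url is hypothetical; the base URL and the query syntax are the ones used above. The safe argument keeps *, = and , readable, while [ and ] are still percent-encoded.

```python
from urllib.parse import quote

XAPI_BASE = "http://www.overpass-api.de/api/xapi_meta?"

def xapi_url(tag_filter, west, south, east, north):
    """Build an XAPI-style Overpass URL for one tag filter and a bounding box."""
    query = f"*[{tag_filter}][bbox={west},{south},{east},{north}]"
    # '[' and ']' become %5B and %5D; '*', '=' and ',' are left as-is
    return XAPI_BASE + quote(query, safe="*=,")

print(xapi_url("shop=farm", -75.29, 39.86, -74.95, 40.15))
```

This prints the same farmers' market URL used in the script above, and xapi_url("building=yes", -52.35, 4.88, -52.25, 4.98) rebuilds the Cayenne request.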

A script like this might be useful if you wanted to update one or more datasets on a periodic basis. The script could be combined with GDAL processing to get the data into a format suitable for your web map. Recent versions of GDAL (1.10 and later) can read OSM XML and convert it to different formats, such as GeoJSON or shapefiles. (Be careful with shapefiles, though, because GDAL plops most of the less common tags into one "other_tags" field that gets cut off at 254 characters, a limitation of the shapefile format.)
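One way to chain the download script with GDAL is to call the ogr2ogr command-line tool from Python. This is a sketch that assumes GDAL is installed with ogr2ogr on your PATH; the function names are hypothetical. GDAL's OSM driver splits a file into layers named points, lines, multilinestrings, multipolygons and other_relations, so you pick the layer you want to convert.

```python
import subprocess

def ogr2ogr_command(src_osm, dst_path, layer="points", fmt="GeoJSON"):
    """Build the argument list for converting one layer of an OSM file."""
    # ogr2ogr syntax: ogr2ogr -f <format> <destination> <source> [layer ...]
    return ["ogr2ogr", "-f", fmt, dst_path, src_osm, layer]

def convert(src_osm, dst_path, layer="points", fmt="GeoJSON"):
    """Run the conversion; raises CalledProcessError if ogr2ogr fails."""
    subprocess.run(ogr2ogr_command(src_osm, dst_path, layer, fmt), check=True)
```

For example, convert("markets.osm", "markets.geojson") would write the point features from the farmers' market download. Markets mapped as areas rather than points would typically land in the multipolygons layer instead, so you may need to convert both.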

As an exclamation point at the end of all this geekiness, play around with the graphical tool overpass turbo for a few minutes. This gives you an interactive environment for querying OSM and seeing the results on the map. You can save any interesting result in popular formats, such as KML. This is helpful if you just want to make a one-off query to OSM for some particular feature type.
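Overpass turbo's query box uses Overpass QL rather than the XAPI-style URLs shown earlier. One detail that trips people up is that Overpass QL bounding boxes are ordered (south, west, north, east), not west, south, east, north. The sketch below, with a hypothetical function name, generates a query mirroring the Cayenne building filter that you could paste into overpass turbo.

```python
def overpass_ql_buildings(west, south, east, north):
    """Overpass QL for building ways, excluding building=no, within a bbox."""
    # Overpass QL expects (south, west, north, east), unlike XAPI's bbox order
    return (
        "[out:json];\n"
        f'way["building"]["building"!="no"]({south},{west},{north},{east});\n'
        "out geom;"
    )

print(overpass_ql_buildings(-52.35, 4.88, -52.25, 4.98))
```

Here out geom asks the server to include each way's coordinates directly, so overpass turbo can draw the buildings without a separate node lookup.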

There are many circumstances and needs that can affect the way you retrieve data from OSM. Hopefully, this walkthrough has provided enough options that you can make an informed decision about how to best get the scope and scale of data you need. Now let's go to the lesson assignment where you'll get some experience with the other side of things: putting data into OSM.