GEOG 865
Cloud and Server GIS

Maintaining and updating a tile cache


A tile cache is just a picture of your data at one point in time. If that data ever changes, you need to update the cache. This final section of the lesson gives some practical considerations for updating and maintaining a cache over time.

Update strategy should affect the decision to make a tile cache

Your update strategy probably should have come into consideration before you even decided you were going to create a cache. If you need to see data in real time, or you have frequent changes occurring over broad extents of the map, then creating a tile cache may not be appropriate.

For each map, there's a threshold of acceptable data currency. For a neighborhood street map available in your handheld GPS, you may find it acceptable if the street data is updated once every three months. For a tax assessor looking at land parcels, it may be acceptable to have the data current to within the past day or two. For a 911 operator tracking a vehicle's progress, a delay of more than a few seconds may not be acceptable.

If the cache update can be performed within the threshold of acceptable data currency, then it may make sense to create a cache. If the cache cannot be updated that quickly, then caching should not be used.

Focusing cache updates

There are two approaches for cache updates; generate the entire cache, or focus the updates on places where the data has changed. If your entire cache can be rebuilt within the threshold of acceptable data currency, then it may be easier to do the first option, you can just kick off a rebuild of all the tiles and be done.

If your cache is very large and it is undesirable to rebuild the entire thing, then you need some way to track places that have been edited (for the sake of this discussion, we'll call these "dirty areas"). You can then pass the dirty area polygons into your caching tools to define where tile updates should occur.

So how do you find the dirty areas? One approach is to track them as edits are being made, each transaction can be logged to a database and, at the end of the edit session, the spatial extents of all the transactions can be exported to create a vector dataset of dirty areas.

If real-time tracking of the dataset editing is not an option, you can attempt to compare two datasets directly for attributes or spatial features that do not match. This type of strategy is required when you receive a dataset update without any record of how it was created (such as from a data vendor every six months). It requires that features have at least one key field in common between the two datasets. Comparing attributes is necessary if map symbolization or labeling could change based on a field value.

Accomplishing either of the above solutions in ArcGIS requires custom programming. Fortunately, this problem is common enough that people have posted some scripts and tools online that help address it. The Show Edits Since Reconcile tool, written by Tom Brenneman, compares two versions of an ArcSDE geodatabase and outputs a feature class of spatial discrepancies. It can be installed into your list of toolboxes in ArcCatalog. A similar tool Compare two feature classes in a file geodatabase, written by Sterling Quinn, is designed for those who do not have their data in ArcSDE.

Basing an ArcGIS tile cache update on dirty areas requires some degree of caution. A feature class full of small, adjacent polygons can cause the Manage Map Server Cache Tiles tool to work slowly and inefficiently. If there are a lot of small dirty areas in close proximity, they should be merged before the dirty areas feature class is used to define a caching job.

Automating cache updates

It's common to perform tile cache updates on a regular basis, such as every three months, every week, or every evening. Because caching is so resource-intensive, many server administrators like to build the updated tiles on a staging server and then copy them to their production server. This avoids disruption to those who are viewing tiles on the live website.

Whether you use a staging server or not, it's wise to perform the update during times when the fewest possible individuals will be using your site. For most sites, this is during the early morning hours or the weekend. Since you probably do not want to log in at 2 AM Sunday morning to run your caching tools, it's worth exploring whether your tile caching software can be automated and scheduled to run at given times.

The ArcGIS tools, for example, can be automated using a Python script. Python is a relatively simple programming language to learn, and it can be used to run any ArcGIS tool, including Manage Map Server Cache Tiles. For a full update process, you might decide to chain several tools and functions together in one script, such as:

  • Compare two feature classes in a file geodatabase (to find the dirty areas)
  • Manage Map Server Cache Tiles (passing in the dirty areas feature class to define where tiles are created)
  • A copy function to move the new tiles from the staging server onto the production server

Once you have a script that does everything you need, you can use your operating system to schedule it to run on a regular basis. Task Scheduler, included with Windows, is an example of a program that can run scripts on a repeated basis at any time you specify (such as nights or weekends).

Python scripting with ArcGIS is taught in Penn State's Geog 485: GIS Programming and Software Development. If you're curious to see an example of a Python script that updates a cache, check out the ArcGIS help topic Automating cache creation and updates with geoprocessing.