GEOG 489
Advanced Python Programming for GIS

4.2.3 Priority Queues and Heapq

PrintPrint

The idea of a priority queue is that items in the collection are always kept ordered based on their < relation so that when we take the first item from the queue it will always be the one with lowest (or highest) value.

For instance, let’s get back to the example we started this section with of managing a queue of assignments or tasks that need to be performed. Let’s say that instead of performing the assignments in the order in which they arrive (first-in-first-out), the assignments have a priority value between 1 and 9 with 1 meaning highest and 9 meaning lowest priority. That means we need to make sure we keep the assignments in the queue ordered based on their priority so that taking the first assignment from the queue will be that with the highest priority.

The heapq module among other things provides a set of functions for adding elements to a list (function heappush(…)) and for removing the first item with highest priority (function heappop(…)). In the following code, we again use strings for representing our assignments and encode the priority in the strings themselves so that their lexicographical order corresponds to their priority, i.e. “Assignment 1” < “Assignment 2” < … < “Assignment 9”. The reason we defined the highest priority to be given by the number 1 and the lowest priority by the number 9 is that heapq implements a min heap in which heappop(…) always returns the lowest value element according to the < relation in contrast to a max heap in which heappop(…) would always return the highest value element. The code starts with an empty list in variable pQueue and then simulates the arrival of 100 assignments with random priority using heappush(…) to add a new assignment to the queue.

import heapq 
import random 

pQueue = [] 

for count in range(1,101): 
    priority = random.randint(1,9) 
    heapq.heappush(pQueue, 'Assignment ' + str(priority)) 

print(pQueue) 

When you look at the output produced by the print statement in the last line, you may be disappointed because it doesn’t look like the list is really ordered based on the priority numbers of the assignments. However, the list also does not reflect the order in which the assignments have been added to the queue. The list is actually a “flattened” representation of a binary tree, the data structure that heapq is using to make the push and pop operations as efficient as possible, while making sure that heappop(…) always gives you the lowest value element from the queue.

Now add the following code that calls heappop(…) 100 times to remove all assignments from the queue and print out their names including their priority value:

for count in range(1,101): 
    assignment = heapq.heappop(pQueue) 
    print(assignment)
Output: 

Assignment 1 
Assignment 1 
…
Assignment 2 
…
Assignment 9 

As you can see, by using heappop(…) we indeed get the assignments in the right order from the queue. Of course, this is a simplified example in which we first fill the queue completely and then empty it again, but it works in the same way if we add and remove assignments in any arbitrary order. Using heapq for this task is much, much faster than any simple approach such as always searching through the entire list to find the element with the lowest value or, slightly better, always searching for the correct position when inserting a new assignment into the list to keep the list sorted. If you don't believe it, try to implement your own method and do some profiling to see how it compares to the priority queue based approach. 

In the walkthrough of this lesson, we will employ this notion of a priority queue for keeping a number of bus track GPS observations sorted based on their timestamps. For this, we will have to define the < method for our observation points in a suitable way to work with heapq. This will allow us to process the observation points in chronological order.

This section gave you a bit of a taste of the idea of efficient data structures for collections, the algorithms behind them, and the trade-offs involved (a data structure that is very efficient for certain operations will be suboptimal for other operations). Computer science students spend a lot of time studying the implementation and properties of such data structures and the time and space complexities of the operations involved. We were only able to scratch the surface of this topic here, but, as indicated above, there are many books and other resources on this topic, including some specifically written for Python.