GEOG 489
Advanced Python Programming for GIS

4.2.1 Sets


Sets are another built-in collection type in Python, in addition to lists, tuples, and dictionaries. They are modeled after mathematical sets: there is no order between the elements, and an element can be contained in a set only once (in contrast to lists). Like lists and dictionaries, sets are mutable.

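The following code example shows how we can create a set using curly brackets {…} to delimit the elements (similar to a dictionary, but without the : separating keys from values), and that all duplicates are automatically removed. Whether two elements count as duplicates is determined with the == operator (together with the elements' hash values). One restriction of sets is that they can only contain hashable values, which in practice means immutable values such as numbers, strings, and tuples of immutable values; a list, for instance, cannot be an element of a set.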

s = {3,4,1,3,4,1} # create set 
print(s) 
Output: 
{1, 3, 4} 
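The restriction on element types mentioned above can be seen in a short sketch. It also uses the built-in set() function, both to turn a list into a set (a common way of removing duplicates) and to create an empty set, since an empty pair of curly brackets {} creates an empty dictionary rather than a set. The variable names (values, unique, points, and so on) are illustrative only.

```python
# Converting a list to a set removes duplicate values
values = [3, 4, 1, 3, 4, 1]
unique = set(values)
print(sorted(unique))   # sorted() returns a list in ascending order

# {} creates an empty dictionary, so an empty set needs set()
empty = set()
print(type({}), type(empty))

# Tuples of immutable values are allowed as elements; duplicates are removed
points = {(0, 0), (1, 2), (0, 0)}
print(len(points))

# Mutable values such as lists are not hashable and cannot be elements
raised = False
try:
    bad = {[0, 0], [1, 2]}
except TypeError as err:
    raised = True
    print("TypeError:", err)
```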

Since sets are unordered, it is not possible to access their elements via an index. However, we can use the “in” operator to test whether a set contains an element, and we can use a for-loop to iterate through the elements:

x = 3 
if x in s: 
    print("already contained")

for e in s:
    print(e)
Output: 
already contained 
1 
3 
4 
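Since sets are mutable, elements can also be added and removed after a set has been created. The following sketch starts from a set like s above and uses the set methods add(), discard(), and remove(): discard() silently ignores elements that are not contained, whereas remove() raises a KeyError for them. Because you should not rely on the order in which a set's elements are visited, the loop at the end uses sorted() to get a predictable, ascending order.

```python
s = {3, 4, 1}

s.add(5)        # add a new element
s.add(3)        # adding an element that is already contained has no effect
s.discard(4)    # remove 4; would do nothing if 4 were not contained
print(2 in s, 5 in s)

try:
    s.remove(42)   # remove() raises KeyError for missing elements
except KeyError:
    print("42 is not contained in s")

# Iterate in ascending order instead of the set's internal order
for e in sorted(s):
    print(e)
```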

One of the nice things about sets is that they provide the standard set-theoretic operations union, intersection, difference, and symmetric difference, as shown in the following code example:

group1 = { "Jim", "Maria", "Frank", "Susan"} 
group2 = { "Sam", "Steve", "Jim" }

print( group1 | group2 )  # or group1.union(group2) 

print( group1 & group2 )  # or group1.intersection(group2) 

print( group1 - group2 )  # or group1.difference(group2)  

print( group1 ^ group2 )  # or group1.symmetric_difference(group2)
Output: 
{'Frank', 'Sam', 'Steve', 'Susan', 'Maria', 'Jim'} 
{'Jim'} 
{'Susan', 'Frank', 'Maria'} 
{'Frank', 'Sam', 'Steve', 'Susan', 'Maria'} 

The difference between the last and second-to-last operation here is that group1 - group2 returns the elements of group1 that are not also elements of group2, while the symmetric difference group1 ^ group2 returns a set with all elements that are contained in exactly one of the two groups but not in both. Note that, because sets are unordered, the elements in the output above may be printed in a different order when you run the code yourself.
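The relationship between the two operations can be checked directly: the symmetric difference is the union of the two one-sided differences. The sketch below reuses the two groups from the example above; only_in_1, only_in_2, and new_members are illustrative names. It also shows that the named methods such as union() accept any iterable (here a list), whereas operators like | require both operands to be sets.

```python
group1 = {"Jim", "Maria", "Frank", "Susan"}
group2 = {"Sam", "Steve", "Jim"}

only_in_1 = group1 - group2   # elements in group1 but not in group2
only_in_2 = group2 - group1   # elements in group2 but not in group1

# The symmetric difference contains the elements found in exactly one group
print((only_in_1 | only_in_2) == (group1 ^ group2))

# Unlike |, the union() method also accepts other iterables such as lists
new_members = ["Anna", "Jim"]
print(sorted(group1.union(new_members)))

try:
    combined = group1 | new_members
except TypeError:
    print("| requires both operands to be sets")
```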