Python data types

Python has many built-in data types. These include text, numeric, sequence, mapping, set, Boolean, binary, and None types. Coding at its core is largely working with data of one of these types, so it is important to know them and how they work.
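As a quick illustration of the categories listed above, the following sketch stores one example value per category and prints the name of its type. The dictionary name examples and the chosen values are only assumptions made for this illustration.

```python
# One example value for each built-in category named above.
examples = {
    "text": "hello",            # str
    "numeric": 3.14,            # float (int and complex are numeric as well)
    "sequence": [1, 2, 3],      # list (tuple and range are sequences as well)
    "mapping": {"one": "uno"},  # dict
    "set": {1, 2, 3},           # set
    "Boolean": True,            # bool
    "binary": b"abc",           # bytes
    "None": None,               # NoneType
}

# type(value).__name__ gives the name of the type as a string
for category, value in examples.items():
    print(category, "->", type(value).__name__)
```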

As a developer, you need to know the types of the values you pass to operations and functions. This helps with debugging, conditionals, calculations, dictionary operations, and data transformations. For example, the string "1" does not equal the int 1, and trying to evaluate "2" + 5 results in a TypeError:

res = "2" + 5
print (res)

Output 
Traceback (most recent call last): TypeError: can only concatenate str (not "int") to str

Changing the 5 to “5” results in concatenation of the “2” and “5”:

res = "2" + "5"
print (res)

Output 
25
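If the intention was to add the two values as numbers, or to join them as text, one possible approach is to convert explicitly with int(…) or str(…). The variable names below are made up for this sketch.

```python
textValue = "2"
number = 5

total = int(textValue) + number    # convert the string to an int, then add
joined = textValue + str(number)   # convert the int to a string, then concatenate

print(total)     # 7
print(joined)    # 25
print("1" == 1)  # False: a string never equals an int
```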

For this course, we will be looking at more advanced structures of the mapping type. W3Schools provides a great overview of the data types [1], and I encourage you to review it.

Variables: local vs. global, mutable vs. immutable

When making the transition from a beginner to an intermediate or advanced Python programmer, it also becomes important to understand the intricacies of variables used within functions and of passing parameters to functions in detail. First of all, we can distinguish between global and local variables within a Python script. Global variables are defined at the top level of the script, outside of any function, or inside a function with the help of the keyword global. They can be accessed from anywhere in the script after they are instantiated. They exist and keep their values as long as the script is loaded, which typically means as long as the Python interpreter into which they are loaded is running.

In contrast, local variables are defined inside a function and can only be accessed in the body (scope) of that function. Note that loops and if-statements do not create a scope of their own in Python: a variable first assigned inside a loop belongs to the function that contains the loop or, if the loop is at the top level of the script, to the global variables. Furthermore, when the body of the function has been executed, its local variables will be discarded and cannot be used anymore to access their current values. A local variable is either a parameter of that function, in which case it is assigned a value immediately when the function is called, or it is introduced in the function body by making an assignment to the name for the first time.

Here are a few examples to illustrate the concepts of global and local variables and how to use them in Python.

def doSomething(x):  # parameter x is a local variable of the function 
    count = 1000 * x  # local variable count is introduced 
    return count 
 
y = 10  # global variable y is introduced  
print(doSomething(y)) 
print(count)  # this will result in an error  
print(x)  # this will also result in an error

This example introduces one global variable, y, and two local variables, x and count, both part of the function doSomething(…). x is a parameter of the function, while count is introduced in the first line of the function body. When the function is called in the first print statement, the local variable x is created and assigned the value that is currently stored in global variable y, so the integer number 10. Then the body of the function is executed. In its first line, an assignment is made to variable count. Since this variable hasn’t been introduced in the function body before, a new local variable will now be created and assigned the value 10000. After executing the return statement, both x and count will be discarded. Hence, the two print statements at the end of the code lead to errors (NameError) because they try to access variables that do not exist outside of the function.

Think of the function as a house with one-way mirrored windows: you can see out, but nobody can see in. Code inside the function can 'see' variables on the outside, but code on the outside cannot see 'into' the function. This is also referred to as variable scope. The scope of a variable is the function in which the variable is created (or the entire script for global variables), unless the name is declared with the keyword global. Put simply, inner scopes can access the outer scopes, but the outer scopes cannot reach into the inner scopes.
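The one-way visibility can be tried out with a nested function. This is only a sketch; the names outer, inner, message, and sawNameError are invented for the illustration.

```python
def outer():
    message = "defined in outer"   # local variable of outer()

    def inner():
        # the inner scope can read names from the enclosing (outer) scope
        return message.upper()

    return inner()

print(outer())

sawNameError = False
try:
    print(message)                 # the outer scope cannot look into the function
except NameError as err:
    sawNameError = True
    print("NameError:", err)
```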

Now let’s change the example to the following:

def doSomething():  
    count = 1000 * y    # global variable y is accessed here 
    return count  
y = 10          
print(doSomething())

This example shows that global variable y can also be directly accessed from within the function doSomething(): When Python encounters a variable name that is neither the name of a parameter of that function nor has been introduced via an assignment previously in the body of that function, it will look for that variable among the global (outer scope) variables. However, the first version using a parameter instead is usually preferable because then the code in the function doesn’t depend on how you name and use variables outside of it. That makes it much easier to, for instance, re-use the same function in different projects.

So maybe you are wondering whether it is also possible to change the value of a global variable from within a function, not just read its value? One attempt to achieve this could be the following:

def doSomething():  
    count = 1000  
    y = 5 
    return count * y  
y = 10 
print(doSomething())  
print(y)  # output will still be 10 here

However, if you run the code, you will see that the last line still produces the output 10, so the global variable y hasn't been changed by the assignment y = 5 inside the function. The rule is that any name that is assigned a value in the body of a function is treated as a local variable of that function, unless it has been marked as global within the function. Since an assignment to y is made in the body of the function, a new local variable with that name is created that shadows the global variable with the same name until the end of the function has been reached. To change the global variable instead, you must explicitly tell Python that the name should be interpreted as the name of a global variable by using the keyword ‘global’, like this:

def doSomething(): 
    count = 1000 
    global y  # tells Python to treat y as the name of global variable 
    y = 5  # as a result, global variable y is assigned a new value here 
    return count * y 
 
 
y = 10 
print(doSomething()) 
print(y)  # output will now be 5 here

With the statement global y, we are telling Python that y in this function should refer to the global variable y. As a result, the assignment y = 5 changes the value of the global variable called y and the output of the last line will be 5. While it's good to know how these things work in Python, we again want to emphasize that accessing global variables from within functions should be avoided as much as possible. Passing values via parameters and returning values is usually preferable because it keeps different parts of the code as independent of each other as possible.
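For comparison, here is a sketch of the parameter-and-return style recommended above. The function name scaleValue and its factor parameter are assumptions made for this example; the point is only that the function never touches a global variable and the caller decides what to do with the result.

```python
def scaleValue(value, factor=1000):
    # works only with its parameters; no global variable is read or changed
    return factor * value

y = 10
result = scaleValue(y)
print(result)  # 10000
print(y)       # 10, the function did not change y

# if the global variable should change, the caller rebinds it explicitly
y = scaleValue(y, factor=2)
print(y)       # 20
```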

So after talking about global vs. local variables, what is the issue with mutable vs. immutable mentioned in the heading? There is an important difference in passing values to a function depending on whether the value is from a mutable or immutable data type. All values of primitive data types like numbers and boolean values in Python are immutable, meaning you cannot change any part of them. On the other hand, we have mutable data types like lists and dictionaries for which it is possible to change their parts: You can, for instance, change one of the elements in a list or what is stored under a particular key in a given dictionary without creating a completely new object.

What about strings and tuples? You may think these are mutable objects, but they are actually immutable. While you can access a single character from a string or element from a tuple, you will get an error message if you try to change it by using it on the left side of the equal sign in an assignment. Moreover, when you use a string method like replace(…) to replace all occurrences of a character by another one, the method cannot change the string object in memory for which it was called but has to construct a new string object and return that to the caller.
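Both points can be checked with a few lines of code. In this sketch, the variable names are made up; the attempt to assign to an element is wrapped in try/except so that the script keeps running.

```python
name = "parcel"
point = (3, 5)

errors = []
for label, container in (("string", name), ("tuple", point)):
    try:
        container[0] = "X"          # item assignment is not supported
    except TypeError as err:
        errors.append(label)
        print(label, "->", err)

# replace(...) returns a new string; the original object is unchanged
renamed = name.replace("p", "b")
print(name)     # parcel
print(renamed)  # barcel
```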

Why is that important to know in the context of writing functions? Because mutable and immutable data types are treated differently when provided as a parameter to functions as shown in the following two examples:

def changeIt(x): 
    x = 5  # this does not change the value assigned to y because x is considered local 
 
 
y = 3 
changeIt(y) 
print(y)  # will print out 3

As we already discussed above, the parameter x is treated as a local variable in the function body. We can think of it as being assigned a copy of the value that variable y contains when the function is called. As a result, the value of the global variable y doesn’t change and the output produced by the last line is 3. But it only works like this for immutable objects, like numbers in this case! Let’s do the same thing for a list:

def changeIt(x): 
    x[0] = 5  # this will change the list y refers to 
 
y = [3, 5, 7] 
changeIt(y) 
print(y)  # output will be [5, 5, 7]

The output [5, 5, 7] produced by the print statement in the last line shows that the assignment x[0] = 5 in the function body changed the list object that is stored in global variable y. How is that possible? Well, for values of mutable data types like lists, assigning the value to function parameter x cannot be conceived as creating a copy of that value and, as a result, having the value appear twice in memory. Instead, x is set up to refer to the same list object in memory as y. Therefore, any change made with the help of either variable x or y will change the same list object in memory. When variable x is discarded when the function body has been executed, variable y will still refer to that modified list object. Maybe you have already heard the terms “call-by-value” and “call-by-reference” in the context of assigning values to function parameters in other programming languages. What happens for immutable data types in Python works like “call-by-value,” while what happens to mutable data types works like “call-by-reference.” If you feel like learning more about the details of these concepts, check out this article on Parameter Passing [2].
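A small sketch can make the shared list object visible. It uses two hypothetical functions, mutate and rebind: changing the content of the list through the parameter is visible to the caller, while assigning a completely new list to the parameter name only affects the local variable. The helper sameObject is likewise invented here and compares object identities with id(…).

```python
def mutate(x):
    x.append(99)       # changes the list object that the caller also refers to

def rebind(x):
    x = [0, 0, 0]      # local name x now refers to a new list; caller unaffected

def sameObject(x, expectedId):
    return id(x) == expectedId   # True if x is the very same object in memory

y = [3, 5, 7]
mutate(y)
print(y)                      # [3, 5, 7, 99]
rebind(y)
print(y)                      # [3, 5, 7, 99]
print(sameObject(y, id(y)))   # True: the parameter refers to the same list
```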

While the reasons behind these different mechanisms are very technical and related to efficiency, this means it is actually possible to write functions that take parameters of mutable type as input and modify their content. This is common practice (in particular for class objects which are also mutable) and not generally considered bad style because it is based on function parameters and the code in the function body does not have to know anything about what happens outside of the function. Nevertheless, often returning a new object as the return value of the function rather than changing a mutable parameter is preferable. This brings us to the last part of this section.
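As a sketch of the alternative mentioned last, the following hypothetical function withFirstReplaced copies the list it receives and returns the modified copy, leaving the caller's list untouched. The function name is an assumption made for this example.

```python
def withFirstReplaced(values, newFirst):
    result = list(values)   # shallow copy, so the caller's list is not modified
    result[0] = newFirst
    return result

y = [3, 5, 7]
z = withFirstReplaced(y, 5)
print(y)  # [3, 5, 7]
print(z)  # [5, 5, 7]
```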

Lesson content developed by Jan Wallgrun and James O’Brien

Python Dictionaries

In programming, we often want to store larger amounts of data that somehow belongs together inside a single variable. You probably already know about lists, which provide one option to do so. As long as available memory permits, you can store as many elements in a list as you wish and the append(...) method allows you to add more elements to an existing list.

Dictionaries are another data structure that allows for storing complex information in a single variable. While lists store elements in a simple sequence and the elements are then accessed based on their index in the sequence, the elements stored in a dictionary consist of key-value pairs and one always uses the key to retrieve the corresponding value from the dictionary. It works like in a real dictionary where you look up information (the stored value) under a particular keyword (the key). Similar to real dictionaries, each keyword (key) can appear only once, but the value stored under it can be a collection, such as a list, that holds multiple items. Values can be simple data such as strings or ints, containers such as lists or other dictionaries, or more complex objects such as feature classes, class instances, and even functions.

Dictionaries can be useful to realize a mapping, for instance from English words to the corresponding words in Spanish. Here is how you can create such a dictionary for just the numbers from one to four:

englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}

The curly brackets { } delimit the dictionary similarly to how square brackets [ ] do for lists. Inside the dictionary, we have four key-value pairs separated by commas. The key and value for each pair are separated by a colon. The key appears on the left of the colon, while the value stored under the key appears on the right side of the colon.

We can now use the dictionary stored in variable englishToSpanishDic to look up the Spanish word for an English number, e.g.

print(englishToSpanishDic["two"])

Output 
dos

To retrieve a value stored in the dictionary, we use the name of the variable followed by square brackets containing the key under which the value is stored. If the key does not exist in the dictionary, this notation raises a KeyError exception. The built-in dictionary method get(…) avoids this by returning None (or a default value that you provide) for a missing key; it has been used in the previous lesson examples.
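The difference between the two ways of looking up a key can be sketched as follows. The key "six" and the default text "unknown" are made up for this example.

```python
englishToSpanishDic = {"one": "uno", "two": "dos"}

print(englishToSpanishDic.get("two"))             # dos
print(englishToSpanishDic.get("six"))             # None, no exception is raised
print(englishToSpanishDic.get("six", "unknown"))  # unknown (the supplied default)

gotKeyError = False
try:
    englishToSpanishDic["six"]                    # bracket notation raises KeyError
except KeyError as err:
    gotKeyError = True
    print("KeyError:", err)
```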

We can add a new key-value pair to an existing dictionary by using the dictionary[key] notation but on the left side of an assignment operator (=):

englishToSpanishDic["five"] = "cinco" 
print(englishToSpanishDic)

Output 
{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco'}

Here we added the value "cinco" appearing on the right side of the equal sign under the key "five" to the dictionary. If something had already been stored under the key "five" in the dictionary, the stored value would have been overwritten. Since Python 3.7, dictionaries keep their elements in the order in which they were inserted, so the new pair appears at the end of the output; in older Python versions the order could appear arbitrary. Either way, the order rarely matters since we always access the elements in a dictionary via their key. If our dictionary contained many more word pairs, we could use it to realize a very primitive translator that would go through an English text word-by-word and replace each word by the corresponding Spanish word retrieved from the dictionary.
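Such a primitive translator could look roughly like the following sketch. The function name translate is an assumption, and so is the choice to keep words unchanged when they are not found in the dictionary.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres",
                       "four": "cuatro", "five": "cinco"}

def translate(text, dictionary):
    translatedWords = []
    for word in text.split():                 # split the text at whitespace
        # look up the word; fall back to the word itself if it is unknown
        translatedWords.append(dictionary.get(word, word))
    return " ".join(translatedWords)

print(translate("one two three go", englishToSpanishDic))  # uno dos tres go
```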

Now let’s use Python dictionaries to do something a bit more complex. Let’s simulate the process of creating a book index that lists the page numbers on which certain keywords occur. We want to start with an empty dictionary and then go through the book page-by-page. Whenever we encounter a word that we think is important enough to be listed in the index, we add it and the page number to the dictionary.

To create an empty dictionary in a variable called bookIndex, we use the notation with the curly brackets but nothing in between:

bookIndex = {} 
print(bookIndex)

Output 
{}

Now let’s say the first keyword we encounter in the imaginary programming book we are going through is the word "function" on page 2. We now want to store the page number 2 (value) under the keyword "function" (key) in the dictionary. But since keywords can appear on many pages, what we want to store as values in the dictionary are not individual numbers but lists of page numbers. Therefore, what we put into our dictionary is a list with the number 2 as its only element:

bookIndex["function"] =  [2] 
print (bookIndex)

Output 
{'function': [2]}

Next, we encounter the keyword "module" on page 3. So we add it to the dictionary in the same way:

bookIndex["module"] =  [3] 
print (bookIndex)

Output 
{'function': [2], 'module': [3]}

So now our dictionary contains two key-value pairs and for each key it stores a list with just a single page number. Let’s say we next encounter the keyword “function” a second time, this time on page 5. Our code to add the additional page number to the list stored under the key “function” now needs to look a bit different because we already have something stored for it in the dictionary and we do not want to overwrite that information. Instead, we retrieve the currently stored list of page numbers and add the new number to it with append(…):

pages = bookIndex["function"] 
pages.append(5) 
print(bookIndex)              # output: {'function': [2, 5], 'module': [3]}
print(bookIndex["function"])  # output: [2, 5]

Please note that we didn’t have to put the list of page numbers stored in variable pages back into the dictionary after adding the new page number. Both the variable pages and the dictionary refer to the same list, so appending the number changes both. Our dictionary now contains a list of two page numbers for the key “function” and still a list with just one page number for the key “module”. Surely you can imagine how we would build up a large dictionary for the entire book by continuing this process. Dictionaries can be used in concert with a for loop to go through the keys of the elements in the dictionary. This can be used to print out the content of an entire dictionary:

for k in bookIndex.keys():  # loop through keys of the dictionary 
    print("keyword: " + k)  # print the key 
    print("pages: " + str(bookIndex[k]))  # print the value

Output 
keyword: function 
pages: [2, 5] 
keyword: module 
pages: [3]
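As a variation on the loop above, the dictionary method items() provides each key together with its value, so the value does not have to be looked up separately. The sketch recreates the small bookIndex from the text so that it can be run on its own.

```python
bookIndex = {"function": [2, 5], "module": [3]}

# items() yields (key, value) pairs
for keyword, pages in bookIndex.items():
    print("keyword: " + keyword)
    print("pages: " + str(pages))
```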

When adding the second page number for “function”, we ourselves decided that this needs to be handled differently than when adding the first page number. But how could this be realized in code? We can check whether something is already stored under a key in a dictionary using an if-statement together with the “in” operator:

keyword = "function" 
if keyword in bookIndex.keys():
    print("entry exists") 
else:
    print("entry does not exist")

Output 
entry exists

So assuming we have the current keyword stored in variable word and the corresponding page number stored in variable pageNo, the following piece of code would decide by itself how to add the new page number to the dictionary:

if word in bookIndex:
    # entry for word already exists, so we just add page
    pages = bookIndex[word]
    pages.append(pageNo)
else:
    # no entry for word exists, so we add new entry
    bookIndex[word] = [pageNo]

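This can also be written with a conditional (ternary) expression as discussed in the previous lesson. The membership check has to come first so that bookIndex[word] is only evaluated when the key exists and no KeyError is triggered. Also note that append(…) returns None, so its result must not be assigned back to the dictionary; list concatenation with + is used instead to build the new list:

bookIndex[word] = [pageNo] if word not in bookIndex else bookIndex[word] + [pageNo]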

A more sophisticated version of this code would also check whether the list of page numbers retrieved in the if-block already contains the new page number to deal with the case that a keyword occurs more than once on the same page. Feel free to think about how this could be included.
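One possible way to include that check is sketched below. The function name addToIndex and the sample list of (keyword, page) pairs are assumptions made for this illustration; other solutions are equally valid.

```python
def addToIndex(index, word, pageNo):
    if word in index:
        # entry exists: only add the page if it is not listed yet
        if pageNo not in index[word]:
            index[word].append(pageNo)
    else:
        # no entry yet: start a new list of page numbers
        index[word] = [pageNo]

bookIndex = {}
occurrences = [("function", 2), ("module", 3), ("function", 5), ("function", 5)]
for word, pageNo in occurrences:
    addToIndex(bookIndex, word, pageNo)

print(bookIndex)  # {'function': [2, 5], 'module': [3]}
```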


JSON

JSON, which is an acronym for JavaScript Object Notation, is a text-based data format mostly associated with the web. It provides a dictionary-like structure that is easily created, transmitted, read, and parsed. Many programming languages include built-in methods for these processes, and JSON is largely used for transferring data and information between languages. For example, a web API on a server may be written in C# and the front-end application written in JavaScript. The API request from JavaScript may be serialized to JSON and then deserialized by the C# API. The C# code then processes the request and responds with return data in JSON form. Python includes a JSON package aptly named json that makes working with JSON and dictionaries easy.

An example of JSON is:

{ 
  "objectIdFieldName" : "OBJECTID",  
  "uniqueIdField" :  
  { 
    "name" : "OBJECTID",  
    "isSystemMaintained" : true 
  },  
  "globalIdFieldName" : "GlobalID",  
  "geometryType" : "esriGeometryPoint",  
  "spatialReference" : { 
    "wkid" : 102100,  
    "latestWkid" : 3857 
  },  
  "fields" : [ 
    { 
      "name" : "OBJECTID",  
      "type" : "esriFieldTypeOID",  
      "alias" : "OBJECTID",  
      "sqlType" : "sqlTypeOther",  
      "domain" : null,  
      "defaultValue" : null 
    },  
    { 
      "name" : "Acres",  
      "type" : "esriFieldTypeDouble",  
      "alias" : "Acres",  
      "sqlType" : "sqlTypeOther",  
      "domain" : null,  
      "defaultValue" : null 
    } 
  ],  
  "features" : [ 
    { 
      "attributes" : { 
        "OBJECTID" : 6,  
        "GlobalID" : "c4c4bcfd-ce86-4bc4-b9a2-ac7b75027e12",  
        "Contained" : "Yes",  
        "FireName" : "66",  
        "Responsibility" : "Local",  
        "DPA" : "LRA",  
        "StartDate" : 1684177800000,  
        "Status" : "Out",  
        "PercentContainment" : 50,  
        "Acres" : 127,  
        "InciWeb" : "https://www.fire.ca.gov/incidents/2023/5/15/66-fire",  
        "SocialMediaHyperlink" : "https://twitter.com/hashtag/66fire?src=hashtag_click",  
        "StartTime" : "1210",  
        "ImpactBLM" : "No",  
        "FireNotes" : "Threats exist to structures, critical infrastructure, and agricultural lands. High temperatures, and low humidity. Forward spread stopped. Resources continue to strengthen containment lines.",  
        "CameraLink" : null 
      },  
      "geometry" :  
      { 
        "x" : -12923377.992696606,  
        "y" : 3971158.6933410829 
      } 
    },  
    { 
      "attributes" : { 
        "OBJECTID" : 7,  
        "GlobalID" : "04772c32-acab-4f12-8875-7f456e21eda7",  
        "Contained" : "No",  
        "FireName" : "Ramona",  
        "Responsibility" : "LRA",  
        "DPA" : "Local",  
        "StartDate" : 1684789860000,  
        "Status" : "Out",  
        "PercentContainment" : 80,  
        "Acres" : 348,  
        "InciWeb" : "https://www.fire.ca.gov/incidents/2023/5/22/ramona-fire/",  
        "SocialMediaHyperlink" : "https://twitter.com/hashtag/ramonafire?src=hashtag_click",  
        "StartTime" : null,  
        "ImpactBLM" : "Possible",  
        "FireNotes" : "Minimal fire behavior observed. Resources continue to strengthen control lines and mop-up.",  
        "CameraLink" : "https://alertca.live/cam-console/2755" 
      },  
      "geometry" :  
      { 
        "x" : -13029408.882657332,  
        "y" : 4003241.0902095754 
      } 
    },  
    { 
      "attributes" : { 
        "OBJECTID" : 8,  
        "GlobalID" : "737be4e4-a127-486a-8481-a0ca62a631d7",  
        "Contained" : "Yes",  
        "FireName" : "Range",  
        "Responsibility" : "State",  
        "DPA" : "State",  
        "StartDate" : 1685925480000,  
        "Status" : "Out",  
        "PercentContainment" : 100,  
        "Acres" : 72,  
        "InciWeb" : "https://www.fire.ca.gov/incidents/2023/6/4/range-fire",  
        "SocialMediaHyperlink" : "https://twitter.com/hashtag/RangeFire?src=hashtag_click",  
        "StartTime" : null,  
        "ImpactBLM" : "No",  
        "FireNotes" : null,  
        "CameraLink" : "https://alertca.live/cam-console/2731" 
      },  
      "geometry" :  
      { 
        "x" : -13506333.475177869,  
        "y" : 4366169.4120039716 
      } 
    } 
  ] 
}

Here the left side of each colon is the property and the right side is the value, much like in a dictionary. It is important to note that while JSON looks like a Python dictionary, it is text: it needs to be converted to a dictionary before Python recognizes it as one, and a dictionary needs to be converted back to produce JSON. One main difference between dictionaries and JSON is that the properties (keys in Python) need to be strings in JSON, whereas Python dictionary keys can be ints, floats, strings, Booleans, or other immutable types.
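The string-key restriction can be observed with Python's json module. In this sketch, the dictionary pagesByChapter is hypothetical data with int keys; converting it to JSON turns the keys into strings, so converting back does not reproduce the original dictionary.

```python
import json

pagesByChapter = {1: [2, 5], 2: [3]}   # hypothetical data with int keys

jsonText = json.dumps(pagesByChapter)  # int keys are written as strings
print(jsonText)                        # {"1": [2, 5], "2": [3]}

restored = json.loads(jsonText)
print(restored)                        # {'1': [2, 5], '2': [3]}
print(restored == pagesByChapter)      # False: the keys are now strings
```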

Many APIs transmit the requested data in JSON form, and conversion is as simple as using json.loads() to convert JSON text to a Python dictionary and json.dumps() to convert a dictionary to a JSON string. We will be covering more details of this process in Lesson 2.
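A minimal sketch of both conversions is shown below. The JSON text is a shortened version of the example above with only a few of its properties, and the variable names are assumptions made for this illustration.

```python
import json

# shortened version of the example above, stored as text
jsonText = '''
{
  "geometryType": "esriGeometryPoint",
  "features": [
    {"attributes": {"OBJECTID": 6, "FireName": "66", "Acres": 127, "CameraLink": null}},
    {"attributes": {"OBJECTID": 7, "FireName": "Ramona", "Acres": 348,
                    "CameraLink": "https://alertca.live/cam-console/2755"}}
  ]
}
'''

data = json.loads(jsonText)          # JSON text -> Python dictionary
print(type(data).__name__)           # dict

for feature in data["features"]:
    attributes = feature["attributes"]
    # JSON null becomes None in Python
    print(attributes["FireName"], attributes["Acres"], attributes["CameraLink"])

totalAcres = sum(f["attributes"]["Acres"] for f in data["features"])
print(totalAcres)                    # 475

backToJson = json.dumps(data, indent=2)   # Python dictionary -> JSON text
print(type(backToJson).__name__)     # str
```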

Classes

Let’s recapitulate a bit: the underlying perspective of object-oriented programming is that the domain modeled in a program consists of objects belonging to different classes. If your software models some part of the real world, you may have classes for things like buildings, vehicles, trees, etc. and then the objects (also called instances) created from these classes during run-time represent concrete individual buildings, vehicles, or trees with their specific properties. The classes in your software can also describe non real-world and often very abstract things like a feature layer or a random number generator.

Class definitions specify general properties that all objects of that class have in common, together with the things that one can do with these objects. Therefore, they can be considered blueprints for the objects. Each object at any moment during run-time is in a particular state that consists of the concrete values it has for the properties defined in its class. So, for instance, the definition of a very basic class Car may specify that all cars have the properties owner, color, currentSpeed, and lightsOn. During run-time we might then create an object for “Tom’s car” in variable carOfTom with the following values making up its state:

carOfTom.owner = "Tom" 
carOfTom.color = "blue" 
carOfTom.currentSpeed = 48   # mph
carOfTom.lightsOn = False

While all objects of the same class have the same properties (also called attributes or fields), their values for these properties may vary and, hence, they can be in different states. The actions that one can perform with a car or things that can happen to a car are described in the form of methods in the class definition. For instance, the class Car may specify that the current speed of cars can be changed to a new value and that lights can be turned on and off. The respective methods may be called changeCurrentSpeed(…), turnLightsOn(), and turnLightsOff(). Methods are like functions but they are explicitly invoked on an object of the class they are defined in. In Python this is done by using the name of the variable that contains the object, followed by a dot, followed by the method name:

carOfTom.changeCurrentSpeed(34) # change state of Tom’s car to current speed being 34mph 

carOfTom.turnLightsOn()  # change state of Tom’s car to lights being turned on

The purpose of methods can be to update the state of the object by changing one or several of its properties as in the previous two examples. It can also be to get information about the state of the car, e.g. are the lights turned on? But it can also be something more complicated, e.g. performing a certain driving maneuver or fuel calculation.

In object-oriented programming, a program is perceived as a collection of objects that interact by calling each other’s methods. Object-oriented programming adheres to three main design principles:

Encapsulation: Definitions related to the properties and methods of any class appear in a specification that is encapsulated independently from the rest of the software code and properties are only accessible via a well-defined interface, e.g. via the defined methods.
Inheritance: Classes can be organized hierarchically with new classes being derived from previously defined classes inheriting all the characteristics of the parent class but potentially adding specialized properties or specialized behavior. For instance, our class Car could be derived from a more general class Vehicle adding properties and methods that are specific for cars.
Polymorphism: Derived classes can change the behavior of inherited methods by overriding them, and the code executed when such a method is invoked for an object then depends on the class of that object.

We will talk more about inheritance and polymorphism soon. All three principles aim at improving reusability and maintainability of software code. These days, most software is created by mainly combining parts that already exist because that saves time and costs and increases reliability when the re-used components have already been thoroughly tested. The idea of classes as encapsulated units within a program increases reusability because these units are then not dependent on other code and can be moved over to a different project much more easily.

For now, let’s look at how our simple class Car can be defined in Python.

class Car(): 

    def __init__(self): 
        self.owner = 'UNKNOWN' 
        self.color = 'UNKNOWN' 
        self.currentSpeed = 0 
        self.lightsOn = False 

    def changeCurrentSpeed(self, newSpeed): 
        self.currentSpeed = newSpeed 

    def turnLightsOn(self): 
        self.lightsOn = True 

    def turnLightsOff(self): 
        self.lightsOn = False 

    def printInfo(self): 
        print('Car with owner = {0}, color = {1}, currentSpeed = {2}, lightsOn = {3}'.format(self.owner, self.color, self.currentSpeed, self.lightsOn))

Here is an explanation of the different parts of this class definition: each class definition in Python starts with the keyword ‘class’ followed by the name of the class (‘Car’) followed by parentheses that may contain names of classes that this class inherits from, but that’s something we will only see later on. The rest of the class definition is indented to the right relative to this line.

The rest of the class definition consists of definitions of the methods of the class, which all look like function definitions but have ‘self’ as the first parameter, which is an indication that this is a method. (The name ‘self’ is a strong convention rather than a keyword.) The method __init__(…) is a special method called the constructor of the class. It will be called when we create a new object of that class like this:

carOfTom = Car()    # uses the __init__() method of Car to create a new Car object

In the body of the constructor, we create the properties of the class Car. Each line starting with “self.<name of property> = ...” creates a so-called instance variable for this car object and assigns it an initial value, e.g. zero for the speed. The instance variables describing the state of an object are another type of variable in addition to global and local variables that you already know. They are part of the object and exist as long as that object exists. They can be accessed from within the class definition as “self.<name of the instance variable>” which happens later in the definitions of the other methods, namely in lines 10, 13, 16 and 19. If you want to access an instance variable from outside the class definition, you have to use <name of variable containing the object>.<name of the instance variable>, so, for instance:

print(carOfTom.lightsOn)    # will produce the output False because right now this instance variable still has its default value
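That instance variables belong to the individual object can be seen by creating two objects and changing only one of them. The sketch uses a reduced version of the Car class with just two instance variables so that it can be run on its own.

```python
class Car():
    def __init__(self):
        self.currentSpeed = 0
        self.lightsOn = False

carOfTom = Car()
carOfSue = Car()

carOfSue.lightsOn = True     # changes the state of Sue's car only

print(carOfTom.lightsOn)     # False
print(carOfSue.lightsOn)     # True
```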

The rest of the class definition consists of the methods for performing certain actions with a Car object. You can see that the already mentioned methods for changing the state of the Car object are very simple. They just assign a new value to the respective instance variable, a new speed value that is provided as a parameter in the case of changeCurrentSpeed(…) and a fixed Boolean value in the cases of turnLightsOn() and turnLightsOff(). In addition, we added a method printInfo() that prints out a string with the values of all instance variables to provide us with all information about a car’s current state. Let us now create a new instance of our Car class and then use some of its methods:

carOfSue = Car() 
carOfSue.owner = 'Sue' 
carOfSue.color = 'white' 
carOfSue.changeCurrentSpeed(41) 
carOfSue.turnLightsOn() 
carOfSue.printInfo()

Output

Car with owner = Sue, color = white, currentSpeed = 41, lightsOn = True

Since we did not define any methods to change the owner or color of the car, we are directly accessing these instance variables and assigning new values to them in lines 2 and 3. While this is okay in simple examples like this, it is recommended that you provide so-called getter and setter methods (also called accessor and mutator methods) for all instance variables that you want the user of the class to be able to read (“get”) or change (“set”). The methods allow the class to perform certain checks to make sure that the object always remains in an allowed state. How about you go ahead and for practice create a second car object for your own car (or any car you can think of) in a new variable and then print out its information?
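A getter and setter pair with a simple check might look like the following sketch. The method names getOwner and setOwner and the rules being enforced (non-empty owner name, non-negative speed) are assumptions chosen for this illustration, and the class is reduced to two instance variables.

```python
class Car():
    def __init__(self):
        self.owner = 'UNKNOWN'
        self.currentSpeed = 0

    def getOwner(self):
        return self.owner

    def setOwner(self, newOwner):
        # hypothetical rule: the owner has to be a non-empty string
        if not isinstance(newOwner, str) or newOwner == '':
            raise ValueError('owner must be a non-empty string')
        self.owner = newOwner

    def changeCurrentSpeed(self, newSpeed):
        # hypothetical rule: negative speeds are not an allowed state
        if newSpeed < 0:
            raise ValueError('speed cannot be negative')
        self.currentSpeed = newSpeed

carOfSue = Car()
carOfSue.setOwner('Sue')
try:
    carOfSue.changeCurrentSpeed(-5)     # rejected, state stays unchanged
except ValueError as err:
    print('rejected:', err)
print(carOfSue.getOwner(), carOfSue.currentSpeed)  # Sue 0
```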

A method can call any other method defined in the same class by using the notation “self.<name of the method>(...)”. For example, we can add the following method setRandomSpeed() to the definition of class Car:

def setRandomSpeed(self): 
    self.changeCurrentSpeed(random.randint(0,76))

The new method requires the “random” module to be imported at the beginning of the script. The method generates a random number and then uses the previously defined method changeCurrentSpeed(…) to actually change the corresponding instance variable. In this simple example, we could have assigned the new value to the instance variable directly. In more complex cases, however, changing the state can require more code, and calling the existing method avoids having to repeat that code. Give it a try and add some lines to call this new method for one of the car objects and then print out the info again.
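As a rough sketch of what such a test could look like, the following self-contained script uses a reduced version of Car that only keeps the speed property. The reduced class is an assumption made to keep the example short; in your own script you would add setRandomSpeed() to the full class definition instead.

```python
import random

class Car():
    """Reduced stand-in for the Car class with only the speed property."""

    def __init__(self):
        self.currentSpeed = 0

    def changeCurrentSpeed(self, speed):
        self.currentSpeed = speed

    def setRandomSpeed(self):
        # reuse the existing method instead of assigning to self.currentSpeed here
        self.changeCurrentSpeed(random.randint(0, 76))

carOfSue = Car()
carOfSue.setRandomSpeed()
print(carOfSue.currentSpeed)   # some integer from 0 to 76, both ends included
```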

Lesson content developed by Jan Wallgrun and James O’Brien

Inheritance, class hierarchies, and polymorphism

We already mentioned building class hierarchies via inheritance and polymorphism as two main principles of object-oriented programming in addition to encapsulation. To introduce you to these concepts, let us start with another exercise in object-oriented modeling and writing classes in Python. Imagine that you are supposed to write a very basic GIS or vector drawing program that only deals with geometric features of three types: circles, and axis-aligned rectangles and squares. You need the ability to store and manage an arbitrary number of objects of these three kinds and be able to perform simple operations with these objects like computing their area and perimeter and moving the objects to a different position. How would you write the classes for these three kinds of geometric objects?

Let us start with the class Circle: a circle in a two-dimensional coordinate system is typically defined by three values, the x and y coordinates of the center of the circle and its radius. So these should become the properties (= instance variables) of our Circle class and for computing the area and perimeter, we will provide two methods that return the respective values. The method for moving the circle will take the values by how much the circle should be moved along the x and y axes as parameters but not return anything.

import math  

class Circle():  
    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):  
        self.x = x  
        self.y = y  
        self.radius = radius  

    def computeArea(self):  
        return math.pi * self.radius ** 2 

    def computePerimeter (self):  
        return 2 * math.pi * self.radius  

    def move(self, deltaX, deltaY):  
        self.x += deltaX  
        self.y += deltaY 

    def __str__(self):  
        return 'Circle with coordinates {0}, {1} and radius {2}'.format(self.x, self.y, self.radius)

In the constructor, we have keyword arguments with default values for the three properties of a circle and we assign the values provided via these three parameters to the corresponding instance variables of our class. We import the math module of the Python standard library so that we can use the constant math.pi for the computations of the area and perimeter of a circle object based on the instance variables. Finally, we add the __str__() method to produce a string that describes a circle object with its properties. It should by now be clear how to create objects of this class and, for instance, apply the computeArea() and move(…) methods.

circle1 = Circle(10,4,3) 
print(circle1) 
print(circle1.computeArea()) 
circle1.move(3,-1) 
print(circle1)

Output
Circle with coordinates 10, 4 and radius 3 
28.274333882308138 
Circle with coordinates 13, 3 and radius 3
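Since all three constructor parameters have default values, a Circle can also be created with fewer arguments. The short sketch below repeats a shortened version of the class so that it runs on its own; the variable names unitCircle and bigCircle are made up for this illustration.

```python
import math

class Circle():
    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):
        self.x = x
        self.y = y
        self.radius = radius

    def computeArea(self):
        return math.pi * self.radius ** 2

unitCircle = Circle()             # all three default values are used
bigCircle = Circle(radius = 5)    # x and y keep their defaults, only radius is given

print(unitCircle.x, unitCircle.y, unitCircle.radius)   # 0.0 0.0 1.0
print(bigCircle.x, bigCircle.y, bigCircle.radius)      # 0.0 0.0 5
```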

How about a similar class for axis-aligned rectangles? Such rectangles can be described by the x and y coordinates of one of their corners together with width and height values, so four instance variables taking numeric values in total. Here is the resulting class and a brief example of how to use it:

class Rectangle():
    def __init__(self, x = 0.0, y = 0.0, width = 1.0, height = 1.0):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def computeArea(self):
        return self.width * self.height

    def computePerimeter(self):
        return 2 * (self.width + self.height)

    def move(self, deltaX, deltaY):
        self.x += deltaX
        self.y += deltaY

    def __str__(self):
        return 'Rectangle with coordinates {0}, {1}, width {2} and height {3}'.format(self.x, self.y, self.width, self.height)

rectangle1 = Rectangle(10,10,3,2) 
print(rectangle1) 
print(rectangle1.computeArea()) 
rectangle1.move(2,2) 
print(rectangle1)

Output
Rectangle with coordinates 10, 10, width 3 and height 2 
6 
Rectangle with coordinates 12, 12, width 3 and height 2

There are a few things that can be observed when comparing the two classes Circle and Rectangle we just created: the constructors obviously vary because circles and rectangles need different properties to describe them and, as a result, the calls when creating new objects for the two classes also look different. All the other methods have exactly the same signature, meaning the same parameters and the same kind of return value; just the way they are implemented differs. That means the different calls for performing certain actions with the objects (computing the area, moving the object, printing information about the object) also look exactly the same; it doesn’t matter whether the variable contains an object of class Circle or of class Rectangle. If you compare the two versions of the move(…) method, you will see that they do not even differ in their implementation; they are exactly the same!

This all is a clear indication that we are dealing with two classes of objects that could be seen as different specializations of a more general class for geometric objects. Wouldn’t it be great if we could now write the rest of our toy GIS program managing a set of geometric objects without caring whether an object is a Circle or a Rectangle in the rest of our code? And, moreover, be able to easily add classes for other geometric primitives without making any changes to all the other code, and in their class definitions only describe the things in which they differ from the already defined geometry classes? This is indeed possible by arranging our geometry classes in a class hierarchy starting with an abstract class for geometric objects at the top and deriving child classes for Circle and Rectangle from this class with both adding their specialized properties and behavior. Let’s call the top-level class Geometry. The resulting very simple class hierarchy is shown in the figure below.

Flowchart with "Geometry" leading to "Circle" and "Rectangle."

Figure 4.17 Simple class hierarchy with three classes. Classes Circle and Rectangle are both derived from parent class Geometry.

Inheritance allows the programmer to define a class with general properties and behavior and derive one or more specialized subclasses from it that inherit these properties and behavior but also can modify them to add more specialized properties and realize more specialized behavior. We use the terms derived class and base class to refer to the two classes involved when one class is derived from another.


Implementing the class hierarchy

Let’s change our example so that both Circle and Rectangle are derived from such a general class called Geometry. This class will be an abstract class in the sense that it is not intended to be instantiated directly. Its purpose is to introduce properties and templates for methods that all geometric classes in our project have in common.

class Geometry():  

    def __init__(self, x = 0.0, y = 0.0):  
        self.x = x  
        self.y = y  

    def computeArea(self):  
        pass 

    def computePerimeter(self):  
        pass 

    def move(self, deltaX, deltaY):  
        self.x += deltaX  
        self.y += deltaY  

    def __str__(self):  
        return 'Abstract class Geometry should not be instantiated and derived classes should override this method!'

The constructor of class Geometry looks pretty normal: it just initializes the instance variables that all our geometry objects have in common, namely x and y coordinates to describe their location in our 2D coordinate system. This is followed by the definitions of the methods computeArea(), computePerimeter(), move(…), and __str__() that all geometry objects should support. For move(…), we can already provide an implementation because it is entirely based on the x and y instance variables and works in the same way for all geometry objects. That means the derived classes for Circle and Rectangle will not need to provide their own implementation. In contrast, you cannot compute an area or perimeter in a meaningful way just from the position of the object. Therefore, we used the keyword pass to indicate that we are leaving the body of the computeArea() and computePerimeter() methods intentionally empty. These methods will have to be overridden in the definitions of the derived classes with implementations of their specialized behavior. We could have done the same for __str__() but instead we return a warning message that this class should not have been instantiated.

It is worth mentioning that, in many object-oriented programming languages, the concepts of an abstract class (= a class that cannot be instantiated) and an abstract method (= a method that must be overridden in every subclass that can be instantiated) are built into the language. That means there exist special keywords to declare a class or method to be abstract and then it is impossible to create an object of that class or a subclass of it that does not provide an implementation for the abstract methods. In Python, this has been added on top of the language via a module in the standard library called abc [3] (for abstract base classes). Although we won’t be using it in this course, it is a good idea to check it out and use it if you get involved in larger Python projects. This Abstract Classes page [4] is a good source for learning more.
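For readers who are curious, here is a minimal sketch of how the abc module could be applied to our example. It is not the approach used in this course, and it is shortened to a single abstract method and a single derived class to keep it brief. ABC and abstractmethod are the names provided by the abc module; everything else follows the classes from this section.

```python
from abc import ABC, abstractmethod

class Geometry(ABC):
    def __init__(self, x = 0.0, y = 0.0):
        self.x = x
        self.y = y

    @abstractmethod
    def computeArea(self):
        """Must be overridden by every subclass that is to be instantiated."""

    def move(self, deltaX, deltaY):
        self.x += deltaX
        self.y += deltaY

class Rectangle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, width = 1.0, height = 1.0):
        super().__init__(x, y)
        self.width = width
        self.height = height

    def computeArea(self):
        return self.width * self.height

try:
    Geometry()   # abstract class: Python refuses to create the object
except TypeError as error:
    print('Cannot instantiate Geometry:', error)

print(Rectangle(0, 0, 3, 2).computeArea())   # 6
```

With this variant, forgetting to override computeArea() in a derived class leads to a TypeError as soon as you try to create an object of that class, rather than to a method that silently returns None.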

Here is our new definition for class Circle that is now derived from class Geometry. We also use a few commands at the end to create and use a new Circle object of this class to make sure everything is indeed working as before:

import math

class Circle(Geometry):

    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):
        super(Circle,self).__init__(x,y)
        self.radius = radius

    def computeArea(self):
        return math.pi * self.radius ** 2

    def computePerimeter(self):
        return 2 * math.pi * self.radius

    def __str__(self):
        return 'Circle with coordinates {0}, {1} and radius {2}'.format(self.x, self.y, self.radius)

circle1 = Circle(10, 10, 10)
print(circle1.computeArea())
print(circle1.computePerimeter())
circle1.move(2,2)
print(circle1)

Here are the things we needed to do in the code:

In line 3, we had to change the header of the class definition to include the name of the base class we are deriving Circle from (‘Geometry’) within the parentheses.
The constructor of Circle takes the same three parameters as before. However, it only initializes the new instance variable radius in line 7. For initializing the other two variables it calls the constructor of its base class, so the class Geometry, in line 6 with the command “super(Circle,self).__init__(x,y)”. This is saying “call the constructor of the base class of class Circle and pass the values of x and y as parameters to it”. It is typically a good idea to call the constructor of the base class as the first command in the constructor of the derived class so that all general initializations are taken care of.
Then we provide definitions of computeArea() and computePerimeter() that are specific for circles. These definitions override the “empty” definitions of the Geometry base class. This means whenever we invoke computeArea() or computePerimeter() for an object of class Circle, the code from these specialized definitions will be executed.
Note that we do not provide any definition for method move(…) in this class definition. That means when move(…) is invoked for a Circle object, the code from the corresponding definition in its base class Geometry will be executed.
We do override the __str__() method to produce the same kind of string with information about all instance variables that we had in the previous definition. Note that this function accesses both the instance variables defined in the parent class Geometry as well as the additional one added in the definition of Circle.
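As a side note, Python 3 also accepts the shorter form super() without arguments inside a method, which has the same effect as super(Circle,self) in the constructor discussed above. The sketch below uses cut-down versions of both classes to show just this call.

```python
class Geometry():
    def __init__(self, x = 0.0, y = 0.0):
        self.x = x
        self.y = y

class Circle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):
        super().__init__(x, y)   # same effect as super(Circle, self).__init__(x, y)
        self.radius = radius

circle1 = Circle(10, 10, 10)
print(circle1.x, circle1.y, circle1.radius)   # 10 10 10
```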

The new definition of class Rectangle, now derived from Geometry, looks very much the same as that of Circle if you replace “Circle” with “Rectangle”. Only the implementations of the overridden methods look different, using the versions specific for rectangles.

class Rectangle(Geometry):

    def __init__(self, x = 0.0, y = 0.0, width = 1.0, height = 1.0):
        super(Rectangle, self).__init__(x,y)
        self.width = width
        self.height = height

    def computeArea(self):
        return self.width * self.height

    def computePerimeter(self):
        return 2 * (self.width + self.height)

    def __str__(self):
        return 'Rectangle with coordinates {0}, {1}, width {2} and height {3}'.format(self.x, self.y, self.width, self.height)

rectangle1 = Rectangle(15,20,4,5) 
print(rectangle1.computeArea()) 
print(rectangle1.computePerimeter()) 
rectangle1.move(2,2) 
print(rectangle1)
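With both classes derived from Geometry, code that works with geometry objects no longer needs to know which concrete class it is dealing with. The following sketch illustrates this with a list that mixes both kinds of objects. The class definitions are shortened versions of the ones above (computePerimeter() and __str__() are left out), and the list name geometries is made up for this example.

```python
import math

class Geometry():
    def __init__(self, x = 0.0, y = 0.0):
        self.x = x
        self.y = y

    def computeArea(self):
        pass

    def move(self, deltaX, deltaY):
        self.x += deltaX
        self.y += deltaY

class Circle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):
        super(Circle, self).__init__(x, y)
        self.radius = radius

    def computeArea(self):
        return math.pi * self.radius ** 2

class Rectangle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, width = 1.0, height = 1.0):
        super(Rectangle, self).__init__(x, y)
        self.width = width
        self.height = height

    def computeArea(self):
        return self.width * self.height

# hypothetical list mixing both kinds of geometry objects
geometries = [Circle(10, 10, 10), Rectangle(15, 20, 4, 5)]

for g in geometries:
    g.move(2, 2)   # inherited from Geometry, identical for both classes
    # computeArea() runs the version defined in the object's own class
    print(type(g).__name__, g.x, g.y, round(g.computeArea(), 2))
```

The loop prints "Circle 12 12 314.16" followed by "Rectangle 17 22 20", even though the loop body never checks which class an object belongs to.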


Class attributes and static class functions

In this section we are going to look at two additional concepts that can be part of a class definition, namely class variables/attributes and static class functions. We will start with class attributes even though it is the less important one of these two concepts and won't play a role in the rest of this lesson. Static class functions, on the other hand, will be used in the walkthrough code of this lesson and also will be part of the homework assignment.

We learned in this lesson that for each instance variable defined in a class, each object of that class possesses its own copy so that different objects can have different values for a particular attribute. However, sometimes it can also be useful to have attributes that are defined only once for the class and not for each individual object of the class. For instance, if we want to count how many instances of a class (and its subclasses) have been created while the program is being executed, it would not make sense to use an instance variable with a copy in each object of the class for this. A variable existing at the class level is much better suited for implementing this counter and such variables are called class variables or class attributes. Of course, we could use a global variable for counting the instances but the approach using a class attribute is more elegant as we will see in a moment.

The best way to implement this instance counter idea is to have the code for incrementing the counter variable in the constructor of the class because that means we don’t have to add any other code and it’s guaranteed that the counter will be increased whenever the constructor is invoked to create a new instance. The definition of a class attribute in Python looks like a normal variable assignment but appears inside a class definition outside of any method, typically before the definition of the constructor. Here is what the definition of a class attribute counter for our Geometry class could look like. We are adding the attribute to the root class of our hierarchy so that we can use it to count how many geometric objects have been created in total.

class Geometry(): 
   counter = 0 

   def __init__(self, x = 0.0, y = 0.0): 
      self.x = x 
      self.y = y 
      Geometry.counter += 1 
…

The class attribute is defined in line 2 and the initial value of zero is assigned to it when the class is loaded, i.e. before the first object of this class is created. We already included a modified version of the constructor that increases the value of counter by one. Since each constructor defined in our class hierarchy calls the constructor of its base class, the counter class attribute will be increased for every geometry object created. Please note that the main difference between class attributes and instance variables in the class definition is that class attributes don’t use the prefix “self.” but the name of the class instead, so Geometry.counter in this case. Go ahead and modify your class Geometry in this way, while keeping all the rest of the code unchanged.

While instance variables can only be accessed for an object, e.g. using <variable containing the object>.<name of the instance variable>, we can access class attributes by using the name of the class, i.e. <name of the class>.<name of the class attribute>. That means you can run the code and use the statement

print(Geometry.counter)

… to get the value currently stored in this new class attribute. Since we have not created any geometry objects after making this change, the output should be 0.

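Let’s now create two geometry objects of different types, for instance, a circle and a square (this assumes that your script also contains a Square class derived from Rectangle; if it does not, use Rectangle(5,5,8,8) for the second object):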

Circle(10,10,10) 
Square(5,5,8)

Now run the previous print statement again and you will see that the value of the class variable is now 2. Class variables like this are suitable for storing all information related to the class, so essentially everything that does not describe the state of individual objects of the class.
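The complete sequence can be sketched as a small self-contained script. The two derived classes are cut down to their constructors, and Rectangle is used for the second object so that the example does not depend on any class not shown here.

```python
class Geometry():
    counter = 0   # class attribute, exists once for the whole class hierarchy

    def __init__(self, x = 0.0, y = 0.0):
        self.x = x
        self.y = y
        Geometry.counter += 1

class Circle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, radius = 1.0):
        super(Circle, self).__init__(x, y)
        self.radius = radius

class Rectangle(Geometry):
    def __init__(self, x = 0.0, y = 0.0, width = 1.0, height = 1.0):
        super(Rectangle, self).__init__(x, y)
        self.width = width
        self.height = height

print(Geometry.counter)   # 0
circle1 = Circle(10, 10, 10)
rectangle1 = Rectangle(5, 5, 8, 8)
print(Geometry.counter)   # 2
print(Circle.counter)     # 2, the attribute is found in the base class
print(circle1.counter)    # 2, reading through an object also works
```

Note that the constructor has to write Geometry.counter += 1. Writing self.counter += 1 instead would create a separate instance variable called counter in each new object and leave the class attribute at 0.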

Class definitions can also contain definitions of functions that are not methods, meaning they are not invoked for a specific object of that class and they do not access the state of a particular object. We will refer to such functions as static class functions. Like class attributes they will be referred to from code by using the name of the class as prefix. Class functions allow for implementing some functionality that is in some way related to the class but not the state of a particular object. They are also useful for providing auxiliary functions for the methods of the class. It is important to note that since static class functions are associated with the class but not an individual object of the class, you cannot directly refer to the instance variables in the body of a static class function like you can in the definitions of methods. However, you can refer to class attributes as you will see in a moment.

A static class function definition can be distinguished from the definition of a method by the lack of the “self” as the first parameter of the function; so it looks like a normal function definition but is located inside a class definition. To give a very simple example of a static class function, let’s add a function called printClassInfo() to class Geometry that simply produces a nice output message for our counter class attribute:

class Geometry(): 
    … 

    def printClassInfo(): 
        print( "So far, {0} geometric objects have been created".format(Geometry.counter) )

We have included the header of the class definition to illustrate how the definition of the function is embedded into the class definition. You can place the function definition at the end of the class definition, but it doesn’t really matter where you place it, you just have to make sure not to paste the code into the definition of one of the methods. To call the function you simply write:

Geometry.printClassInfo()

The exact output depends on how many objects have been created but it will be the current value of the counter class variable inserted into the text string from the function body.
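The definition above works as long as the function is called via the class name. Python additionally offers the built-in decorator @staticmethod to mark such functions explicitly. The following sketch shows this variant as an alternative to the plain definition; it is not required for the code in this lesson. The variable g is made up for the example.

```python
class Geometry():
    counter = 0

    def __init__(self, x = 0.0, y = 0.0):
        self.x = x
        self.y = y
        Geometry.counter += 1

    @staticmethod
    def printClassInfo():
        print("So far, {0} geometric objects have been created".format(Geometry.counter))

g = Geometry(1, 2)
Geometry.printClassInfo()   # call via the class, as in the text
g.printClassInfo()          # call via an object; this form needs the decorator
```

Without the decorator, the second call would fail with a TypeError because Python would pass the object g as an argument to a function that takes no parameters.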

Go ahead and save your completed geometry script since we'll be using it later in this lesson.

In the program that we will develop in the walkthroughs of this lesson, we will use static class functions that work somewhat similarly to the constructor in that they can create and return new objects of the class but only if certain conditions are met. We will use this idea to create event objects for certain events detected in bus GPS track data. The static functions defined in the different bus event classes (called detect()) will be called with the GPS data and only return an object of the respective event class if the conditions for this kind of bus event are fulfilled. Here is a sketch of a class definition that illustrates this idea:

class SomeEvent(): 
    ...

    # static class function that creates and returns an object of this class only if certain conditions are satisfied
    def detect(data): 
        ... # perform some tests with data provided as parameter
        if ...: # if conditions are satisfied, use constructor of SomeEvent to create an object and return that object
              return SomeEvent(...)
        else:   # else the function returns None
              return None

# calling the static class function from outside the class definition,
# the returned SomeEvent object will be stored in variable event
event = SomeEvent.detect(...)
if event: # test whether an object has been returned
    ... # do something with the new SomeEvent object
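To see the pattern in action before we get to the walkthrough, here is a runnable toy version of the sketch above. Everything specific in it is hypothetical and chosen only for illustration: the class name SpeedingEvent, the threshold LIMIT, and the assumption that each observation is a dictionary with a 'speed' entry. The actual bus event classes of the walkthrough will look different.

```python
class SpeedingEvent():
    LIMIT = 50   # hypothetical speed threshold, stored as a class attribute

    def __init__(self, speed):
        self.speed = speed

    # static class function: returns a SpeedingEvent object or None
    def detect(data):
        # data is assumed to be a dictionary with a 'speed' entry
        if data['speed'] > SpeedingEvent.LIMIT:
            return SpeedingEvent(data['speed'])
        else:
            return None

# made-up observations standing in for real GPS track data
observations = [{'speed': 42}, {'speed': 63}]
events = []
for data in observations:
    event = SpeedingEvent.detect(data)
    if event:   # None is skipped, an event object is kept
        events.append(event)

print(len(events))       # 1
print(events[0].speed)   # 63
```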
