GEOG 489
Advanced Python Programming for GIS

4.10.2.1 Reading the Input Data


To study the interplay between the classes and their implementation details, let us go through the steps in the order in which they happen when the main program in main.py is executed. Lines 15 to 21 of main.py define the main input variables, such as the paths of the two input files and a dictionary with the indices of the relevant columns in the GPS input file. After that, the first thing that happens is that the data is read in and used to create an object of class Bus or Depot for each bus vehicle and depot mentioned in the two input files.

The reading of the input data happens in lines 24 and 25 of main.py.

depotData = Depot.readFromCSV(depotFile)  
busData = Bus.readFromCSV(busFile, busFileColumnIndices)

Both classes, Bus and Depot, provide a class function called readFromCSV(...) that, given a filename, reads in the data from the respective input file and produces the corresponding objects. For class Depot, this happens in lines 112 to 120 of core_classes.py. The return value is a simple list of Depot objects, each created in line 119 from a name string and a 4-tuple of numbers for the bounding box.

def readFromCSV(fileName):
        """reads comma-separated text file with each row representing a depot and returns list of created Depot objects.
           The order of columns in the file is expected to match the order of parameters and bounding box elements of the Depot class."""
        depots = []
        with open(os.path.join(os.path.dirname(__file__), fileName), "r") as depotFile:
            csvReader = csv.reader(depotFile, delimiter=',')
            for row in csvReader:  # go through rows in input file
                # add new Depot object for current row to Depot list
                depots.append(Depot(row[0], (float(row[1]), float(row[2]), float(row[3]), float(row[4]))))
        return depots
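To make the row parsing concrete, here is a minimal, self-contained sketch of the same idea. It is not the course code: the Depot class below is a simplified stand-in for the one in core_classes.py, the helper name read_depots is hypothetical, and the sample rows and numbers are invented. Only the column layout (a name followed by four bounding box numbers) is taken from the description above. The sketch reads from an in-memory string instead of a file so that it can be run without any input data.

```python
import csv
import io


class Depot:
    """Simplified stand-in for the Depot class of core_classes.py (assumption)."""

    def __init__(self, name, bbox):
        self.name = name
        self.bbox = bbox  # 4-tuple of numbers describing the bounding box


# Invented sample content: one depot per row, name followed by four numbers
sample = "Depot A,1.0,2.0,3.0,4.0\nDepot B,5.5,6.5,7.5,8.5\n"


def read_depots(text_file):
    """Creates one Depot object per CSV row of the given file-like object."""
    depots = []
    for row in csv.reader(text_file, delimiter=','):
        # csv.reader yields strings, so the numbers have to be converted
        depots.append(Depot(row[0], tuple(float(v) for v in row[1:5])))
    return depots


depots = read_depots(io.StringIO(sample))
for depot in depots:
    print(depot.name, depot.bbox)
```

Because csv.reader always returns the cells as strings, the explicit float(...) conversion is what turns the four bounding box columns into numbers that can later be compared with GPS coordinates.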

For class Bus, this happens in lines 20 to 49 of core_classes.py and is slightly more involved. It works somewhat similarly to the code from the rhino/race car project in lesson 4 of GEOG485 in that it uses a dictionary to create Timepoint lists for each individual bus vehicle occurring in the data.

def readFromCSV(fileName, columnIndices): 
        """reads comma-separated text file with each row representing a GPS point for a bus with timestamp and returns dictionary mapping 
           bus id to created Bus objects. The column indices for the important info ('lat', 'lon', 'time', 'busId', 'line') need to be
           provided in dictionary columnIndices."""  
        buses = {} 

        with open(os.path.join(os.path.dirname(__file__), fileName), "r") as trackFile: 
            csvReader = csv.reader(trackFile, delimiter=',') 
            for row in csvReader: 
                # read required info from current row 
                busId = row[columnIndices['busId']] 
                lat = row[columnIndices['lat']] 
                lon = row[columnIndices['lon']] 
                time = row[columnIndices['time']] 
                line = row[columnIndices['line']] 

                # create datetime object from time; we here assume that time in the csv file is given in microseconds since January 1, 1970 
                dt = datetime.datetime(1970, 1, 1) + datetime.timedelta(microseconds=int(time)) 

                # create and add new Bus object if this is the first point for this bus id, else take Bus object from the dictionary
                if busId not in buses:
                    bus = Bus(busId, line) 
                    buses[busId] = bus      
                else:
                    bus = buses[busId] 

                # create Timepoint object for this row and add it to the bus's Timepoint list 
                bus.timepoints.append(Timepoint(dt,float(lat),float(lon))) 

        return buses  # return dictionary with Bus objects created

For each row of the CSV file processed by the main for-loop in lines 28 to 47, we extract the content of the cells we are interested in and create a new datetime object from the timestamp in that row. If the dictionary we maintain in variable buses does not yet contain a bus with that ID, this is the first GPS point for this bus ID in the file, so we create a new Bus object and put it into the dictionary using the bus ID as the key. Otherwise, we keep working with the Bus object already stored under that ID. In both cases, we then add a new Timepoint object for the data in that row to the list of Timepoints kept in the Bus object (line 47). The dictionary of Bus objects is returned as the return value of the function. Having all Timepoints for a bus stored as a list inside the corresponding Bus object will make it easy to look ahead and back in time to determine things like the current estimated speed and whether the bus is stopped or driving at a particular point in time.
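The grouping pattern described here can be tried out in isolation with the following sketch. It makes several simplifying assumptions that are not part of the course code: plain tuples stand in for the Bus and Timepoint objects, the function name group_by_bus is hypothetical, and the rows are invented values rather than real GPS data. The timestamp conversion follows the same assumption as the code above, namely that times are given in microseconds since January 1, 1970.

```python
import datetime

# Invented rows: (bus id, microseconds since January 1, 1970, lat, lon)
rows = [
    ("bus1", 0, 53.0, -6.0),
    ("bus2", 1_000_000, 53.1, -6.1),
    ("bus1", 60_000_000, 53.2, -6.2),
]

EPOCH = datetime.datetime(1970, 1, 1)


def group_by_bus(rows):
    """Returns a dictionary mapping each bus id to its list of (datetime, lat, lon) tuples."""
    tracks = {}
    for bus_id, micros, lat, lon in rows:
        # convert the raw timestamp into a datetime object
        dt = EPOCH + datetime.timedelta(microseconds=micros)
        # first point for this bus id: start a new list under that key
        if bus_id not in tracks:
            tracks[bus_id] = []
        # in both cases, append the point to the list for this bus
        tracks[bus_id].append((dt, lat, lon))
    return tracks


tracks = group_by_bus(rows)
print(len(tracks["bus1"]))                            # points recorded for bus1
print(tracks["bus1"][1][0] - tracks["bus1"][0][0])    # time between its two points
```

Since the points of each bus end up in one list in the order in which they appear in the input, comparing neighboring entries, as in the last line, is all that is needed to look at the time elapsed between two consecutive observations of the same bus.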