GEOG 489
Advanced Python Programming for GIS

3.8.4 Adding / removing columns and rows

Adding a new column to a data frame is simple when you have the values for that column ready in a list. In the following example, we want to add a new column ‘m5’ with additional measurements, and the numbers are already stored in the list m5values defined in the first line of the example code. To add the column, we simply make an assignment to df['m5'] in the second line. If a column ‘m5’ already existed, its values would be overwritten by the values from m5values. Since this is not the case, a new column is added under the name ‘m5’ with the values from m5values. Note that the list needs to contain exactly one value per row of the data frame, six in this case.

m5values = [0.432523, -0.123223, -0.231232, 0.001231, -0.23698, -0.41231]
df['m5'] = m5values
df
Table 3.3
                   m1         m2         m3         m4         m5
2017-01-01   1.200000   0.163613   0.510162   0.628612   0.432523
2017-01-02   0.056027   0.056027   0.025050   0.283586  -0.123223
2017-01-03  -0.840010  -0.840010  -0.422343   1.022622  -0.231232
2017-01-04  -0.721431  -0.721431  -0.966351  -0.380911   0.001231
2017-01-05   1.200000   0.655267  -1.339799   1.075069  -0.236980
2017-01-06   0.192804   0.192804  -1.160902   0.525051  -0.412310
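The overwrite behavior and the length requirement described above can be checked with a small, self-contained sketch. The data frame demo and its values are made up for illustration and only stand in for the df used in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: six dates as the index and two made-up columns.
dates = pd.date_range('2017-01-01', periods=6)
demo = pd.DataFrame({'m1': np.arange(6.0), 'm2': np.arange(6.0) * 2}, index=dates)

# Assigning a list to a column name that does not exist yet adds the column.
demo['m5'] = [0.432523, -0.123223, -0.231232, 0.001231, -0.23698, -0.41231]

# Assigning to a name that already exists overwrites the old values.
demo['m5'] = [0.0] * 6

# A list whose length differs from the number of rows is rejected by pandas.
try:
    demo['m6'] = [1.0, 2.0]
    length_error = None
except ValueError as err:
    length_error = err

print(demo)
print('Error for the too-short list:', length_error)
```

After running this, demo has the columns m1, m2 and m5, with m5 holding only zeros, and no column m6 has been created.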

To add new rows, we can make assignments to rows selected via the loc operation. For example, we could add a new row for January 7, 2017 by writing

df.loc[pd.Timestamp('2017-01-07'),:] = [ ... ]

where the part after the equal sign is a list of five numbers, one for each of the columns. Again, if a row for January 7 already existed, its values would be replaced. The following example uses this idea to create new rows for January 7 to 9 using a for loop:

for i in range(7,10):
    df.loc[ pd.Timestamp('2017-01-0'+str(i)),:] = [ np.random.rand() for j in range(5) ]
df
Table 3.4
                   m1         m2         m3         m4         m5
2017-01-01   1.200000   0.163613   0.510162   0.628612   0.432523
2017-01-02   0.056027   0.056027   0.025050   0.283586  -0.123223
2017-01-03  -0.840010  -0.840010  -0.422343   1.022622  -0.231232
2017-01-04  -0.721431  -0.721431  -0.966351  -0.380911   0.001231
2017-01-05   1.200000   0.655267  -1.339799   1.075069  -0.236980
2017-01-06   0.192804   0.192804  -1.160902   0.525051  -0.412310
2017-01-07   0.768633   0.559968   0.591466   0.210762   0.610931
2017-01-08   0.483585   0.652091   0.183052   0.278018   0.858656
2017-01-09   0.909180   0.917903   0.226194   0.978862   0.751596

In the body of the for loop, the part on the left of the equal sign uses loc[...] to refer to a row for the new date based on loop variable i, while the part on the right side uses the numpy function np.random.rand() inside a list comprehension to create a list of five random numbers that will be assigned to the cells of the new row. Since the rows are filled with random numbers, the values you get for January 7 to 9 will differ from the ones shown in the table.
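The same idea can be tried out independently of the earlier sections with the following sketch. The data frame demo, the seeded random generator, and the f-string used to build the date are assumptions made for this illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: six dates and five columns of random numbers.
rng = np.random.default_rng(0)  # seeded so repeated runs give the same numbers
dates = pd.date_range('2017-01-01', periods=6)
demo = pd.DataFrame(rng.random((6, 5)), index=dates,
                    columns=['m1', 'm2', 'm3', 'm4', 'm5'])

# Assigning to a row label that does not exist yet appends a new row.
for day in range(7, 10):
    new_date = pd.Timestamp(f'2017-01-{day:02d}')
    demo.loc[new_date, :] = [rng.random() for j in range(5)]

# Assigning to a label that already exists replaces that row's values.
demo.loc[pd.Timestamp('2017-01-07'), :] = [0.0] * 5

print(demo)
```

Building the date with the format specification {day:02d} pads the day with a leading zero, so the same loop would also work for two-digit days, which the string concatenation '2017-01-0'+str(i) would not.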

If you ever want to remove columns or rows from a data frame, you can do so by using df.drop(...). The first parameter given to drop(...) is a single column or row name or, alternatively, a list of names that should be dropped. By default, drop(...) will treat these as row names. To indicate that they are column names, you have to specify the additional keyword argument axis=1. We will see an example of this in a moment.
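As a quick preview, here is a minimal sketch of both variants. It uses a small made-up data frame demo rather than the df from this section. By default, drop(...) returns a new data frame and leaves the original unchanged, so the results are stored in new variables.

```python
import pandas as pd

# Hypothetical stand-in for df: four dates and three made-up columns.
dates = pd.date_range('2017-01-01', periods=4)
demo = pd.DataFrame({'m1': [1.0, 2.0, 3.0, 4.0],
                     'm2': [0.1, 0.2, 0.3, 0.4],
                     'm3': [10.0, 20.0, 30.0, 40.0]}, index=dates)

# Without axis, the given name is interpreted as a row label.
without_row = demo.drop(pd.Timestamp('2017-01-02'))

# With axis=1, the given names are interpreted as column names.
without_cols = demo.drop(['m2', 'm3'], axis=1)

print(without_row)
print(without_cols)
print(demo.shape)  # the original data frame still has all rows and columns
```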