EARTH 801
Computation and Visualization in the Earth Sciences

Lesson 7: File Editing with awk


Run all the examples here in your terminal window if you are on a Mac, or in your Cygwin window if you are on a PC.

awk

The awk command is a powerful way to manipulate the contents of text files. For now, we are only going to scratch the surface of what awk can do. Let's start with a simple example. Say you have a text file that has some columns of numbers in it. With one awk command you can rearrange the columns in a different order, or perform some arithmetic on the columns.

cartoon Eliza alerting you that instructions come next

Download the file1.txt text file in order to follow along with what I am doing. To do this, click the link to the filename and then choose Save As... from your browser's file menu. Save it somewhere on your computer, then in the terminal, navigate to that place. Remember how? You'll want to use cd.


This is what "file1.txt" contains. It is simply a plain-text arrangement of numbers in three columns and six rows. Every entry in the first column is the number 1, every entry in the second column is the number 2, and every entry in the third column is the number 3.

Output certain columns of the file to the screen

Awk is great for quickly manipulating files that are arranged in columns, so it is a nice way to fiddle around with plain-text data files since those are frequently in columns or tables. It uses some peculiar syntax. Let's say we wanted to display just the first column from "file1.txt." Here's how to do it:

awk '{print $1}' file1.txt

The first thing you type is awk, followed by a pair of single quotes with curly braces inside them. Inside the curly braces we wrote print $1, which is the command to print column #1. The name of the file from which we are extracting column #1 goes last.

Let's say we wanted to display just the second column from "file1.txt." In that case we'd type:

awk '{print $2}' file1.txt
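If you want to try these two commands without downloading anything, here is a minimal sketch. It assumes a stand-in file that I am calling sample.txt (a hypothetical name, not part of the lesson files) built to match the description of file1.txt: six rows, each containing 1 2 3.

```shell
# Work in a scratch directory so none of your own files are touched.
cd "$(mktemp -d)"

# Build the stand-in file: six rows, each containing "1 2 3".
for i in 1 2 3 4 5 6; do echo "1 2 3"; done > sample.txt

# Print only the first column, then only the second column.
col1=$(awk '{print $1}' sample.txt)
col2=$(awk '{print $2}' sample.txt)

echo "$col1"   # six lines, each containing 1
echo "$col2"   # six lines, each containing 2
```

Storing the results in the variables col1 and col2 is just a convenience for this sketch. Typing the awk commands directly, as in the lesson, prints the same thing to the screen.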

Quiz Yourself!

Rearrange the columns, repeat columns, create other columns

You can output any number of columns and put them in whatever order you want. Let's say we want column 3, then column 1 but not column 2.

awk '{print $3, $1}' file1.txt

The comma between $3 and $1 tells awk to separate the two columns in the output. By default, the separator is a single space.
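To see what the comma is doing, it can help to compare the same command with and without it. The sketch below assumes a short two-row stand-in file named sample.txt (a hypothetical name) with the same 1 2 3 layout as file1.txt. Without the comma, awk joins the two values together with nothing in between.

```shell
# Scratch directory plus a two-row stand-in for file1.txt.
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n' > sample.txt

# With a comma, awk puts a space between the two values.
with_comma=$(awk '{print $3, $1}' sample.txt)

# Without a comma, awk glues the two values together.
without_comma=$(awk '{print $3 $1}' sample.txt)

echo "$with_comma"      # each line reads: 3 1
echo "$without_comma"   # each line reads: 31
```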

Let's say you want to output column 1, substitute 4's in column 2, then output column 3 unchanged. That would be like this:

awk '{print $1, 4, $3}' file1.txt

The 4 inside the curly braces doesn't have a $ in front of it because it is the actual number 4. It does not refer to a 4th column.
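Here is a small sketch of the difference between 4 and $4, again assuming a hypothetical two-row stand-in file named sample.txt laid out like file1.txt. Because the file has only three columns, $4 refers to a column that does not exist, and awk prints nothing in that position.

```shell
# Scratch directory plus a two-row stand-in for file1.txt.
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n' > sample.txt

# 4 without a dollar sign is the literal number 4.
literal=$(awk '{print $1, 4, $3}' sample.txt)

# $4 asks for a fourth column, which is empty in a three-column file.
missing=$(awk '{print $1, $4, $3}' sample.txt)

echo "$literal"   # each line reads: 1 4 3
echo "$missing"   # each line reads: 1  3 (two spaces around the empty column)
```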

Quiz Yourself!

Arithmetic, text, special characters

You can do math inside the print statement of awk and you can also deal with columns that aren't numbers. Let's say I want to output the sum of columns 1 and 2 as the first column, my name as the second column, and the product of columns 2 and 3 as the third column, and then make a fourth column that is the number 25:

awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt

screenshot of 4-column file. 1st column is 3, second column is eliza, third column is 6, 4th is 25

Here's what the output of awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt looks like.
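Since every row of file1.txt is identical, the arithmetic gives the same answer on every line. To watch awk redo the math row by row, here is a sketch using a made-up file, which I am calling varied.txt (a hypothetical name and hypothetical numbers, not part of the lesson files), whose rows differ.

```shell
# Scratch directory plus a three-row file with different numbers in each row.
cd "$(mktemp -d)"
printf '1 2 3\n4 5 6\n10 20 30\n' > varied.txt

# Same command as in the lesson: sum of columns 1 and 2, a name,
# product of columns 2 and 3, and the literal number 25.
result=$(awk '{print $1+$2, "eliza", $2*$3, 25}' varied.txt)

echo "$result"
# 3 eliza 6 25
# 9 eliza 30 25
# 30 eliza 600 25
```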

There are a few special characters. One that is sometimes useful is "\t", which stands for a tab. Putting "\t" between the columns tells awk to separate them with tabs.

awk '{print $1 "\t" $2 "\t" $3}' file1.txt

The command above will output file1.txt unchanged except for tabs in between the columns instead of just one space.
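Another way to get tabs between columns, sketched here as an alternative rather than as the approach this lesson uses, is to change the separator that awk inserts wherever you write a comma. That separator is stored in awk's built-in variable OFS, and a BEGIN block sets it once before any rows are read. The file name sample.txt is a hypothetical stand-in for file1.txt.

```shell
# Scratch directory plus a two-row stand-in for file1.txt.
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n' > sample.txt

# Set the output field separator to a tab, then print with commas as usual.
tabbed=$(awk 'BEGIN { OFS = "\t" } { print $1, $2, $3 }' sample.txt)

echo "$tabbed"   # each line reads 1, 2, 3 separated by tabs
```

This saves typing when there are many columns, because you do not have to write "\t" between every pair.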

Redirect the output of awk

All the examples so far have output the results of the awk command to the screen. They have not altered the original file, and they haven't saved the results anywhere. To put the output of awk into a new file instead of showing it on the screen, use >. Let's say I want to make a new file that is the same as file1.txt but with the columns in reverse order:

awk '{print $3, $2, $1}' file1.txt > file2.txt

Now if I look in the folder where file1.txt is, there are two files. file1.txt is still there, but there is a new file called file2.txt as well.

screenshot of three-column file, columns are the numbers 1,2,3.
screenshot of three column file. column 1 is 3, column 2 is 2, column 3 is 1.

On the left is the original file1.txt.

The command

awk '{print $3, $2, $1}' file1.txt > file2.txt

creates the new file2.txt, seen at right.
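The sketch below walks through the same redirect and checks both files afterward. The names sample.txt and reversed.txt are hypothetical stand-ins for file1.txt and file2.txt.

```shell
# Scratch directory plus a two-row stand-in for file1.txt.
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n' > sample.txt

# Nothing appears on the screen; the output goes into reversed.txt instead.
awk '{print $3, $2, $1}' sample.txt > reversed.txt

cat sample.txt     # still reads 1 2 3 on each line
cat reversed.txt   # reads 3 2 1 on each line
```

Two cautions when using >. If the output file already exists, its old contents are replaced. Also, do not redirect into the same file that awk is reading from, because the shell empties that file before awk gets a chance to read it.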

Try This!