EARTH 801
Computation and Visualization in the Earth Sciences

Lesson 7: File Input/Output with Data


Three examples of reading in a file, doing something with its contents, and plotting the result.

New syntax: loadStrings, split, map, log

0. Make a scatter plot from data in a plain text file

The box below contains the contents of a plain text file named "cards_data.txt" that I created using vi and have dragged and dropped onto my sketch. There are three columns separated by tabs. The first column is a list of last names of 2002 St. Louis Cardinals position players (no pitchers). The second column lists the number of RBIs each of them earned that year and the final column is each of their salaries in millions of dollars. The format of this file looks kind of ugly because some of the names are too long for the tabbing to work out right, but Processing won't care about this! We are going to read this text file into Processing using loadStrings. Then we are going to make a plot out of it.

Cairo	23	0.85
Drew	56	3.6
Edmonds	83	8.33
Marrero	66	1.5
Martinez	75	7.5
Matheny	35	3.25
Palmeiro	31	0.7
Perez	26	0.5
Pujols	127	0.9
Renteria	83	6.5
Robinson	15	0.32
Rolen	110	7.625
Vina	54	5.33

Here's the program, and a screenshot of the plot I made.

//this data is the number of RBIs in 2002 for Cards position players 
//and their salaries (in millions of $)
//we will read the data in from a 3-column plain text file

String[] cards; //make the array and fill it with data later

void setup() {
   size(200, 200);
   background(255);
   PFont font1;
   font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
   textFont(font1);
   smooth();
   cards = loadStrings("cards_data.txt"); //this is how we read in the file contents
   noLoop(); //just drawing a static plot once
}

void draw() {
   //make a grid for plotting. use translate to leave some blank space for labels
   translate(50, -50);
   stroke(200);
   for (int i = 0; i < 100; i = i+20) {
      line(i, 60, i, height); //vertical gridlines
      line(0, height-i, 140, height-i); //horizontal gridlines
   }

   //plot the data
   stroke(0);
   fill(75);
   println("number of lines in data file is " + cards.length);

   //go through the array called "cards" line by line
   for (int i = 0; i < cards.length; i++) {

      //split each line where there is a tab
      //create a new array of strings called "data" to hold this info
      String[] data = split(cards[i], '\t');

      String Name = (data[0]); //player name in first column
      int Rbi = int(data[1]); //Rbi in the second column
      float Salary = float(data[2]); //Salary in third column

      //make a scatter plot of Rbi v. salary
      ellipse(Rbi, height-Salary*10, 10, 10); //want the axes origin at lower left, so do (height - y data)
   }
   
   //label the axes
   //I did these by trial-and-error until I got them to look right
   fill(0);
   text("RBIs", (width/2)-50, height+30);
   text("Salary $ mil", -50, 100, 30, 100);
   text("20", 15, height+15);
   text("60", 55, height+15);
   text("100", 95, height+15);
   text("2", -10, height-15);
   text("6", -10, height-55);
   text("10", -15, height-95);
}
Screenshot of plot generated from code above.
The main purpose of this plot is to show you how easy it would be to ditch your job and start making millions as an agent. Look how well RBIs correlate to salary! Albert Pujols was glaringly underpaid (making $900,000 and accumulating 127 RBIs) according to this metric, but he negotiated a new contract with the Cardinals after the 2003 season that paid him upwards of $10 million per year, which puts him right in line with where the RBI prediction says he should be.

The secondary purpose of this plot is to demonstrate a few new commands and how to deal with an external data file. Inside setup() we used loadStrings to read the file into the program. You want to do all the reading-in of external files in setup() because that block just runs once and you don't want your CPU hogged by re-loading your files every time you run through draw(). The file is loaded in as an array of Strings, one String per line of the file. The first thing we want to do is tell Processing that we actually want three columns, not 13 lines. So, we go through the data file line by line and split each line where there are tabs. The syntax '\t' tells Processing to look for a tab.

The fact that the data comes in as strings works out great for the player names because they are words. But if we want to do some arithmetic with the numbers, or otherwise treat them as numbers, then we have to convert them to other variable types. Inside the for loop where we run through the data file, we first split each line into three pieces, making a three-element array named data. Then we give each element of the data array its own name and convert it to another variable type if we want to. For example, we converted the RBI value to an int and the salary value to a float. The chunk of code that does all that is here:

String[] data = split(cards[i], '\t');
String Name = (data[0]);
int Rbi = int(data[1]);
float Salary = float(data[2]);

Then we plot Rbi v. Salary.

ellipse(Rbi, height-Salary*10, 10, 10);
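
If you want to try the split-and-convert step outside of Processing, here is a minimal sketch in plain Java, the language Processing is built on. It is only an illustration: it uses Java's String.split, Integer.parseInt, and Float.parseFloat in place of Processing's split(), int(), and float(), and the class name CardsLine and the helpers columns and salaryToY are made up for this example. The window height of 200 matches size(200, 200) in the program above.

```java
class CardsLine {
    // Hypothetical helper: split one tab-separated line into its columns.
    static String[] columns(String line) {
        return line.split("\t");
    }

    // Hypothetical helper: y pixel position for a salary, with the plot
    // origin at the lower left. Mirrors height-Salary*10 from the sketch.
    static float salaryToY(float salary, int height) {
        return height - salary * 10;
    }

    public static void main(String[] args) {
        String line = "Pujols\t127\t0.9"; // one line from cards_data.txt
        String[] data = columns(line);

        String name = data[0];                    // stays a String
        int rbi = Integer.parseInt(data[1]);      // String -> int
        float salary = Float.parseFloat(data[2]); // String -> float

        System.out.println(name + ": x = " + rbi + ", y = " + salaryToY(salary, 200));
    }
}
```

The conversion matters because "127" as a String is just three characters; only after it becomes an int can it be used as an x coordinate.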

The rest of the program is devoted to doing the background work that spreadsheet and other canned plotting programs do for you. Here's where we make some gridlines:

   //make a grid for plotting. use translate to leave some blank space for labels
   translate(50, -50);
   stroke(200);
   for (int i = 0; i < 100; i = i+20) {
      line(i, 60, i, height); //vertical gridlines
      line(0, height-i, 140, height-i); //horizontal gridlines
   }

Here's where we label the axes:

   //label the axes
   //I did these by trial-and-error until I got them to look right
   fill(0);
   text("RBIs", (width/2)-50, height+30);
   text("Salary $ mil", -50, 100, 30, 100);
   text("20", 15, height+15);
   text("60", 55, height+15);
   text("100", 95, height+15);
   text("2", -10, height-15);
   text("6", -10, height-55);
   text("10", -15, height-95);

Of course this is a slightly more tedious way to make a simple plot -- you would probably rather just paste this little datafile into your favorite program and not spend time tinkering with the way the plotting grid looks, right? Sure, but the point is that you can do it this way and you have complete control over the way it looks, which is cool!!

1. A World Map

I got this data file from NOAA's coastline extractor. I'm not giving you a screenshot of the data file this time because it is a 1.2 MB file with over 62,000 lines. And that's the low-res version! Try pasting that one into Excel! However, the program that makes this plot is quite simple:

//plotting a map of the world

String[] coast;

void setup() {
   size(600,300);
   coast = loadStrings("coastText.txt");
   noLoop();

}

void draw() {
   background(255);
   float[] coastLon = new float[coast.length];
   float[] coastLat = new float[coast.length];
   float[] newCoastLon = new float[coast.length];
   float[] newCoastLat = new float[coast.length];
   
   for (int i=0; i<coast.length; i++){
      String[] data = split(coast[i], ' ');
      coastLon[i] = float(data[0]);
      coastLat[i] = float(data[1]);
   }
   
   for (int i=0; i<coastLon.length; i++){
      newCoastLon[i] = map(coastLon[i],-180,180,0,width);
      newCoastLat[i] = map(coastLat[i],-90,90,height,0); //flip y so north is at the top
   }
   
   stroke(50);
   for(int i=0; i<coastLon.length; i++){
      point(newCoastLon[i],newCoastLat[i]);
   }
}
We read the file in. Because each line of the data file has two numbers, longitude and latitude, we split each line and populate two new arrays, one for longitude and one for latitude. In this data file there's just a blank space between the numbers, not a tab, so that's why the second argument to split is a space surrounded by single quotes.

There's another for loop in which I use map to make the data plot in a way that exactly fills the display window. map takes five arguments: the value itself, the min and max of that value's original range, and then the min and max of the range you are changing it to. So for longitude, the "value" is just whatever the longitude in the data file is, and the original range is the whole Earth's longitude, -180 to +180. The range we are plotting to is the window size, so between zero and the width of the window. For latitude, the target range runs from height down to zero because y pixel coordinates increase downward and we want north at the top. map is great because it does the work of figuring out the scale of things for you. Why would you want to spend time trying to calculate where 60 degrees east should go when map can do it for you?
Map of the world coastline. Display produced by program above.
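
To make the rescaling described above concrete, here is a minimal sketch in plain Java of the kind of linear rescaling that map performs. The helper name remap and the class name MapDemo are made up for this illustration; inside a Processing sketch you would simply call map. The window size of 600 by 300 comes from the world map program.

```java
class MapDemo {
    // Hypothetical stand-in for Processing's map(): linearly rescales value
    // from the range [start1, stop1] to the range [start2, stop2].
    static float remap(float value, float start1, float stop1,
                       float start2, float stop2) {
        float fraction = (value - start1) / (stop1 - start1); // how far along the old range
        return start2 + (stop2 - start2) * fraction;          // same fraction of the new range
    }

    public static void main(String[] args) {
        int width = 600;  // window size used in the world map sketch
        int height = 300;

        // Where does 60 degrees east on the equator land in the window?
        float x = remap(60, -180, 180, 0, width);
        float y = remap(0, -90, 90, height, 0); // target range is flipped: north at top

        System.out.println("x = " + x + ", y = " + y);
    }
}
```

Sixty degrees east is two-thirds of the way from -180 to +180, so it lands two-thirds of the way across the window, at about x = 400, and the equator lands halfway down, at y = 150.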

I should point out here that it is just a coincidence that I used map to make an actual map. In fact, map is handy anytime you have a variable with a natural range to it but you want it to be expanded or contracted proportionally to a different range. For example, here is a program where map is used to convert the x position of each circle, which goes from 0 to 500 (the width of the window), to a greyscale value, which goes from 0 to 255:

// demo use of "map"

float x;
float y;

void setup() {
   size(500,200);
}

void draw(){
   x=random(width);
   y=random(height);
   int a= int(x);
   color colr = int(map(a,0,width,0,255));
   fill(colr);
   ellipse(x,y,20,20);
}
Screen capture of image generated from code above.

Quiz Yourself!

What does the snippet of code below do?

for (int i = 0; i <= 5; i++) {
   line(35,map(i,0,5,height-1,1),50,map(i,0,5,height-1,1));
   text(i,25,map(i,0,5,height-1,1));
}

If your answer was that this code will draw six horizontal lines and number them zero through 5, then you are right!

Further explanation: This is a for loop. The loop variable goes from zero to five by ones. Look at the first line inside the for loop. It is the command to draw a line. Drawing a line takes four arguments: x1, y1, x2, y2. The x1 is always 35 and the x2 is always 50. The y1 and y2 look like a mess, but they are the same as each other, so this code draws six horizontal lines.

The lines are evenly spaced between the top and bottom of the display window. That is what map does for us. We do not have to calculate where each line will be. We just map the values onto the range we want in the display window. I mapped the six lines from height-1 to 1 instead of height to 0 because lines plotted right on the border of the display window would not have shown up.

The second line of the for loop puts a text label next to each horizontal line. In fact it writes the value of i, which is a number. You can see that text is placed with its origin at the bottom left, so that's why the number 5 is cut off.

The lines are black and the text is white because we didn't set fill or stroke. Processing therefore uses the defaults: fill(255) and stroke(0).
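
To see the even spacing in numbers, here is a small plain-Java sketch that computes the y position of each of the six lines described in the answer. The names TickDemo, remap, and tickPositions are made up for this illustration, remap is a stand-in for Processing's map, and the window height of 100 is an assumption because the quiz snippet does not set a size.

```java
class TickDemo {
    // Hypothetical stand-in for Processing's map(): linear rescaling
    // from [start1, stop1] to [start2, stop2].
    static float remap(float value, float start1, float stop1,
                       float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // y positions of the horizontal lines for i = 0..5 in a window of the given height
    static float[] tickPositions(int height) {
        float[] ys = new float[6];
        for (int i = 0; i <= 5; i++) {
            ys[i] = remap(i, 0, 5, height - 1, 1); // i = 0 near the bottom, i = 5 near the top
        }
        return ys;
    }

    public static void main(String[] args) {
        int height = 100; // assumed window height; the quiz snippet does not set one
        for (float y : tickPositions(height)) {
            System.out.println(y);
        }
    }
}
```

Each step in i moves the line up by the same amount, (height - 2) / 5 pixels, which is why the lines come out evenly spaced without any hand calculation.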

display of discussed program