METEO 810
Weather and Climate Data Sets

Plotting Basics

Prioritize...

After finishing this section, you should be able to create a simple plot using various points and lines. Emphasis should be placed on learning how to create proper titles and axis labels.

Read...

Now that you know your way around data and dataframes, let's spend some time talking about how to present that data in meaningful ways. In this course, we won't get into any heavy statistical analyses; instead we'll focus on retrieving and presenting data in a way that can answer basic questions. Certainly, the first step in presenting data is simply a visualization in the form of a plot.

Let's start with an example using the data from the daily_obs data set:

Click the run button to see the plot produced by the script I provided. Note that the plot window is a bit cramped... you can hit the pop-up button to produce a bigger floating window.

I encourage you to switch over to R-Studio at this point and run the same script. Create a new script file in R-Studio and paste in the code from the DataCamp window. Make sure you swap out the read.csv statements for the data file you saved locally (daily_obs_may2016.csv). You'll notice that the lower right panel of R-Studio switches to the Plots tab and automatically displays the same plot. There’s not much to this basic plot. In fact, it's not very helpful, is it? Before we tackle some more complicated data sources and plot types, review some of the rules of good data communication.

In the last lesson, we learned the importance of crafting a message and using data to communicate or support that message. Indeed, it's true that a picture is worth a thousand words, but only if that picture is in focus and is properly composed. Remember that the process begins with choosing the correct display of the data. In addition, your graphs or charts should include the following elements...

  • Every graph should have a title which clarifies what is being displayed.
  • Axes should be labeled (with both a title and units where appropriate).
  • For graphs with multiple sets of symbols or lines, use both color and symbol/line type to distinguish each set (provide a legend as well).
  • Each axis should span an appropriate range so that the data are clearly visible and in no way misleading (check out this article on misleading graphs).

Let’s examine how each of these key elements can be added to a graph we just produced.

In your R-Studio script, replace the last two lines with the following code and run (source) it:

# Find the range for the independent and dependent variables
xrange<-range(daily_obs$DATE)
yrange<-range(daily_obs$TMAX,daily_obs$TMIN, na.rm=TRUE)

# Create the plot... but this time with no data
plot (xrange,yrange,type="n", 
      main="Maximum and Minimum Observed Temperature",
      sub="(May 2016, University Park, PA)",
      xlab="Date",
      ylab="Temperature (F)"
)

Check out the plot produced by the code above. Let's look closer at the lines we added. First, there are the variables xrange and yrange, each of which contains two numbers: the minimum and maximum of all the values passed to the range(...) function. Now, if I wanted to plot a single variable (say, the maximum temperature), I could just pass this variable to the plot function, because the axes are set by the x and y variables you want to plot. But if I want to plot both max and min temperatures and I want the y-axis to accommodate both sets of values, I must first pass both variables to the range(...) function, which calculates the overall range across both variables (the na.rm=TRUE argument tells it to ignore missing values). Then I pass that result to the plot(...) function.

Another trick that I often find helpful is to create a blank plot first (that's what type="n" does in the plot command) and then add the data in a second step. Breaking a process into discrete steps gives you more control, especially in situations where you are plotting more than one set of results. Ultimately, how you go about producing plots is up to you. Please take a moment to peruse this article which reviews all of the various scatter-plot types available.
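To see what range(...) returns when it is given more than one variable, here is a minimal sketch. The demo_obs data frame and its values are invented for illustration (they are not the course data), so treat the numbers as placeholders.

```r
# Hypothetical stand-in for daily_obs (values are invented for illustration)
demo_obs <- data.frame(
  DATE = as.Date("2016-05-01") + 0:4,
  TMAX = c(61, 58, NA, 70, 66),
  TMIN = c(45, 41, 39, NA, 50)
)

# range() pools every vector it is given and returns c(minimum, maximum)
range(demo_obs$TMAX, na.rm = TRUE)                            # 58 70
yrange <- range(demo_obs$TMAX, demo_obs$TMIN, na.rm = TRUE)   # 39 70
xrange <- range(demo_obs$DATE)

# Without na.rm = TRUE, a single missing value makes the result NA NA
range(demo_obs$TMAX, demo_obs$TMIN)

# type = "n" sets up the axes from the two ranges but draws no data
plot(xrange, yrange, type = "n", xlab = "Date", ylab = "Temperature (F)")
```

Because yrange covers both series, anything added later with points(...) or lines(...) will fit inside the axes.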

I also take the opportunity to add the plot title (“main”), subtitle (“sub”), the x-axis label (“xlab”) and the y-axis label (“ylab”) at this time. These commands can also be added separately (outside the plot command) and have a long list of formatting options. 
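As a sketch of the "added separately" approach, the base graphics function title(...) can supply the same labels after the blank plot exists. The axis limits and the formatting choices (color, size, italics) below are my own illustrative assumptions, not values from the course script.

```r
# Hypothetical axis limits standing in for xrange and yrange
xrange <- as.Date(c("2016-05-01", "2016-05-31"))
yrange <- c(30, 90)

# ann = FALSE suppresses the default title and axis labels
plot(xrange, yrange, type = "n", ann = FALSE)

# title() adds the annotations afterwards and accepts formatting options
title(main = "Maximum and Minimum Observed Temperature",
      sub  = "(May 2016, University Park, PA)",
      xlab = "Date",
      ylab = "Temperature (F)",
      col.main = "darkblue",  # color of the main title
      cex.main = 1.2,         # relative size of the main title
      font.sub = 3)           # italic subtitle
```

Typing help(title) in the console lists the remaining options.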

Now, let’s add some data to our graph. Add the following line to the end of your script and run it:

# Add a scatter plot of TMAX vs time with red, empty circles 
points(daily_obs$DATE,daily_obs$TMAX,pch=1,col="red")

Here we are plotting points (TMAX versus date). The “pch” property specifies the type of point (the value 1 is a non-filled circle) and col="red" dictates the color. Now add:

# Add a line plot of TMAX vs time with a red, solid line
lines(daily_obs$DATE,daily_obs$TMAX,lwd=2,lty=1,col="red")

Here we are drawing a solid (lty=1) red line with a width (lwd) of 2 through the same data. If you don’t like the fact that the line overplots the points, you can change the “lines” command to the following, which leaves a gap in the line around each point:

lines(daily_obs$DATE,daily_obs$TMAX,lwd=2,lty=1,col="red",type="c")

Or you can change the points to be solid. Try changing the “points” command to:

points(daily_obs$DATE,daily_obs$TMAX,pch=19,col="red")

Now, add another series of points, lines, or both to your graph for the daily_obs$TMIN series (use: pch=17, lty=3, col="blue"). Here is a great reference if you want to explore the different types of points and lines available.
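If you would rather see the codes for yourself, the following sketch draws a quick chart of the standard pch symbols (0 through 25) and the numbered lty line types (1 through 6). The layout coordinates and the variable names pch_codes and lty_codes are arbitrary choices of mine.

```r
# Blank canvas: x spans the 26 symbol codes, y leaves room for the line samples
plot(c(0, 25), c(0, 8), type = "n", axes = FALSE, ann = FALSE)
title(main = "pch codes 0-25 (top row) and lty codes 1-6 (below)")

# One symbol per pch code, with its number printed underneath
pch_codes <- 0:25
points(pch_codes, rep(7.5, 26), pch = pch_codes)
text(pch_codes, rep(6.9, 26), labels = pch_codes, cex = 0.7)

# One horizontal line per lty code, labeled at the left
lty_codes <- 1:6
for (k in lty_codes) {
  lines(c(3, 25), c(k, k), lty = k, lwd = 2)
  text(1, k, labels = paste("lty =", k), cex = 0.8)
}
```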

Finally, let’s add a legend to our graph.

# Draw a legend
legend ("topleft", inset=0.03, 
         c("Max T","Min T"), 
         col=c("red","blue"), 
         pch=c(19,17), 
         lty=c(1,3),
         lwd=2 )

These are just a few of the options you can use to customize a legend. The “topleft” and “inset” properties position the legend, then follow the labels, colors, plot symbols, line types, and line thicknesses. You can use this code block as a template, or you can play around to find one that you like. Remember, if you want help on a particular command, you can always type: help(<command>) or ?<command> in the console window. For example, type: help(legend) in the console window to see information on how to create a legend. In R-Studio, the help dialog tab opens with information on your requested topic. Here's my final graph as exported from R-Studio. Make sure that you can duplicate a graph like this in R-Studio before moving on to the next section.
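For reference, the steps in this section can be assembled into a single script. The sketch below is one possible arrangement and runs on invented temperatures stored in a hypothetical demo_obs data frame, so that it works without the course data file. With your own daily_obs data the structure would be the same, but the plot will look different.

```r
# Hypothetical stand-in for daily_obs; the temperatures are randomly generated
set.seed(810)
demo_obs <- data.frame(DATE = as.Date("2016-05-01") + 0:30)
demo_obs$TMIN <- round(45 + rnorm(31, sd = 5))
demo_obs$TMAX <- demo_obs$TMIN + round(runif(31, 10, 25))

# Axis ranges that accommodate both series
xrange <- range(demo_obs$DATE)
yrange <- range(demo_obs$TMAX, demo_obs$TMIN, na.rm = TRUE)

# Blank plot with title and axis labels
plot(xrange, yrange, type = "n",
     main = "Maximum and Minimum Temperature",
     sub  = "(synthetic demo data)",
     xlab = "Date",
     ylab = "Temperature (F)")

# Maximum temperature: solid red circles joined by a solid line
points(demo_obs$DATE, demo_obs$TMAX, pch = 19, col = "red")
lines(demo_obs$DATE, demo_obs$TMAX, lwd = 2, lty = 1, col = "red")

# Minimum temperature: blue triangles joined by a dotted line
points(demo_obs$DATE, demo_obs$TMIN, pch = 17, col = "blue")
lines(demo_obs$DATE, demo_obs$TMIN, lwd = 2, lty = 3, col = "blue")

# Legend entries listed in the same order as the series above
legend("topleft", inset = 0.03,
       c("Max T", "Min T"),
       col = c("red", "blue"),
       pch = c(19, 17),
       lty = c(1, 3),
       lwd = 2)
```

Keeping the pch, lty, and col values in the legend in the same order as the plotting calls is what makes the legend match the graph.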

You may be thinking, "Wow, that's a lot of work to make a simple scatter/line plot... Why not just push a few buttons in Excel?" You're right to some degree... making simple plots may in fact be easier in a package such as Excel. However, don't rush to judgment just yet. R's power and flexibility will become evident as we progress; soon our data needs will far exceed Excel's capabilities. In the next section, we'll look at more possibilities for displaying data.

Read on.