METEO 810
Weather and Climate Data Sets

Plotting NARR data

Prioritize...

After completing this section, you will have successfully installed the "ncdf4" library in R and retrieved data from a NetCDF file.

Read...

At this point, you have installed the NetCDF libraries, the ncdf4 R-library, and you have a data file. Let's crack it open... (at this point, you should be familiar with what we are doing in R). Start a new R-script file, enter the code below, and save it in the same directory as your data file (for simplicity's sake). By the way, don't forget to set your working directory to that of the source file.

# if for some reason we need to reinstall the
# package, then do so.

if (!require("ncdf4")) {
  install.packages("ncdf4") }

# include the NetCDF library
library(ncdf4)

# We'll also need these libraries for plotting maps of data
library(fields)
library(maps)

# open the NetCDF file
nc <- nc_open("air.sfc.1988.nc")

Source this code and then type "nc" at the R command prompt. You should see a description of the file contents. Notice that it contains 4 variables (3 actually, if we ignore the map projection information) and 3 dimensions (x, y, and time). You can also see that latitude and longitude are a function of x and y, while the temperature variable ("air") is a function of space (x and y) as well as time. We can extract the total size of the "air" variable by typing the command: nc$var$air$varsize at the command prompt (notice that the variable information is stored in a complex data frame called nc$var). The values for each dimension can be obtained by querying the $dims data frame: nc$dim$time$vals. If you do so, you'll notice that the times are in some strange integer format. If we go back to the file definitions (typing nc or specifically nc$dim$time$units) we see that the time base is "hours since 1800-01-01 00:00:00". Therefore, we need to do a bit of manipulation to have times that make some sense. Fortunately, R comes to the rescue...

# First get the time data in hours
ncdates = nc$dim$time$vals  #hours since 1800-1-1

# Next set a date/time object at the start of the time base  
origin = as.POSIXct('1800-01-01 00:00:00', tz = 'UTC')

# Now convert the time string to seconds and add it to the origin.
# ncdates should be a series of dates and times which matches the
# file you chose.
ncdates = origin + ncdates*(60*60)

# Let's pick a date/time we want to plot
# This should actually go at the top of your code
target_datetime<-"1988-05-04 12:00 UTC"

# Find the index of the time entry that matches the target
time_index=which(ncdates==as.POSIXct(target_datetime, tz='UTC'))

Now that we have time sorted out, what about the latitude and longitude? If you haven't already, notice that NetCDF files are not like serialized text files. By that, I mean that there are no entries like: "lat1, long1, temp1". That would be an unnecessary repetition of the variables "lat" and "lon" for each grid of temperature. Instead, latitude and longitude are defined by their own grid which is equal in size to the temperature grids. You might also have noted that latitude and longitude are variables, not dimensions. The dimensions "x" and "y" define the data space and latitude/longitude are mapped onto each (x,y) pair. This is because latitude and longitude are not equally spaced, due to the map projection (Lambert conformal). So, if we are going to plot a grid of temperature versus (latitude/longitude), we need to compute arrays for those variables as well.

# How big is the dataset?
# Note: the "$air" portion must match your variable name
data_dims<-nc$var$air$varsize

#Get the lats/lons using ncvar_get
lats=ncvar_get(nc, varid = 'lat', 
               start = c(1,1),
               count = c(data_dims[1],data_dims[2]))

lons=ncvar_get(nc, varid = 'lon', 
               start = c(1,1),
               count = c(data_dims[1],data_dims[2]))

# The longitude values are not compatible with my map
# plotting function, so I need to fix this.
lons<-ifelse(lons>0,lons-360,lons)

Let's look at the function ncvar_get(). Notice that for the retrieval of lat/lon we need to know how many values to get in each dimension. I could hard-code this, but why not just query the file to get the dimensions (that's the advantage of the self-describing nature of NetCDF). The "get" function takes a pointer to the file, the variable ID, a starting point and a counter in each dimension. Since we want the entire grid, we'll start at (1,1) and retrieve the max value for each dimension. Now, let's get the temperature data...

# Because we are retrieving a multi-dimensional array, we must  
# set aside some space beforehand.
temp_out = array(data = NA, dim = c(data_dims[1],data_dims[2],1))

# get the temperature data (notice the start and count variables)
temp_out[,,] = ncvar_get(nc, varid = 'air', 
                       start = c(1,1,time_index),
                       count = c(data_dims[1],data_dims[2],1))

# convert the temperature grid to Fahrenheit 
temp_out[,,1]<-(temp_out[,,1]-273.15)*9/5+32

Finally, we're ready to plot the data. We'll use the image.plot() function and then overlay some maps.

# set the color map and plot size
colormap<-rev(rainbow(100))
par(pin=c(7,4.5)) 

# plot the data
image.plot(lons, lats, temp_out[,,1], col = colormap,
           xlab = "Longitude", ylab = "Latitude",
           main = paste("NARR Surface Temperature (F): ",target_datetime))

# plot some maps
map('state', fill = FALSE, col = "black", add=TRUE)
map('world', fill = FALSE, col = "black", add=TRUE)

If you want just a close-up of the US, you can modify the plot call like this...

image.plot(lons, lats, temp_out[,,1], col = colormap,
           xlim=c(-130,-60), ylim=c(25,55),
           xlab = "Longitude", ylab = "Latitude",
           main = paste("NARR Surface Temperature (F): ",target_datetime))

Here are the two surface temperature plots: Entire domain; Contiguous U.S. only.

Finally, before we move on, I want to give you one more useful command from the ncdf4 library. Just as we opened the NetCDF file with nc<-nc_open(...), we can also release the file from memory using nc_close(nc). Strictly speaking, nc_close is only absolutely necessary when creating/writing a NetCDF file because this flushes any unwritten data stored in memory to disk. Still, it's always good to manually close the connection when you are finished with the file, even if you are only reading from it. One caveat that you should be aware of, however... Note that if you put this command at the end of your file, once you source it, you won't be able to query the pointer to the file ("nc" in our case) manually via the command line. Once you sever the connection to the file, the pointer is destroyed.

Now that you have the hang of NetCDF files, let's see how else we might parse the data. Read on!