METEO 810
Weather and Climate Data Sets

An Introduction to rNOMADS

Prioritize...

After completing this section, you should be able to query the NOMADS model database using the R-library "rNOMADS". You should also be familiar with the types of parameters that must be defined in order to correctly query model data.

Read...

The suite of numerical models run by the United States is coordinated through the National Centers for Environmental Prediction (NCEP). More specifically, you can access numerical model data using the NOAA Operational Model Archive and Distribution System (or NOMADS). NOMADS provides several means for exploring and retrieving a vast volume of numerical prediction data. As with other data sets that we have examined, the sheer quantity of data (along with arcane file types) make routine data retrieval somewhat of a chore.

Enter rNOMADS. This R-library will help you to select and retrieve model data for a variety of uses. You might want to refer to the rNOMADS documentation, but, as always, you can just dive in using my example code.

Let's start with the basics...

# check to see if the libraries have been installed
# if not, install them
if (!require("rNOMADS")) {
  install.packages("rNOMADS") }
if (!require("maps")) {
  install.packages("maps") }

library(rNOMADS)
library(maps)
library(fields)

#get a list of available models (abrev, name, url)
model.list<- NOMADSRealTimeList("dods")

From the documentation, there appear to be two different methods for accessing data: GRIB access (a WMO file standard) and the DODS server. While GRIB retrieval might be faster, I find that retrieving files from the DODS server is more intuitive. You can print out model.list to see all the model types available. It should mirror the list given on the NOMADS page. By the way, we'll examine the GRIB file format in a later lesson.

Once you choose a model (in this case "gfs_0p50", the 0.5-degree resolution GFS), you can query the available model dates that are contained in URLs pointing to the data server. In the code below, I retrieve the most current date, retrieve the run times for that date, and finally choose the most recent runtime. Finally, I use GetDODSModelRunInfo to query the server for specifics about the model I have requested. From this query, I can learn about the extent of the 3-dimentional model domain, resolution, and variables computed.

#get the available dates and urls for this model (14 day archive)
model.urls <- GetDODSDates("gfs_0p50")

latest.model <- tail(model.urls$url, 1)

# Get the model runs for a date
model.runs <- GetDODSModelRuns(latest.model)

latest.model.run <- tail(model.runs$model.run, 1)

# Get info for a particular model run
model.run.info <- GetDODSModelRunInfo(latest.model, latest.model.run)

# Type model.run.info at the command prompt 
#    to see the information

# OR You can search the file for specific model 
#   variables such as temperature: 
#   model.run.info[grep("temp", model.run.info)]

Note that you can simply print out model.run.info, or you can search the output using the command "grep" to look for certain phrases, such as those containing the word "temp" (often short for temperature). Granted, the output of model.run.info can be daunting (there are over 41 variables that contain the word temperature). However, rest assured that you don't need to have an understanding of them all. Just know that numerical models keep track of temperature at many levels. At the moment, let's just focus on the temperature at 2 meters above the ground (variable: "tmp2m").

The next portion of the code sets the parameters to retrieve and then calls the retrieval routine DODSGrab(...). Note that we can request vectors of forecast time, lat/lon, and pressure level (if applicable). The numbers for each vector represent different quantities. For example, the time vector the numbers represent each forecast hour with 0 being the analysis run. So, if the model generates forecasts every 6 hours and we wanted only the 24-hour and 30-hour forecasts, how would we structure the vector?  (answer: time <- c(4,5)) Likewise, the lat/lon vectors are based on the grid points of the 0.5x0.5 degree grid. Remember that we can get the precise dimensions of the grid by looking at the output of GetDODSModelRunInfo(...). In this case, grid point (0,0) represents a lat/lon of -90N 0E. Can you figure out what the grid point of 45N 80W is and construct the proper vector calls? (answer lon <- c(560,560) and lat <-c(270,270))

variable <- "tmp2m"
time <- c(0, 0) # Analysis run, index starts at 0
lon <- c(0, 719) # All 720 longitude points (it's total_points -1)
lat <- c(0, 360) # All 361 latitude points
lev <- NULL      # DO NOT include level if variable is non-level type

print("getting data...")
model.data <- DODSGrab(latest.model, latest.model.run,
                       variable, time, levels=lev, lon, lat)

You can imagine that this host of parameters means that we can extract a variety of time series, profiles, and cross-sections through the model data. We'll discuss some of these approaches later in the lesson. But, for now, let's look at how we might display the data. We are going to be using the maps and fields library to display the data along with various maps. To further make sense of the model output, we must execute the following steps... 1) switch the lat/lon coordinate system around so that east longitudes are positive and west longitudes are negative; 2) re-grid the model data (more on this in a minute); 3) swap out-of-bounds (very large numbers) values with the "NA" value; and 4) change the model temperature (in Kelvin) to Fahrenheit.

# reorder the lat/lon so that the maps work
model.data$lon<-ifelse(model.data$lon>180,model.data$lon-360,model.data$lon)

# regrid the data
model.grid <- ModelGrid(model.data, c(0.5, 0.5), model.domain = c(-100,-50,60,10) )

# replace out-of-bounds values
model.grid$z <- ifelse (model.grid$z>99999,NA,model.grid$z)

#convert data to Fahrenheit
model.grid$z[1,1,,]=(model.grid$z[1,1,,]-273.15)*9/5+32

Let's look closer at the ModelGrid function that is included in the rNOMADS library. This function distills the output of DODSGrab and formats it into a grid format suitable for plotting. The second parameter is the output resolution of the grid (less than or equal to the model resolution). The third parameter is the type of grid ("latlon" in this case), and the fourth parameter is the sub-sampled domain. This function allows you to perform a variety of subsections of the model data only using one retrieval call from the NOMADS server.

Finally, we can plot the data using a modified image.plot function and several map functions. If you don't alter the code, your data should resemble this GFS analysis (initial data) of 2-meter temperatures for the eastern half of the United States.

# specify our color palette
colormap<-colormap<-rev(rainbow(100))

# make the graph square 4"x4"
# Note: Remember if you get an error that says, "plot too large"
# either expand your plot window, or reduce the "pin" values
par(pin=c(4,4))

# image.plot is a function from the fields library
# It automatically adds a legend
image.plot(model.grid$x, sort(model.grid$y), model.grid$z[1,1,,], col = colormap,
           xlab = "Longitude", ylab = "Latitude",
           main = paste("Surface Temperature",
           model.grid$fcst.date))

map('state', fill = FALSE, col = "black", add=TRUE)
map('world', fill = FALSE, col = "black", add=TRUE)

You might also try looking at 2-meter dew point temperatures. If you do, you could change the color palette line to: colormap<-colorRampPalette(c(rgb(0,0,0,1), rgb(0,1,0,1)))(100)

Next, let's look at a few other ways of displaying model data. Read on.

Explore Further...

It can be tedious to copy the appropriate model information from model.run.info into other parts of the code. Therefore, I wrote a function that automatically extracts information such as the latitude/longitude of the model domain and the resolution in each dimension. You are welcome to add it to your code if you like. The function should be defined above where you call it. I put mine right after the "library" statements.

#function that extracts pertinent model information from model.run.info 
extract_info <- function(x) {
  lonW<-as.numeric(as.character(sub("^.*: (-?\\d+\\.\\d+).*$","\\1", x[3])))
  lonE<-as.numeric(as.character(sub("^.*to (-?\\d+\\.\\d+).*$","\\1",x[3])))
  lonPoints<-as.numeric(as.character(sub("^.*\\((\\d+) points.*$","\\1", x[3])))
  lonRes<-as.numeric(as.character(sub("^.*res\\. (\\d+\\.\\d+).*$","\\1", x[3] )))
  latS<-as.numeric(as.character(sub("^.*: (-?\\d+\\.\\d+).*$","\\1", x[4])))
  latN<-as.numeric(as.character(sub("^.*to (-?\\d+\\.\\d+).*$","\\1",x[4])))
  latPoints<-as.numeric(as.character(sub("^.*\\((\\d+) points.*$","\\1", x[4])))
  latRes<-as.numeric(as.character(sub("^.*res\\. (\\d+\\.\\d+).*$","\\1", x[4] )))
  out<-data.frame(lonW,lonE,lonPoints,lonRes,latN,latS,latPoints,latRes)
  return (out)
}

#.... later on in the code, after you retrieve model.run.info ....

# your function can be called like this...
model.params<-extract_info(model.run.info)

# and used like this...
variable <- "tmp2m"
time <- c(0, 0) #Analysis run, index starts at 0
lon <- c(0, model.params$lonPoints-1) 
lat <- c(0, model.params$latPoints-1) 
lev<-NULL  # DO NOT include level if variable is non-level type

# and this...
model.grid <- ModelGrid(model.data, c(model.params$latRes, model.params$lonRes), "latlon", model.domain = c(-100,-50,60,10) )