METEO 810
Weather and Climate Data Sets

Automating Retrieval

Prioritize...

In this section, we will again use R's ability to loop over a list of data. Such loops are essential for automating tasks like retrieving data from a series of model runs.

Read...

The NOMADS database archives as many as 50 past model runs. For example, at any one time, you are able to retrieve any GFS model output from the past 14 days. This opens up some fairly interesting data retrievals, including looking at the initialization state of a model variable over time and studying how a predicted variable changes from run to run (sometimes referred to as "D-model, D-T"). The catch is that each model run is stored separately in the database. Instead of retrieving all of one model run's forecasts in a single call, we now need to retrieve a single forecast from each of many model runs. This will require some automation on R's part. Let's start a new script.

First, we start with a modified rNOMADS script...

# check to see if rNOMADS has been installed
# if not, install it
if (!require("rNOMADS")) {
  install.packages("rNOMADS")
}

library(rNOMADS)

#get the available dates and urls for this model (14 day archive)
model.urls <- GetDODSDates("gfs_0p50")
model.url.list <- model.urls$url

# Here's the point I want
my_lat <- 40.793313
my_lon <- -77.866796

# Define our retrieval parameters
variable <- "tmp2m"
time <- c(0, 0)    # Analysis run, index starts at 0
lon <- c(560, 580) # Just a rough estimate of
lat <- c(260, 270) # where the point is located
lev <- NULL  # DO NOT include level if variable is non-level type

# This will be our placeholder for the specific point data we want
model.timeseries <- data.frame()
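If you are wondering where the index ranges 560–580 and 260–270 come from, here is a hedged sketch of the arithmetic. The helper functions are mine, not part of rNOMADS, and they assume the gfs_0p50 grid runs from 90°S to 90°N and 0°E to 359.5°E at 0.5-degree spacing:

```r
# Hypothetical helpers (not part of rNOMADS): convert a location to
# approximate gfs_0p50 grid indices, assuming a 0.5-degree grid
# whose first latitude is 90S and whose first longitude is 0E
lat_index <- function(lat) round((lat + 90) / 0.5)
lon_index <- function(lon) round(((lon + 360) %% 360) / 0.5)

lat_index(40.793313)   # 262, comfortably inside our 260-270 window
lon_index(-77.866796)  # 564, comfortably inside our 560-580 window
```

Requesting a window of indices around the point (rather than a single index) gives us a small grid to search for the nearest point later on.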

Notice that in this script, we do not worry about the latest model date, nor do we care which run is the most recent. The reason is that we are going to retrieve them all and build a time series of the result. To do so, we need to define two functions. The first takes a specific model run on a specific day, retrieves the partial grid of data, and extracts the value from the grid point nearest my location. This result is then added as a row to the model.timeseries data frame.

retrieve_data <- function (run, model) {
  
     # get the partial grid
     model.data <- DODSGrab(model, run, variable, time, lon, lat, levels = lev)
  
     # convert longitudes from 0-360 to -180-180
     model.data$lon <- ifelse(model.data$lon > 180, model.data$lon - 360, model.data$lon)
  
     # select the grid point closest to our location
     # (within half the 0.5-degree grid spacing in each direction)
     closest <- which(abs(my_lon - model.data$lon) < 0.5/2 &
                      abs(my_lat - model.data$lat) < 0.5/2)
     model.value <- model.data$value[closest]
     model.fdate <- model.data$forecast.date[closest]
  
     # convert from Kelvin to degrees F
     model.fvalue <- (model.value - 273.15)*9/5 + 32

     # add the row to the data frame
     # remember that we have to use the <<- operator
     # to assign to model.timeseries outside the function
     row <- data.frame(Date = model.fdate, Value = model.fvalue)
     model.timeseries <<- rbind(model.timeseries, row)
}
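To see why a tolerance of 0.5/2 (a quarter degree, half the grid spacing) picks out exactly one grid point, here is a small stand-alone illustration with a toy grid; the grid values are made up for the example:

```r
# Toy 0.5-degree grid of points around our location
pt <- expand.grid(lon = c(-78.5, -78.0, -77.5),
                  lat = c(40.5, 41.0, 41.5))

# the same test used in retrieve_data: keep the point within
# half a grid spacing (0.25 degrees) in both directions
closest <- which(abs(-77.866796 - pt$lon) < 0.5/2 &
                 abs(40.793313 - pt$lat) < 0.5/2)

pt[closest, ]  # exactly one row: lon = -78.0, lat = 41.0
```

Because no location can be more than half a grid spacing away from its nearest grid point, this test always matches exactly one point.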

Now, we need a function that will loop over all of the daily runs for a given model. Notice that in this function we use sapply, which says, "take each element of model.runs$model.run and call the function retrieve_data on it." The parameter model is an extra argument that sapply passes along to retrieve_data. If you think in terms of programming "loops", this is the inner loop.

loop_over_runs <- function (y) {
     # Get the model runs for a date
     model.runs <- GetDODSModelRuns(y)

     # run the retrieve_data function for each run in model.runs
     sapply(model.runs$model.run, retrieve_data, model=y)
}
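If the extra model = y argument looks unfamiliar, this is sapply's standard "..." mechanism: any additional named arguments are passed through unchanged to the applied function on every call. A quick toy example (the function name is invented for illustration):

```r
# build_path is a made-up stand-in for retrieve_data: it takes the
# looped-over value first, plus one extra pass-through argument
build_path <- function(run, model) paste(model, run, sep = "/")

# each element of the vector becomes the first argument (run);
# model = "gfs20140218" is supplied identically on every call
sapply(c("00z", "06z", "12z"), build_path, model = "gfs20140218")
# -> gfs20140218/00z, gfs20140218/06z, gfs20140218/12z (named by run)
```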

Now that we have our two functions, we can add the rest of the code. It's relatively simple. First, there's the primary sapply call that sends the list of models to the function loop_over_runs. This command is the outer loop of the retrieval process. Next, there is just the standard plotting routine that we have been using in the last few tutorials.

# for each model in model.url.list, loop over all of the runs
sapply(model.url.list, loop_over_runs)

# Display times in GMT
attr(model.timeseries$Date, "tzone") <- "GMT"

# Make a plot
plot(model.timeseries$Date,model.timeseries$Value, type="n",
     xlab = "Date/Time (GMT)", ylab = "Temperature (F)",
     main = paste0("GFS Initialized 2-m Temperature\n",
         format(head(model.timeseries$Date,1), "%m-%d-%Y %HZ")," to ",
         format(tail(model.timeseries$Date,1), "%m-%d-%Y %HZ"))
)
lines(model.timeseries$Date,model.timeseries$Value,lwd=2,lty=1,col="red")
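A quick note on the tzone line above: changing the "tzone" attribute of a POSIXct vector changes only how the times are displayed, not the underlying instants. A small check (the example date is arbitrary):

```r
# a time created in US Eastern time (EST in February, UTC-5)
t <- as.POSIXct("2014-02-18 12:00:00", tz = "America/New_York")
format(t, "%H:%M")  # 12:00, displayed in local time

# re-label the same instant as GMT for display
attr(t, "tzone") <- "GMT"
format(t, "%H:%M")  # 17:00, the same instant shown in GMT
```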

Voila! Here is the plot I generated. Can you think of other interesting things we might want to look at using this approach? What about how the temperature forecast for a certain time is changing as the forecast interval decreases? What would such a plot tell you about the stability of that temperature forecast?