METEO 810
Weather and Climate Data Sets

Lesson 6 Activity

Assignment Objective...

You will use the knowledge and scripts you have gained from this lesson to assess the feasibility of using a reanalysis climatology in data-sparse regions.

Let's get started.

Setup...

In this lesson, we practiced retrieving data from NetCDF and GRIB files by looking at the North American Regional Reanalysis (NARR) data set. One might question the need for such a data set as a historical reference over observation-rich regions. However, over the oceans, NARR data may be the only source of climatological data available. But how accurate are the data? In this assignment, we will examine a host of observations from ocean buoys and then compare those observations with NARR data. Our goal is not necessarily to make fine-scale comparisons, but rather to determine on what time scales the NARR data are an acceptable proxy for climatologies over the oceans.

Deliverables...

Please use R to answer the questions posed below. In some cases, you will need to paste the output from R, and in other cases, you will need to provide a written analysis along with a graphic or two.

Questions...

  • Use the ISD map-generating code located in Lesson 3 to locate a buoy or offshore observing platform off the coast of the United States (in the Atlantic, Pacific, or Gulf of Mexico). Note: The farther offshore the better (greater than 100 miles), and choose a station that has been in operation for at least 10 years. Do some research to find out what you can about this observing platform. You might want to visit the National Data Buoy Center to see if you can locate your station and determine what type of instrumentation it carries. Write up a brief description of what you discover. Is this observing platform a reliable "ground truth" for the comparison that we will be making?
  • Retrieve the ISD data from your offshore station for a recently completed year. You're going to want air temperature, sea-surface temperature, wind speed and wind direction. Please retrieve all of the quality flags too, so that you can throw out everything that is flagged as suspicious. Provide 1) a graph of air temperature and sea-surface temperature for the year and 2) a wind rose for the year.
    Here's the algorithm I used to retrieve/format the ISD data...
    # check to see if rnoaa has been installed
    # add the library and NOAA key
    
    # Get the ISD data 
    
    # Grab the following columns
    # Note: your station might not have all of these
    isd_trimmed <- isd_data[c("date","time",
                   "wind_direction","wind_direction_quality",
                   "wind_speed", "wind_code", "wind_speed_quality",
                   "temperature","temperature_quality",
                   "SA1_temp","SA1_quality")]
    
    # Set values to NA unless the quality flag is "1"
    # For SA1_temp, you might find that "9" is acceptable, so "9" is the flag kept below
    # You can check your data first with: table(isd_trimmed$temperature_quality)
    # This shows you how many of each temperature quality flag you have
    isd_trimmed$temperature[isd_trimmed$temperature_quality!=1]<-NA
    isd_trimmed$SA1_temp[isd_trimmed$SA1_quality!=9]<-NA
    
    # Fix the wind values so that calm winds have speed=0
    # and direction=NA
    
    # Fix the date/time code
    
    # Change the coded values to actual values
    
    # make a new data frame for the windRose plot (remember it's picky)
    
    # Put your plotting code here 
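
Here is a minimal sketch of the cleaning steps above, run on a tiny made-up data frame rather than real ISD output. It assumes the columns arrive as character strings in the raw ISD encoding (temperature and wind speed in tenths, "C" in wind_code for calm winds, 999 for a missing direction). Those encodings, and every value in isd_toy, are assumptions for illustration, so check them against your own station's data before borrowing any of this.

```r
# Toy stand-in for a few rows of ISD output (all values invented).
# Assumption: raw columns are character strings in ISD's coded form.
isd_toy <- data.frame(
  date = c("20190101", "20190101", "20190101", "20190101"),
  time = c("0000", "0100", "0200", "0300"),
  wind_direction = c("270", "999", "090", "180"),
  wind_direction_quality = c("1", "9", "1", "1"),
  wind_speed = c("0051", "0000", "0103", "0072"),
  wind_code = c("N", "C", "N", "N"),
  wind_speed_quality = c("1", "1", "1", "2"),
  temperature = c("+0123", "+0119", "+9999", "+0121"),
  temperature_quality = c("1", "1", "9", "1"),
  stringsAsFactors = FALSE
)

# Look at the flags before deciding which ones to keep
print(table(isd_toy$temperature_quality))

# Set values to NA unless the quality flag is "1"
isd_toy$temperature[isd_toy$temperature_quality != "1"] <- NA
isd_toy$wind_speed[isd_toy$wind_speed_quality != "1"] <- NA
isd_toy$wind_direction[isd_toy$wind_direction_quality != "1"] <- NA

# Calm winds: speed = 0 and direction = NA (assumes "C" marks calm)
calm <- isd_toy$wind_code == "C"
isd_toy$wind_speed[calm] <- 0
isd_toy$wind_direction[calm] <- NA

# Build a single date/time column (ISD times are UTC)
isd_toy$timecode <- as.POSIXct(paste(isd_toy$date, isd_toy$time),
                               format = "%Y%m%d %H%M", tz = "UTC")

# Change the coded values to actual values (assumes a scale factor of 10)
isd_toy$temperature <- as.numeric(isd_toy$temperature) / 10
isd_toy$wind_speed <- as.numeric(isd_toy$wind_speed) / 10
isd_toy$wind_direction <- as.numeric(isd_toy$wind_direction)

print(isd_toy[c("timecode", "wind_direction", "wind_speed", "temperature")])
```

Notice that the calm observation ends up with a speed of 0 and a direction of NA, and that the flagged temperature and wind speed become NA rather than being silently converted.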
    
    
  • Now, retrieve the corresponding NARR data for your chosen year and the lat/lon of your station. You want the highest temporal resolution available (the 3-hourly files). You want the surface temperature, the 2-m air temperature, and the U- and V-winds at 10 m. Models compute the N/S (v-wind) and E/W (u-wind) components of the wind rather than speed and direction. Once you modify and run the two scripts below, generate a temperature plot and wind rose similar to the ones above. What do you observe?
    Algorithm for the NARR temperature retrievals...
    # load ncdf4 if we need to
    # include the library
    
    # open the two temperature CDF files
    
    # set the location of our station
    
    # determine the dimensions of the data set
    
    # Get the properly formatted date strings
    
    # obtain the lat/lon grids
    
    # find the grid index of closest point
    
    # create the temperature arrays... Notice the different dim parameter
    temp2m_out = array(data = NA, dim = c(1,1,data_dims[3]))
    tempsfc_out = array(data = NA, dim = c(1,1,data_dims[3]))
    
    # load the 2m temperature array and convert it to C
    
    # load the sfc temperature array and convert it to C
    
    # Make a new dataframe to hold the data
    
    # Put any plotting codes here
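
The "find the grid index of closest point" step can be sketched on a toy grid. NARR supplies latitude and longitude as 2-D arrays, so one simple approach is to compute a distance to the station at every grid point and take the minimum. The grid values and the station location below are invented, and the flat-earth distance with a cos(latitude) correction is only an approximation that I'm assuming is good enough for picking a neighbor. It also assumes the grid and the station use the same longitude convention.

```r
# Toy 2-D lat/lon arrays (5 x 4), invented for illustration only.
# Longitude varies along the first dimension, latitude along the second.
lon_grid <- matrix(rep(seq(-80, -70, by = 2.5), times = 4), nrow = 5, ncol = 4)
lat_grid <- matrix(rep(seq(30, 36, by = 2), each = 5), nrow = 5, ncol = 4)

# Hypothetical station location (not a real buoy)
station_lat <- 33.4
station_lon <- -77.7

# Approximate squared distance in degrees, shrinking the longitude
# difference by cos(latitude) so east-west degrees are not over-counted
dlat <- lat_grid - station_lat
dlon <- (lon_grid - station_lon) * cos(station_lat * pi / 180)
dist2 <- dlat^2 + dlon^2

# arr.ind = TRUE returns the (row, col) index instead of a single position
closest_point <- which(dist2 == min(dist2), arr.ind = TRUE)[1, ]

print(closest_point)
print(c(lat = lat_grid[closest_point[1], closest_point[2]],
        lon = lon_grid[closest_point[1], closest_point[2]]))
```

The two index values are what you would pass as the first two entries of start in ncvar_get(...). Printing the lat/lon at that index is a quick sanity check that you landed near your station.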
    
    
    Algorithm for the NARR Wind retrieval...
    # load ncdf4 if we need to 
    # include the library
    
    # open the two wind CDF files
    
    # set the location of our station 
    
    # determine the dimensions of the data set
    
    # Get the properly formatted date strings
    
    # obtain the lat/lon grids
    
    # Find the grid index of closest point
    
    # Create the wind arrays
    
    # load the u-wind array
    uwind_out[,,] = ncvar_get(nc1, varid = 'uwnd',
                   start = c(closest_point[1],closest_point[2],1),
                   count = c(1,1,data_dims[3]))
    
    # get the v-wind array
    vwind_out[,,] = ncvar_get(nc2, varid = 'vwnd',
                   start = c(closest_point[1],closest_point[2],1),
                   count = c(1,1,data_dims[3]))
    
    # Calculate speed and direction from U and V components
    NARR_windsp<-sqrt(uwind_out[1,1,]^2+vwind_out[1,1,]^2)
    NARR_winddir<-(270-(atan2(vwind_out[1,1,],uwind_out[1,1,])*(180/pi)))%%360
    
    # Create a new dataframe with the final data
    NARR_wind_data<-data.frame(ncdates,NARR_winddir,NARR_windsp)
    colnames(NARR_wind_data)<- c("timecode","wd","ws")
    
    # Put the windRose(...) plotting code here
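
If you want to convince yourself that the speed/direction formulas above follow the meteorological convention (direction the wind is blowing from, in degrees clockwise from north), a quick check with four hand-made cases helps. The u and v values below are invented test inputs, not NARR data.

```r
# Four 10 m/s winds whose directions are easy to reason about:
#   v = -10 : air moving toward the south, so the wind is FROM the north (0)
#   u = -10 : air moving toward the west,  so the wind is FROM the east (90)
#   v = +10 : wind FROM the south (180)
#   u = +10 : wind FROM the west (270)
u <- c(0, -10, 0, 10)
v <- c(-10, 0, 10, 0)
expected_from <- c(0, 90, 180, 270)

# Same formulas as in the NARR wind script
ws <- sqrt(u^2 + v^2)
wd <- (270 - (atan2(v, u) * (180 / pi))) %% 360

# Smallest angular difference, so that 359.999... and 0 count as equal
ang_diff <- abs(((wd - expected_from + 180) %% 360) - 180)

print(data.frame(u, v, ws, wd, expected_from, ang_diff))
```

All four differences should be zero to within rounding error. Comparing directions with a wrapped difference matters here because a northerly wind can come out as either 0 or a value just under 360.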
    
    
  • Now, you are going to be asked to argue whether the NARR data over this part of the ocean might be a good proxy for in situ observations (and on what time scales those proxies are suitable). To do this, you are going to need to compare the two sets of data. You might have noticed that the ISD data are hourly and the NARR data are every 3 hours, and in my case, there are some missing observations in the ISD data. So, if you want to do a direct comparison, you will need to use only the dates/times common to both files. Fortunately, R has a function called merge(...) that does exactly this. I have provided a few more lines of code below that will help you merge your data together. Show a scatter plot of all_data$T2m vs all_data$temperature, or all_data$SST vs all_data$SA1_temp. What do these graphs tell you about the comparability of the model data versus the observations? Plot the error between observation and model. Is a histogram a good way to do this? How about the error as a function of model temperature (maybe a box plot)?
    Merge code...
    # First let's merge all of the NARR data together
    all_data<-merge(NARR_temp_data, NARR_wind_data, by="timecode", all=FALSE)
    
    # Now merge in the ISD data
    all_data<-merge(all_data,isd_trimmed, by="timecode", all=FALSE)
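
To see what merge(...) with all=FALSE actually does, here is a small sketch with two made-up data frames: one every 3 hours (standing in for NARR) and one hourly with a gap (standing in for ISD). The names narr_3hr and isd_hourly and all of the values are hypothetical. The sketch assumes both timecode columns are POSIXct in the same time zone; if yours differ in class or time zone, the merge may find few or no matching rows.

```r
t0 <- as.POSIXct("2019-01-01 00:00", tz = "UTC")

# Model-like data every 3 hours: 00, 03, 06, 09 UTC
narr_3hr <- data.frame(
  timecode = t0 + 3600 * c(0, 3, 6, 9),
  T2m = c(12.0, 12.4, 13.1, 12.8)
)

# Observation-like data every hour, with the 06 UTC report missing
isd_hourly <- data.frame(
  timecode = t0 + 3600 * c(0, 1, 2, 3, 4, 5, 7, 8, 9),
  temperature = c(12.3, 12.2, 12.2, 12.5, 12.6, 12.9, 13.0, 12.9, 12.7)
)

# all = FALSE keeps only the times present in BOTH data frames
both <- merge(narr_3hr, isd_hourly, by = "timecode", all = FALSE)

# Model minus observation for each matched time
both$error <- both$T2m - both$temperature

print(both)
```

Only 00, 03, and 09 UTC survive: the in-between hourly observations have no model partner, and the 06 UTC model value has no observation. It's worth comparing nrow(...) before and after your own merge so you know how much data the comparison is really based on.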
    
    
  • How do we change the time scales of the observations? Again, R comes to the rescue with the aggregate(...) function. Use the code below as a template to compare the daily means for air temperature and sea-surface temperature. What do you observe? Is one measurement more in agreement than the other? If so, speculate on why that might be.
    Aggregate code...
    # sample aggregate function.
    daily_temp_means<-aggregate(cbind(T2m,temperature)~format(all_data$timecode, "%m-%d"), data=all_data, FUN=mean, na.rm=TRUE)
    
    # Note how this function works...
    # cbind(... lists the columns to compute
    # ~format(... gets a list of month-date (this is the "aggregate by" variable)
    # data=... this is the data frame to aggregate
    # FUN=... the function to apply (you can also try "max" or "min")
    # na.rm... removes any NA's from the aggregating process
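
Here's a small sketch of the same aggregate(...) call on two days of invented 3-hourly data, so you can see the shape of what comes back. The data frame toy and its values are made up. One detail to be aware of: with the formula interface, rows that have an NA in any of the listed columns are dropped before FUN is applied, so both columns are averaged over the same set of times.

```r
# Two days of 3-hourly values (8 per day); the second day has one missing obs
toy <- data.frame(
  timecode = as.POSIXct("2019-01-01 00:00", tz = "UTC") + 3600 * 3 * (0:15),
  T2m = c(rep(10, 8), rep(14, 8)),
  temperature = c(rep(11, 8), 13, NA, rep(13, 6))
)

# Same pattern as the template: mean of each column, grouped by month-day
daily <- aggregate(cbind(T2m, temperature) ~ format(timecode, "%m-%d"),
                   data = toy, FUN = mean, na.rm = TRUE)

# The grouping column gets an awkward default name, so give it a short one
colnames(daily)[1] <- "day"

print(daily)
```

You get one row per day with the mean of each column. Because the grouping key is month-day only, this pattern suits a single year of data; with several years in one data frame you would want the year in the format string too.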
    
    
  • Modify the aggregate code to create a daily_windsp_means data set. Make a scatter plot comparing model and observed daily mean wind speeds and comment on what you observe. Can you think of a reason for the difference? (Hint: consider where the model is computing the wind speed versus where it is observed.) How might you correct for this difference?
  • Extra Credit: Note that I didn't have you compute means of wind direction (in the format it is in now). Why is that? What errors might arise? How might we obtain accurate means of wind direction? Write a snippet of R code to do this given just wind speed and direction.

The Fine Print...

  • Please submit your report in either Microsoft Word or PDF format.
  • Saving images: RStudio makes saving graphs a piece of cake. Notice that at the top of the plot window there is an "Export" pull-down where "Save as Image" is an option. When you choose "Save as Image", you are given a dialog box where you can set the type (PNG is best), location, and filename of your image. I would set aside a folder for this assignment where you can save your images and then paste them into your word-processing document.
  • Proofreading is a must. It shows professionalism and care for your work. This includes not only basic spelling and grammar, but readability as well. If you struggle with proofreading, have someone give your draft a once-over and provide suggestions.
  • Grading: I want to foster an environment where you can focus more on what you are learning than on what "grade" you are getting. That said, I have to give some sort of grade and feedback for each assignment. Please see my assignment grading rubric in the Orientation section for general guidelines on assignment grades.

Submit Your Assignment

Please follow the instructions listed on the submission page in Canvas.