You will use the skills you have learned in R to dive into some large weather data sets. The goal of this assignment is to create the graphics required to answer the questions posed below. Note: We'll also include some of the concepts about communicating your message from Lesson 2.
Let's get started.
Here is the file containing the weather data that you will need [1]. It is a comma-delimited file containing temperature and precipitation data at O'Hare International Airport from 1 January 1976 to 31 December 2016 (41 years). Assume for the moment that the data are accurate and that no changes to the site have occurred that would affect the record. We'll learn in the next lesson how to make such determinations, but let's not complicate things at this point. WARNING! Editing this file in Excel will change the date formats (just be aware of that).
Please use R to create graphics that answer the following questions regarding the data. You can build on the pieces of code that I provided in the lesson, but know that I have intentionally left a few challenges in your way that you are going to have to overcome (as happens when working with any new data set). You are welcome to convey the answers to the questions in any manner you wish; however, you are expected to have a clear message for each problem you are asked to examine. (Correct graphics, correctly formatted, are a critical requirement.)
Each question should contain the following components:
Question and Answer Summary – Start each section of your report with the question you are trying to answer and your key message (no more than a paragraph) of what the data tell you. All of the questions here are simply asking you to present what the data say, without any deeper meteorological analysis or editorial. This, of course, is not what would happen in the real world, but visualizing the data is a key first step in any analysis procedure. If you are a statistics whiz, please refrain from providing calculations more complex than an "average" or "sum".
Graphic(s) – You may provide no more than two graphics for each question. Your client is busy and doesn't want to be bored to death with a stream of endless graphs. Please make sure that each graph is appropriate for your message and contains all of the things that good graphs should have (titles, labels, legends, etc.). You are not limited to only the look and feel of the graphs that I have presented in this lesson. Indeed, you may (and are encouraged to) experiment with different looks and ways of presenting the information as well. However, don't worry if such exploration is beyond your coding abilities. Most often, a simple approach is best to convey the proper message.
Reasoning – Provide the reasoning as to why you chose to present the data in a certain way. Why did you use a line graph instead of a bar graph? Why did you choose a histogram instead of a scatter plot? This section is key. It is meant to show me how you thought about your approach. You would never get into this much detail with a client, but you might have to argue such reasoning to members of your team.
R-Code – Finally, provide the code that produced the graphic(s) in each question. As I have stated earlier, I am not grading your code, per se. However, I would like to be able to provide feedback on your approach. Please make sure that your code is nicely formatted and has enough comments so that I can understand what you did.
read.csv(...) to get them read in. Please use this library along with "openair" to discuss this question. For this question alone, I am suspending my 2-graphic limit (you may need a few here, but don't go crazy)!
I know that this activity might be a bit rough for folks who are new to coding. So, I thought that I would use this page to provide some tips and tricks to help you get started. I have listed them according to each question that you are trying to answer. As more questions come in, I hope to add to this page.
# Format a column in dataframe "mydata" to be dates
# You have to tell R the format of the dates in the file
mydata$Date<-strptime(mydata$Date, format="%Y-%m-%d")
# but if you edited the file in Excel, you might need:
mydata$Date<-strptime(mydata$Date, format="%m/%d/%y")
# Get a year's worth of dates
dates1976<-mydata$Date[(mydata$Date >= "1976-01-01" & mydata$Date < "1977-01-01" )]
# Get a year's worth of Max Temperature Normals
maxtemps1976<-mydata$MaxTemperatureNormal[(mydata$Date >= "1976-01-01" & mydata$Date < "1977-01-01" )]
# etc
Then, make your plots using these new variables.
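As a rough sketch of how those pieces fit together, the snippet below builds a small synthetic stand-in for mydata, subsets one year, and draws a labeled line plot with base R. The temperature values are invented, only the Date and MaxTemperatureNormal column names come from the tips above, and wrapping the parsed dates in as.POSIXct(...) is my own assumption (it tends to store more cleanly in a data frame), not something the assignment requires.

```r
# Synthetic stand-in for the O'Hare file (values are invented for illustration)
all_days <- seq(as.Date("1976-01-01"), as.Date("1977-12-31"), by = "day")
day_of_year <- as.numeric(format(all_days, "%j"))
mydata <- data.frame(
  Date = format(all_days, "%Y-%m-%d"),
  MaxTemperatureNormal = 59 - 27 * cos(2 * pi * (day_of_year - 20) / 365),
  stringsAsFactors = FALSE
)

# Parse the character dates; POSIXct is an assumption made for this sketch
mydata$Date <- as.POSIXct(strptime(mydata$Date, format = "%Y-%m-%d", tz = "UTC"))

# Build the logical index once and reuse it for every column you need
in1976 <- mydata$Date >= as.POSIXct("1976-01-01", tz = "UTC") &
          mydata$Date <  as.POSIXct("1977-01-01", tz = "UTC")
dates1976    <- mydata$Date[in1976]
maxtemps1976 <- mydata$MaxTemperatureNormal[in1976]

# A simple line plot with a title and axis labels
plot(dates1976, maxtemps1976, type = "l",
     main = "Normal Maximum Temperature, 1976 (synthetic data)",
     xlab = "Date", ylab = "Temperature (deg F)")
```

Storing the condition in a variable such as in1976 avoids retyping the date range for every column and keeps the subsets aligned with one another.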
cumsum(...) to perform this task.
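For anyone who has not used cumsum(...) before, here is a generic illustration on a made-up vector of daily precipitation. The variable names and values are hypothetical, and replacing missing days with zero is only one possible choice (an assumption for this sketch), so decide for yourself how missing data should be handled in your answer.

```r
# Hypothetical daily precipitation totals in inches (NA marks a missing day)
daily_precip <- c(0.10, 0.00, NA, 0.25, 0.05)

# cumsum() returns NA from the first missing value onward
running_raw <- cumsum(daily_precip)

# One option (an assumption, not a rule): treat missing days as zero
precip_filled <- ifelse(is.na(daily_precip), 0, daily_precip)
running_total <- cumsum(precip_filled)

print(running_raw)
print(running_total)
```

The last element of running_total is the accumulated amount over the whole period, which makes the vector convenient for plotting a running total against the dates.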
JanmeanTempdiffs<-mydata$AvgTemperature[(strftime(mydata$Date,"%m"))=="01"]-mydata$AvgTemperatureNormal[(strftime(mydata$Date,"%m"))=="01"]
Jan_dates<-mydata$Date[which(strftime(mydata$Date,"%m")=="01")]
Now, to average these by year, we need to use the aggregate(...) function. This function takes one column of data and performs a function on it using grouping (specified in the "by" argument). In the case below, I create a list of years by which to do the averaging.
# use the aggregate function to get a mean January
# departure by year.
meanTempdiffs<-aggregate(JanmeanTempdiffs, by=list(strftime(Jan_dates,"%Y")), mean, na.rm=TRUE)
colnames(meanTempdiffs) <- c("Year", "MeanDiff")
Finally, we can sort the result by coldest January.
coldest_Jans<-meanTempdiffs[order(meanTempdiffs$MeanDiff),]
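To see the whole aggregate-then-sort pattern run end to end, here is a small self-contained sketch. The six January departures are invented, and passing tz = "UTC" to strftime(...) is an extra precaution of mine so the year is not shifted by the local time zone. The variable names follow the ones used above.

```r
# Synthetic January records for three years (values invented for illustration)
Jan_dates <- as.POSIXct(c("1977-01-10", "1977-01-20",
                          "1990-01-10", "1990-01-20",
                          "2014-01-10", "2014-01-20"), tz = "UTC")
JanmeanTempdiffs <- c(-14, -12, 8, 10, -9, NA)

# Average the departures within each year; na.rm = TRUE is passed on to mean()
meanTempdiffs <- aggregate(JanmeanTempdiffs,
                           by = list(strftime(Jan_dates, "%Y", tz = "UTC")),
                           FUN = mean, na.rm = TRUE)
colnames(meanTempdiffs) <- c("Year", "MeanDiff")

# order() returns row positions from the most negative departure upward
coldest_Jans <- meanTempdiffs[order(meanTempdiffs$MeanDiff), ]
print(coldest_Jans)
```

Adding decreasing = TRUE to order(...) would list the warmest Januaries first instead.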
skip=2 to read.csv(...). This will skip the first two header rows of the file (you can rename the columns you want to keep). I also suggest using the parameter colClasses = "character", so that you can do the data conversions yourself (better this way).
# read the data
# keep the date, time, direction, and speed columns
# create a single date column. Here's an example
# I labeled my unformatted date/time accordingly
wind_data$date<-as.POSIXct(strptime(paste(wind_data$date_uf,wind_data$time_uf), format = "%Y%m%d %H%M", tz="UTC"))
# convert wd and ws columns to numeric and set 999 to NA
# plot the wind rose
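The outline above might be fleshed out along the following lines. This is only a sketch on made-up text: the two header rows, the column order, and the "station" column are assumptions about the file layout, so check the real file before borrowing any of it. The names date_uf, time_uf, wd, and ws follow the comments above.

```r
# Made-up lines mimicking a wind file with two header rows (layout is assumed)
raw_lines <- c(
  "Hypothetical station header",
  "station,date,time,dir,spd",
  "725300,20160101,0051,270,5.1",
  "725300,20160101,0151,999,999",
  "725300,20160101,0251,300,4.6"
)

# Skip both header rows, read everything as text, and supply our own names
wind_data <- read.csv(text = raw_lines, skip = 2, header = FALSE,
                      colClasses = "character",
                      col.names = c("station", "date_uf", "time_uf", "wd", "ws"))

# Create a single date-time column from the unformatted date and time
wind_data$date <- as.POSIXct(strptime(paste(wind_data$date_uf, wind_data$time_uf),
                                      format = "%Y%m%d %H%M", tz = "UTC"))

# Convert direction and speed to numeric, then set the 999 flag to NA
wind_data$wd <- as.numeric(wind_data$wd)
wind_data$ws <- as.numeric(wind_data$ws)
wind_data$wd[wind_data$wd == 999] <- NA
wind_data$ws[wind_data$ws == 999] <- NA

# Keep only the columns needed for the wind rose
wind_data <- wind_data[, c("date", "wd", "ws")]
print(wind_data)

# With the openair package installed and the full data set loaded:
# openair::windRose(wind_data, ws = "ws", wd = "wd")
```

Reading every column as character keeps leading zeros in times such as 0051, which is why the date-time conversion works without any padding.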
Links
[1] https://www.e-education.psu.edu/meteo810/sites/www.e-education.psu.edu.meteo810/files/Images/Lesson2/oharedata_19760101-20161231.csv
[2] https://www.weather.gov/lub/events-2011-2011-drought
[3] https://en.wikipedia.org/wiki/1995_Chicago_heat_wave
[4] https://www.e-education.psu.edu/meteo810/sites/www.e-education.psu.edu.meteo810/files/Images/Lesson2/ohare_wind.csv
[5] https://www.e-education.psu.edu/meteo810/sites/www.e-education.psu.edu.meteo810/files/Images/Lesson2/kci_wind.csv
[6] https://www.ncdc.noaa.gov/isd