Activity: Statistical Analysis of Climate Data
NOTE: For this assignment, you will need to record your work on a word processing document. Your work must be submitted in Word (.doc or .docx), or PDF (.pdf) format.
For this activity, you will use the application below to perform basic statistical analyses of climate data. The data we will use are global temperature anomalies and the Niño 3.4 index, both measured in °C. You need to:
- Determine historical trends in global temperatures and determine if there has been an increase in the trend.
- Analyze the influence of El Niño on global temperatures.
- First, save the Problem Set #1 Worksheet to your computer. You will use this word processing document to electronically record your work in the remaining steps.
- Save the worksheet to your computer by right-clicking on the link above and selecting "Save link as..."
- The worksheet is in Microsoft Word format. You can use either Word or Google Docs (free) to work on this assignment. You will submit your worksheet at the end of the activity, so it must be in Word (.doc or .docx) or PDF (.pdf) format so the instructor can open it.
- Please show your work! When you are explicitly asked to create plots in a question, please cut-and-paste graphics and the output from the screen (e.g., by first printing the output to a pdf file and then directly inserting into the worksheet) to submit along with your discussion and conclusions.
- Use the plotting tool to create a line plot of global temperature anomalies vs. time over the full 160-year period (1850 - 2009). Determine the basic statistics of the time series, i.e., the mean and standard deviation. If the mean is not zero, can you guess the 30-year base period that was used to calculate the anomalies? To do that, note that by definition, the mean over the base period used to calculate anomalies should be zero. Visually determine the year where the temperature anomalies graph crosses zero. Zoom in on different 30-year intervals around that year and use Viewport Data under the Statistics tab to view the mean for each selected 30-year subset of data, until you find the period for which the mean is closest to zero.
- Evaluate the linear trend in global temperature anomalies over the full 160-year period (1850 - 2009) following the steps below.
  (A) Add a trend line to the plot of global temperature anomalies vs. time over the full 160-year period: use the Trend Lines tab. Determine the slope of the linear regression line, b, in °C/century and the correlation coefficient, r. As a side exercise, calculate the overall warming over the 1850-2009 period by multiplying the slope by the number of years in the period.
  (B) Assess the statistical significance of the linear trend in global temperature anomalies without considering autocorrelation: use the online Statistical Calculator tool from Lesson 2, Statistical Analysis Part 2 to calculate the p-value for the number of samples, N, and the correlation coefficient, r, from part (A). Interpret the p-value. Use the standard error of the slope, Sb, to calculate the 95% confidence interval for the slope as b ± 2Sb, and report the calculated warming range in °C/century.
  (C) Determine the lag-one autocorrelation coefficient for the residuals, ρ. To do that, run the regression model using the Regression Model tab: select Model Parameters = year, Target Observation = Temp Anom. Plot the residuals: in the Plot Settings tab, select year for X and Model Residuals for Y. Check whether the autocorrelation is statistically significant using the Statistical Calculator. Remember that the lag-one autocorrelation is simply the correlation between the original data and an exact copy of the data shifted by one time step, so the number of samples, N, is decreased by one, and the correlation coefficient, r, should be replaced by the autocorrelation coefficient, ρ. Interpret the p-value.
  (D) Reassess the statistical significance of the linear trend in the global temperature anomalies, taking into account the autocorrelation of the residuals. Remember that in the presence of a significant autocorrelation, the actual number of samples, N, must be replaced by the degrees of freedom, N': use formula (2) from Lesson 2 to calculate N'. Use the Statistical Calculator to calculate the p-value for N = N' and r from part (A). Interpret the p-value.
  (E) Look closely at the plot of the residuals that you created in part (C). Do you see evidence of heteroscedasticity (additional structure superimposed on the random fluctuations)? Do you think that the hypothesis of a simple linear warming trend in this data series is appropriate or not?
- Test the hypothesis that there is a difference in trend over two sub-periods: the first 110 years and the final 50 years. Follow the steps below.
  (A) Use the Plot Settings tab to plot the entire data series and zoom in to select the 1850-1959 sub-period. Select Viewport in the Trend Lines tab to perform linear regression for the sub-period. Determine the slope of the linear regression line, b, in °C/century and the correlation coefficient, r. Use the standard error of the slope, Sb, to calculate the 95% confidence interval for the slope in °C/century.
  (B) Repeat the analysis for the 1960-2009 sub-period.
  (C) Determine whether the trends are statistically different: you need to check whether the 95% confidence intervals for the two sub-periods overlap. Based on your results, has global warming accelerated over the past 50 years? What important caveat about the 95% confidence intervals was not taken into account in our analysis?
- Is there a statistically significant influence of El Niño on global temperatures?
  (A) Use the Plot Settings tab to plot global temperature anomalies (Temp Anom) over time (Plot #1) and then the Niño 3.4 index (Niño) over time (Plot #2) on the same plot. Determine the number of years, N, in the time interval over which the two data series overlap. This is the number of samples you will use in the analyses below.
  (B) Plot the relationship between the Niño 3.4 index and the global temperature anomaly: use the Plot Settings tab to plot the Niño 3.4 index (Niño) on the X-axis and global temperature anomalies (Temp Anom) on the Y-axis. Determine the slope of the linear regression line, b (in °C change in global temperature per unit change in the Niño 3.4 index), and the correlation coefficient, r. Assess the statistical significance of the linear trend without accounting for autocorrelation.
  (C) Plot the model residuals over time and discuss whether heteroscedasticity of the regression model residuals is a point of concern in your analysis.
- Given that La Niña conditions prevail this year, how do you expect ENSO to influence this year's global temperature? ENSO – El Niño-Southern Oscillation – is a climate pattern of oscillation between El Niño events (the positive phase, i.e., Niño 3.4 index above average) and La Niña events (the negative phase, i.e., Niño 3.4 index below average). Using the most negative observed value of the Niño 3.4 index during the historical time period used for the regression equation from question (5), determine the temperature anomaly predicted by the regression model for this most negative value. By comparing this temperature anomaly to the temperature anomaly expected in neutral (i.e., Niño 3.4 index = 0) conditions, estimate the largest perturbation of global temperatures expected in the most extreme La Niña events.
- Save your word processing document as either a Microsoft Word or PDF file in the following format: PS1_AccessAccountID_LastName.doc (or .pdf).
For example, student Elvis Aaron Presley's file would be named "PS1_eap1_presley.doc". This naming convention is important, as it will help the instructor match each submission with the right student!
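
For readers who want to check the trend-significance steps outside the online tool, the following is a minimal Python sketch, not the course application's own method. It runs on synthetic data (a made-up trend plus autocorrelated noise), not on the course data set. The function names `correlation_p_value` and `trend_summary` are hypothetical. The effective sample size is assumed to follow the commonly used AR(1) form N' = N(1 - ρ)/(1 + ρ); formula (2) from Lesson 2 is not reproduced here and may differ, in which case that one line should be swapped.

```python
import numpy as np
from scipy import stats


def correlation_p_value(r, n):
    """Two-sided p-value for a correlation r based on n samples (t-test, n - 2 dof)."""
    dof = n - 2
    t = r * np.sqrt(dof / (1.0 - r**2))
    return 2.0 * stats.t.sf(abs(t), dof)


def trend_summary(years, values):
    """Linear trend, b +/- 2*Sb range, and significance with and without autocorrelation."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(years)

    fit = stats.linregress(years, values)  # slope b, intercept, r, stderr Sb
    residuals = values - (fit.intercept + fit.slope * years)

    # Lag-one autocorrelation: residuals correlated with a copy shifted by one step.
    rho = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

    # ASSUMPTION: common AR(1) effective sample size; replace with formula (2) if it differs.
    n_eff = n * (1.0 - rho) / (1.0 + rho)

    return {
        "n": n,
        "slope_per_century": 100.0 * fit.slope,
        "ci_per_century": (100.0 * (fit.slope - 2.0 * fit.stderr),
                           100.0 * (fit.slope + 2.0 * fit.stderr)),
        "total_change": fit.slope * n,      # slope times number of years
        "r": fit.rvalue,
        "p_naive": correlation_p_value(fit.rvalue, n),
        "rho": rho,
        "p_rho": correlation_p_value(rho, n - 1),  # one sample lost to the shift
        "n_eff": n_eff,
        "p_adjusted": correlation_p_value(fit.rvalue, n_eff),
    }


# Synthetic illustration only: a made-up trend plus AR(1) noise, NOT the course data.
rng = np.random.default_rng(0)
years = np.arange(1850, 2010)
shocks = rng.normal(0.0, 0.1, size=years.size)
noise = np.zeros(years.size)
for i in range(1, years.size):
    noise[i] = 0.6 * noise[i - 1] + shocks[i]
temps = -0.4 + 0.005 * (years - 1850) + noise

result = trend_summary(years, temps)
for key, value in result.items():
    print(key, value)
```

When the residuals are positively autocorrelated, N' is smaller than N, so the adjusted p-value for the same r can only be larger than or equal to the unadjusted one.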
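
The confidence-interval comparison for the two sub-periods and the La Niña estimate can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not the course tool: the helper names `slope_interval`, `intervals_overlap`, and `la_nina_perturbation` are hypothetical, and the demonstration arrays are invented numbers, not Niño 3.4 or temperature observations.

```python
import numpy as np
from scipy import stats


def slope_interval(x, y):
    """Approximate 95% confidence interval for the regression slope, b +/- 2*Sb."""
    fit = stats.linregress(x, y)
    return (fit.slope - 2.0 * fit.stderr, fit.slope + 2.0 * fit.stderr)


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def la_nina_perturbation(nino, temp):
    """Predicted anomaly at the most negative index value minus the neutral prediction."""
    nino = np.asarray(nino, dtype=float)
    temp = np.asarray(temp, dtype=float)
    fit = stats.linregress(nino, temp)
    most_negative = nino.min()
    predicted = fit.intercept + fit.slope * most_negative
    neutral = fit.intercept  # model prediction at index = 0
    return predicted - neutral


# Invented numbers for illustration only (hypothetical index and anomaly values).
demo_nino = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
demo_temp = 0.3 + 0.1 * demo_nino
demo_perturbation = la_nina_perturbation(demo_nino, demo_temp)
print(demo_perturbation)
```

In use, `slope_interval` would be called once per sub-period and the two results passed to `intervals_overlap`; non-overlapping intervals suggest the trends differ, subject to the caveat the question asks about.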
Submitting your work
Upload your file to the "Problem Set #1" assignment in Canvas by the due date indicated in the Syllabus.
The instructor will use the general grading rubric for problem sets to grade this activity.
Solutions will be uploaded to Lesson 2: Climate Observations, Part 1 in Canvas after the due date.