Weather and Climate Data Sets

METEO 810: Weather and Climate Data Sets


Quick Facts about METEO 810

Anticipating weather events first requires an understanding of typical (or expected) conditions at a particular site. Such climatologies are constructed primarily from historical observations but may also include numerically derived forecasts and analyses. In this course, you will learn a variety of methods for accessing appropriate weather and climate data sets available from government and research institutions. Working with very large data sets in a computationally efficient manner will be stressed, as will consideration of factors that affect data reliability. You will be encouraged to consider numerous possibilities for presenting weather and climate data with a minimum of quantitative analysis. In addition, numerous examples and case studies will augment discussions on such topics as numerical reanalysis data sets, self-describing archives, and typical problems encountered with environmental observations. Finally, you will learn to construct a site-specific or regional climatology and to communicate a qualitative analysis of those data to others.

Course Overview

METEO 810 is a professional, graduate-level course offered by the Department of Meteorology and Atmospheric Science. The course is designed specifically for distance learners who are interested in learning about weather and climate data sets.

The Garden Wall Weather Station in Glacier National Park
The Garden Wall Weather Station is situated below the Garden Wall and adjacent to the Haystack Creek avalanche path in Glacier National Park. It provides meteorological data for avalanche forecasting and research, including wind speed and direction, air temperature, relative humidity, and net radiation measurements.
Credit: USGS

Why is weather data important?

A recent CNBC article, “The Sexiest Job of the 21st Century: Data Analyst,” described the demand for data analytics specialists (sometimes called data scientists), who know how to manage the tsunami of information, spot patterns within it, and draw conclusions and insights, as nearing a frenzy. This is due in part to the availability of massive data sets, now accessible to companies and government organizations for the first time thanks to cheap IT storage and increasing processing power. Perhaps one of the largest sources of untapped big data is the weather. Every day, over 6 TB of observational weather data is collected by the National Centers for Environmental Prediction. They, in turn, produce 1.5 TB of output in the form of 15 million operational products.

Bill Pardue, CEO of Weather Analytics, estimates that “a third of U.S. commerce is sensitive to the weather” (Forbes, 2013). This has led to the growth of numerous companies (Weather Analytics, Planalytics, Weather Trends, Climate Corporation, etc.) that supply weather data, both raw data and analytics, to businesses and governmental organizations. These companies count many U.S. Fortune 500 companies among their clients.

The question, therefore, becomes: can education be provided to the thousands of data analytics professionals in these companies so they can access, analyze, and manipulate atmospheric data sets on their own? The Department of Meteorology and Atmospheric Science at Penn State believes the answer to this question is “Yes.” This program would be ideal for anyone who works with historical or forecast data in a weather-sensitive sector. Such positions might include Marketing and Sales Analysts, Statisticians, Business Intelligence Analysts, Risk Analysts, Logistics Managers, and IT professionals.

In “How to Get a Hot Job in Big Data” (InfoWorld, 2010), Michael Dsupin, CEO of tech staffing firm Talener, is quoted as saying, “Marketing and research people are becoming adept at pulling data from one system, translating it, and loading it into another system.” Our program will teach these types of individuals to add weather data streams to their existing analysis routines.

Course Objectives

METEO 810 seeks to give you a better understanding of weather and climate data sets. After successfully completing this course, you will be able to:

  • identify various sources from which to collect global weather and climate data;
  • choose weather and climate data types appropriate to a desired observation or metric;
  • manipulate large data sets in order to focus on key aspects as defined by an external problem;
  • display weather and climate data in a manner that effectively communicates answers to posed questions;
  • describe both knowable and unknowable sources of error in environmental data sets (as well as suggest solutions to combat both); and
  • exhibit a global perspective on the challenges and opportunities of incorporating weather and climate information into decision-making processes over a wide range of business and governmental sectors.

What will you learn in this course?

Lesson 1: Meteorological Data Collection Methods (time standards, remote vs. in situ data, surface observing systems, satellite observing systems, radar measurements of precipitation and large hail, upper-air observing systems, questions to consider when evaluating observational data)

Lesson 2: Crafting a Message (asking the right questions, using graphs vs. tables, line graphs vs. bar graphs, comparison graphs, histograms and box-plots, 2D data visualization)

Lesson 3: An Introduction to "R" (installation of R and RStudio software, introduction to vector arithmetic, basic R functions, importing data, basic R plots, histograms and box-plots, contour and image plots, using custom libraries)

Lesson 4: Historical Data Sets (data portals, NCEI data sets, retrieving data with RNOAA, automated data retrieval and data availability maps, the integrated surface data set, asking the right questions when retrieving data)

Lesson 5: Data From Numerical Models (introduction to numerical forecasting, types of numerical models, retrieving data with RNOMAD, parsing the 5-dimensional model data set, introduction to ensembles, using the RNOMAD ensembles)

Lesson 6: Decoding NetCDF and WGRIB Formats (retrieving and processing NetCDF and WGRIB data, finding the right library, reading headers, extracting data, introduction to the NARR and other reanalysis products)

Lesson 7: Taming Unruly Data (data problems, checking for corrupted data, dealing with missing values, introduction to data transformation, introduction to smoothing techniques (moving average, splining, lowess), down-scaling (nearest point, interpolation))

How does this course work?

As with most graduate courses, there is a considerably higher onus on you to take responsibility for your own learning. While lessons present guidance on what you need to learn, much of your actual learning will take place as you experiment with various examples presented in the text. Following through on these examples and exploring various ways to accomplish prescribed data-procurement tasks are absolutely necessary, not only to succeed on each lesson's assessment activity but to meet your own learning goals as well. We strongly recommend that all students have some experience with a programming language. In this course, we will use the open-source (free) statistical programming language "R". R is fairly straightforward to learn (certainly at the level that we will start off with). However, you should be familiar with the tenets (and basic skills) of computer programming. Please check out the "Pre-Enrollment" link at the top of this page for more information.