GEOG 868
Spatial Database Management

Aggregating Data

PrintPrint

Aggregating Data

One of the beauties of SQL is in its ability to construct groupings from the values in a table and produce summary statistics for those groupings. In our baseball example, it is quite easy to calculate each player's career statistics, seasons played, etc. Let's see how this type of query is constructed by calculating each player's career RBI total.

A. Adding a GROUP BY clause to calculate the sum of values in a field

  1. Return to Design View (your Query 1 design should still be there) and remove the LAST_NAME and FIRST_NAME fields from the design grid.
  2. With the Design tab selected, click on the Totals button on the right side of the Ribbon in the Show/Hide group of buttons. This will add a Total row to the design grid with each field assigned the value Group By.
  3. Click in the Total cell under the RBI column and select Sum from the drop-down list (note the other options available on this list, especially Avg, Max, and Min).
  4. Grouping by both the player's name and the year won't yield the desired results, since the data in the STATS table are essentially already grouped in that way. We want the sum of the RBI values for each player across all years, so remove the YEAR column from the design grid and remove the >99 from the Criteria row.
  5. Modify the query so that the results are sorted from highest career RBI total to lowest.
    Screen capture of Aggregate SumRBI, sorted from highest career RBI total to lowest.
  6. Run the query and note that the RBI column name defaults to 'SumOfRBI'. You can override this by returning to Design View and entering a custom name (such as CareerRBI) and a colon before the RBI field name (following the PLAYER field example).

B. Use the Expression Builder to perform a calculation based on multiple field values

Home run hitters are often compared based on the frequency of their home run hitting (i.e., 'Player X averages a home run every Y times at bat'). Let's calculate this value for each player and season.

  1. Click on the Totals button on the query Design toolbar to toggle off the Total row in the design grid.
  2. Remove the RBI field from the design grid.
  3. Add the YEAR field to the design grid.
  4. In the third column (which is empty), right-click in the Field cell and click on Build to pull up the Expression Builder as done earlier.
  5. Navigate inside the database and click on STATS to see a list of its fields.
  6. Use the GUI to build the following expression:
     
    ABPERHR: [STATS]![AB]/[STATS]![HR]
  7. If a player had 0 HR in a season, this calculation will generate an error (can't divide by 0, test the query and look for #Div/0!), so add the HR field to the design grid, add >0 to its Criteria cell and uncheck its Show checkbox.
  8. Sort the query results (not shown below) by the ABPERHR value from lowest to highest (since low values are better than high values).
    Screen capture to show Aggregate_ABPERHR, and position of GUI to build the expression.
  9. Run the query and note that the calculated values include several digits after the decimal point. These digits aren't really necessary, so return to Design View, right-click on the ABPERHR field and select Properties.
  10. A Property Sheet pane will appear on the right side of the window. Select Fixed from the Format drop-down list and set the Decimal Places value to 1.
  11. Re-run the query and confirm that you get the desired results.