The Nature of Geographic Information

6. Database Management Systems

PrintPrint

Claritas and similar companies use database management systems (DBMS) to create the "lifestyle segments" that I referred to in the previous section. Basic database concepts are important since GIS incorporates much of the functionality of DBMS.

Digital data are stored in computers as files. Often, data are arrayed in tabular form. For this reason, data files are often called tables. A database is a collection of tables. Businesses and government agencies that serve large clienteles, such as telecommunications companies, airlines, credit card firms, and banks, rely on extensive databases for their billing, payroll, inventory, and marketing operations. Database management systems are information systems that people use to store, update, and analyze non-geographic databases.

Often, data files are tabular in form, composed of rows and columns. Rows, also known as records, correspond with individual entities, such as customer accounts. Columns correspond with the various attributes associated with each entity. The attributes stored in the accounts database of a telecommunications company, for example, might include customer names, telephone numbers, addresses, current charges for local calls, long distance calls, taxes, etc.

Geographic data are a special case: records correspond with places, not people or accounts. Columns represent the attributes of places. The data in the following table, for example, consist of records for Pennsylvania counties. Columns contain selected attributes of each county, including the county's ID code, name, and 1980 population.

1980 Population Data for PA Counties
FIPS Code County 1980 Pop
42001 Adams County 78274
42003 Allegheny County 1336449
42005 Armstrong County 73478
42007 Beaver County 186093
42009 Bedford County 47919
42011 Berks County 336523
42013 Blair County 130542
42015 Bradford County 60967
42017 Bucks County 541174
42019 Butler County 152013
42021 Cambria County 163062
42023 Cameron County 5913
42025 Carbon County 56846
42027 Centre County 124812

Table 1.1: The contents of one file in a database.

The example is a very simple file, but many geographic attribute databases are in fact very large (the U.S. is made up of over 3,000 counties, almost 50,000 census tracts, about 43,000 five-digit ZIP code areas and many tens of thousands more ZIP+4 code areas). Large databases consist not only of lots of data, but also lots of files. Unlike a spreadsheet, which performs calculations only on data that are present in a single document, database management systems allow users to store data in, and retrieve data from, many separate files. For example, suppose an analyst wished to calculate population change for Pennsylvania counties between the 1980 and 1990 censuses. More than likely, 1990 population data would exist in a separate file, like so:

1990 Population Data for PA Counties
FIPS Code 1990 Pop
42001 84921
42003 1296037
42005 73872
42007 187009
42009 49322
42011 352353
42013 131450
42015 62352
42017 578715
42019 167732
42021 158500
42023 5745
42025 58783
42027 131489

Table 1.2: Another file in a database. A database management system (DBMS) can relate this file to the prior one illustrated above because they share the list of attributes called "FIPS Code."

If two data files have at least one common attribute, a DBMS can combine them in a single new file. The common attribute is called a key. In this example, the key was the county FIPS code (FIPS stands for Federal Information Processing Standard). The DBMS allows users to produce new data as well as to retrieve existing data, as suggested by the new "% Change" attribute in the table below.

Percent Change in Populations for PA Counties 1980-1990
FIPS County 1980 1990 % Change
42001 Adams 78274 84921 8.5
42003 Allegheny 1336449 1296037 -3
42005 Armstrong 73478 73872 0.5
42007 Beaver 186093 187009 0.5
42009 Bedford 47919 49322 2.9
42011 Berks 336523 352353 4.7
42013 Blair 130542 131450 0.7
42015 Bradford 60967 62352 2.3
42017 Bucks 541174 578715 6.9
42019 Butler 152013 167732 10.3
42021 Cambria 163062 158500 -2.8
42023 Cameron 5913 5745 -2.8
42025 Carbon 56846 58783 3.4
42027 Centre 124812 131489 5.3

Table 1.3: A new file produced from the prior two files as a result of two database operations. One operation merged the contents of the two files without redundancy. A second operation produced a new attribute--"% Change"--dividing the difference between "1990 Pop" and "1980 Pop" by "1980 Pop" and expressing the result as a percentage.

Database management systems are valuable because they provide secure means of storing and updating data. Database administrators can protect files so that only authorized users can make changes. DBMS provide transaction management functions that allow multiple users to edit the database simultaneously. In addition, DBMS also provide sophisticated means to retrieve data that meet user specified criteria. In other words, they enable users to select data in response to particular questions. A question that is addressed to a database through a DBMS is called a query.

Database queries include basic set operations, including union, intersection, and difference. The product of a union of two or more data files is a single file that includes all records and attributes, without redundancy. An intersection produces a data file that contains only records present in all files. A difference operation produces a data file that eliminates records that appear in both original files. (Try drawing Venn diagrams--intersecting circles that show relationships between two or more entities--to illustrate the three operations. Then compare your sketch to the venn diagram example. ) All operations that involve multiple data files rely on the fact that all files contain a common key. The key allows the database system to relate the separate files. Databases that contain numerous files that share one or more keys are called relational databases. Database systems that enable users to produce information from relational databases are called relational database management systems.

A common use of database queries is to identify subsets of records that meet criteria established by the user. For example, a credit card company may wish to identify all accounts that are 30 days or more past due. A county tax assessor may need to list all properties not assessed within the past 10 years. Or the U.S. Census Bureau may wish to identify all addresses that need to be visited by census takers, because census questionnaires were not returned by mail. DBMS software vendors have adopted a standardized language called SQL (Structured Query Language) to pose such queries.

Penn State logo
This textbook is used as a resource in Penn State's Online Geospatial Education online degree and certificate programs. If this topic is interesting to you and you want to learn more about online GIS and GEOINT education at Penn State, check out our Geospatial Education Program Office.