Whether it is a single geographic position of a movie-goer checking in at her favorite restaurant or the locations of thousands of animals equipped with GPS transmitters in a wildlife refuge, every GIS project and application is driven by data.
Data can generally be thought of as “values” of “variables.” The variables are the phenomena, or the attributes of phenomena, that are measured, and the values can be numerical (e.g., the population of a city) or categorical (e.g., whether a highway is an Interstate or a U.S. route). When used in a computer system, these data must be in a form suitable for storage and processing. Data can represent all types of information and may consist of numbers, text, images, and many other formats. If you have created an online profile, the site probably asked you to enter a name, e-mail address, photo, or phone number. These fields are data variables, and what you enter in them are the data values.
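The distinction between variables and values can be sketched in a few lines of Python. This is only an illustration: the `Highway` record, its field names, and the sample length are hypothetical and were chosen to mirror the highway example above.

```python
from dataclasses import dataclass
from enum import Enum


class HighwayClass(Enum):
    """Categorical variable: its values come from a fixed set of classes."""
    INTERSTATE = "Interstate"
    US_ROUTE = "U.S. route"


@dataclass
class Highway:
    name: str                     # variable holding a text value
    highway_class: HighwayClass   # variable holding a categorical value
    length_km: float              # variable holding a numerical value


def is_numerical(value) -> bool:
    """Return True for numbers; booleans are treated as categories."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# One record: each field name is a variable, each entry is a value.
# The name and length are made-up sample values.
road = Highway("Example Highway", HighwayClass.INTERSTATE, 120.5)

# Map each variable name to whether its value is numerical.
variables = {name: is_numerical(value) for name, value in vars(road).items()}
print(variables)
```

Here the field names play the role of variables, and the entries stored in `road` are the values recorded for one highway.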
People create and study data as a means to help understand how natural and social systems work. Such systems can be hard to study because they're made up of many interacting phenomena that are often difficult to observe directly, and because they tend to change over time. We attempt to make systems and phenomena easier to study by measuring their characteristics at certain times. Because it's not practical to measure everything, everywhere, at all times, we measure selectively. How accurately data reflect the phenomena they represent depends on how, when, where, and what aspects of the phenomena were measured. It is important to keep in mind that all measurements contain a certain amount of error; the types of error, along with the concepts of accuracy and precision, will be discussed later. For now, however, we will focus on the characteristics of data and how data relate to information.
When phenomena are measured, one or more variables are recorded. As we have mentioned, recorded variables might consist of numerical values, names, or even pictures. All of these are referred to as variables because they are representations of the phenomena, not the phenomena themselves, and each may take on many different values of the same type. Once collected, the variables can be treated as-is or combined and recalculated to form additional representations of the phenomena.
Encoding data in a form that can be reproduced on a computer facilitates storing these data components, sharing them with others, and adding them to structured collections, known as databases. Regardless of the type of data, computers follow instructions to convert data into various formats that are ultimately represented in binary form as a series of ones and zeros, or bits, which are grouped into bytes. Although the conversion of digital data to binary representations is beyond the scope of this course, it is important to remember one simple fact: if we can instruct computers to store digital data in this way, we can alter these instructions to make changes to the data. The ability to manipulate, combine, and process data is what allows us to turn a collection of measurements into information that can be used to answer specific questions.
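Although the details are beyond the scope of the course, a minimal Python sketch may make the idea of binary representation less abstract. The helper `to_bits` and the sample values ("GIS" and a population of 1000) are illustrative assumptions, not part of the course materials.

```python
def to_bits(raw: bytes) -> str:
    """Render each byte as eight ones and zeros, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in raw)


# Text is stored as bytes under an encoding (UTF-8 here).
text_bytes = "GIS".encode("utf-8")
text_bits = to_bits(text_bytes)

# A whole number stored in two bytes, most significant byte first.
population = 1000
number_bytes = population.to_bytes(2, byteorder="big")
number_bits = to_bits(number_bytes)

# Because the stored form follows known instructions, it can be read
# back and changed: decode the bytes, then add one.
updated_population = int.from_bytes(number_bytes, byteorder="big") + 1

print(text_bits)           # 01000111 01001001 01010011
print(number_bits)         # 00000011 11101000
print(updated_population)  # 1001
```

Each group of eight bits is one byte, so the three-letter text occupies three bytes and the number occupies two.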
Information is data that have been selected or created in response to a question. For example, the location of a building or a route is simply data until it is needed to dispatch an ambulance in response to an emergency. When used to inform those who need to know "Where is the emergency, and what's the fastest route between here and there?", the data are transformed into information. The transformation involves the ability to ask the right kind of question, and the ability to retrieve existing data, or to generate new data from the old, that help people answer the question. The more complex the question, and the more locations involved, the harder it becomes to produce timely information. As a result, advancements in both computer software and hardware devices that can collect, integrate, and process large volumes of data quickly have become critical assets in the geospatial industry.
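The step from data to information can be sketched as a small query. The example below is a simplification under stated assumptions: the station names and coordinates are invented, the coordinates are assumed to be in a projected system measured in metres, and straight-line distance stands in for the "fastest route," which in practice would require a road network.

```python
import math

# Hypothetical data: ambulance stations with projected x, y coordinates (m).
stations = {
    "Station A": (1000.0, 2000.0),
    "Station B": (4000.0, 6000.0),
    "Station C": (7000.0, 1000.0),
}


def nearest_station(emergency_xy, stations):
    """Return (name, distance in metres) of the station closest to the emergency.

    Uses straight-line distance, not travel time along roads.
    """
    ex, ey = emergency_xy

    def distance(item):
        _, (x, y) = item
        return math.hypot(x - ex, y - ey)

    name, (x, y) = min(stations.items(), key=distance)
    return name, math.hypot(x - ex, y - ey)


# The question "which station is closest to this emergency?" turns the
# stored locations into information.
emergency = (4000.0, 2000.0)
name, distance_m = nearest_station(emergency, stations)
print(name, distance_m)  # Station A 3000.0
```

The station locations do not change when the question is asked; the information is the answer generated from them for one particular emergency.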
Geographic data and the information derived from them have become valuable commodities. Interestingly, in contrast to a commodity such as corn, the potential value of data is not lost when they are used. Data can be transformed into information again and again, provided that the data are kept up to date. Given the rapidly increasing accessibility of computers and communications networks in the U.S. and abroad, it is not surprising that data and information have become commodities, and that the ability to produce both has become a major growth industry.
When it comes to information, “spatial is special.” Reliance on spatial attributes is what separates geographic information from other types of information. Goodchild (1992) points out several distinguishing properties of geographic information. These properties are paraphrased below. Understanding them, and their implications for the practice of geographic information science, is a key objective of this course.
- Geographic data represent spatial locations and non-spatial attributes measured at certain times.
- Geographic space is continuous.
- Geographic space is nearly spherical.
- Geographic data tend to be spatially dependent.
The next section will clarify some of these properties and prepare you to understand the others as you progress through the course.
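One consequence of the third property in the list above, that geographic space is nearly spherical, can be illustrated with a short calculation. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km and uses the haversine formula; the function name and sample points are illustrative choices, and a real GIS would typically use an ellipsoidal model instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: Earth treated as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same one-degree step in longitude covers different ground distances
# depending on latitude.
at_equator = great_circle_km(0.0, 0.0, 0.0, 1.0)
at_60_north = great_circle_km(60.0, 0.0, 60.0, 1.0)
print(round(at_equator, 1), round(at_60_north, 1))
```

On this spherical model, one degree of longitude spans roughly 111 km at the equator but only about half that at 60 degrees north, which is why longitude and latitude cannot be treated as ordinary flat-map coordinates.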
Registered Penn State students should return now to the Chapter 1 folder in Canvas to take a self-assessment quiz about Data and Information.
You may take practice quizzes as many times as you wish. They are not scored and do not affect your grade in any way.