In this first part of the lesson, you'll get an introduction to Postgres's graphical interface called pgAdmin. You'll also import a shapefile, load data from a text file, and see how queries are performed in pgAdmin.
A. Create a new schema
- Open pgAdmin 4. The application should open with a Browser pane displayed on the left side of the window. Within the Browser you should see a tree with Servers at the top. Beneath Servers you should see a Postgres 9.6 (x86) server.
- Double-click on that server to open a connection to it. You will be logging in with the default user name of postgres.
Enter the password you defined earlier for the postgres account, when you installed the software. You should now see 3 nodes beneath the localhost server: Databases, Login/Group Roles and Tablespaces.
- Expand the Databases list. You should see at least one "starter" database: postgres. It was created when you installed Postgres.
We want to create a new database that is specific to our desire to use the PostGIS functionality.
- Right-click on the Databases list and choose Create > Database.
- In the Create - Database dialog set the Database to Lesson3db, and from the Owner list select the postgres user name.
- Now click on the Lesson3db database to expand its list of contents.
Right-click on Extensions, and select Create > Extension.
- In the Create Extension dialog under the General tab set the Name to postgis.
In the same dialog, select the Definition tab and set the Version to 2.3.3.
(The settings you just established are reflected under the SQL tab.)
Click Save to dismiss the Create Extension dialog.
With the Lesson3db still expanded view the rest of the objects associated with the database.
We will next concern ourselves with schemas. In Postgres, schemas are the containers for a set of related tables. Generally speaking when you begin a new project you'll want to create a new schema.
- Expand the Schemas list. At this point you should see only one schema: public. We'll have a look at the public schema soon but for now let's create a new schema. This schema will store data for the United States that we will use in the next two lessons.
- Right-click on Schemas and select Create > Schema.
- In the Create Schema dialog, specify a name of usa.
- Set the Owner of the schema to postgres.
- Click Save to create the schema.
B. Load data from a shapefile
A common workflow for PostGIS users is to convert their data from Esri shapefile format to PostGIS tables. Fortunately some PostGIS developers have created a Shapefile Import/Export Manager that makes this conversion easy. In prior versions of pgAdmin, the shapefile importer was accessible as a plug-in. In pgAdmin 4, it must be run as a separate application.
- In Windows Explorer, browse to the following folder:
C:\Program Files (x86)\PostgreSQL\9.6\bin\postgisgui
- Run the executable shp2pgsql-gui.exe.
If you encounter an error that the file "libintl-8.dll is missing," the easiest fix for this problem is to navigate up to the bin folder where libintl-8.dll is found, copy it and paste it into the postgisgui folder.
Since we will be using this executable several times, I suggest that you make a desktop shortcut for it.
- At the top of the application window, click the View connection details button.
- Confirm that the PostGIS Connection parameters are set as follows:
Password: <the password you set when installing the software>
Server Host: localhost (port 5432)
- In the middle of the dialog, click the Add File button and navigate to the location of your Lesson 3 data.
- Select States.shp and click Open.
- In the Import List section of the dialog (above the Add File button) supply the following settings by clicking on the current values under each heading:
Geo Column: geom
SRID: 4269 (After setting this last value be sure to click away from the row of specifications, otherwise the last value changed may revert to its original value.)
Before performing the import, let's spend a moment discussing the SRID (Spatial Reference IDentification) setting. This ID is set to 0 by default, a value that indicates the spatial reference of the shapefile is unknown. As with other GIS applications, defining a dataset's spatial reference is critical in enabling most of the functionality we typically need. We'll talk more about SRIDs later. For now it's sufficient to know that 4269 is the ID associated with the decimal degree/NAD83 coordinate system used by the Lesson 3 data.
- Click Import. After just a moment the Log Window area of the dialog should report that the import process has been completed.
- Close the import/export manager dialog by clicking the X button in the upper right of the dialog or on the Cancel button.
- Back in pgAdmin, expand the object list associated with the usa schema.
- Click on Tables. You should now see the newly imported states table.
It's sometimes necessary to refresh the GUI after creating new objects like this. This can be done by right-clicking on the schema or Tables node in the Browser and selecting Refresh (or hitting F5 on the keyboard).
- Right-click on the states table and select View/Edit Data > First 100 Rows. Note the other options in this context menu which are rather straightforward to understand.
While looking at the table note that the column headers include not just the names of the columns but also their data types. The gid column is an auto-incrementing integer column that was added by the importer. The presence of [PK] in its header indicates that it was also designated as the table's primary key.
You may be wondering why the geom column shows values for some rows, but not others. This is not something to be concerned about. The geometries are stored properly but most are simply too long to be displayed in the table.
- Repeat these steps to import the us_cities shapefile. Truncate its name to just cities.
C. Create a new table
Loading data from a comma-delimited text file is a common workflow for database developers. Let's see how this can be done in Postgres by loading some state demographic info from the 2010 Census. We'll begin by creating a new blank table.
- To create a new table, right-click on Tables under the usa schema and select Create > Table.
- Under the General tab, set the table's Name to census2010 and the Owner to postgres.
- Under the Columns tab, click the + button. You should see an entry appear for setting the new column's properties.
- Set the column's Name to state, its Data type to character varying and its Length to 50. Finally, set its Primary key value to Yes.
- Repeat the last two steps to add a column with the Name total and Data type of integer. The length property need not be set for the integer type and the column should not be defined as the primary key.
- Add the following additional columns to the table, all as integer data type.
Instead of having to expand the Data Type pick list you can start typing the word integer and the slot will let you auto fill with choices. After you type inte you can pick "integer." Be sure to add the columns in this order, otherwise the data load will not work properly.
Table 3.1: Additional Integer columns to be added to your usa.census2010 table.
male female white black amind asian hawaiian other mixed
- Click Save to finish creating the table.
D. Load data using the COPY command
Before executing the command that will import the data into the table let's have a look at the data file in a plain text editor and also note its location.
- In Windows Explorer right-click on the census2010.csv file and select Open With > Notepad. Note that the values in the file are comma delimited and that the file includes a header row. Close the file when finished examining it.
- Determine the full path name of the location of your census2010.csv file. The format that you will need to use is as follows with the drive letter followed by the folder names and finally the file name. Something like C:\Users\Smith\Documents\Lesson3data\census2010.csv
- In pgAdmin, select Tools > Query Tool. The Query Tool dialog is built such that the SQL code is entered in the top and the query's output appears in the bottom.
- Enter the following command, changing the path to reflect the census2010.csv file location on your machine. Note, there are no spaces in the ' " ' string.
COPY usa.census2010 FROM 'C:\PSU\Geog897D\Lesson3data\census2010.csv' WITH (FORMAT csv, HEADER True, QUOTE '"');Let's look for a moment at the options set in the WITH clause. If our input file were tab-delimited we would use a FORMAT setting of text rather than csv. We set the HEADER option to True since our file contains a header row. And we set the QUOTE option to the double-quote character to indicate that the input file encloses text strings with that character. A number of other options are available and can be found on the COPY command's page in the documentation. Among the other options that you may need to set is DELIMITER. This defaults to the comma for csv files and to tab for text files. If your file uses another delimiter, such as the pipe character (|), you can indicate that using the DELIMITER option.
The COPY command attempts to insert values from the first column of the input file into the first column of the table, values from the second column of the input file into the second column of the table, etc. The HEADER option simply tells Postgres to skip the first line, not to read the column headers, and intelligently match the columns of the input file to the columns of the table. If your table happens to have more columns than the input file and/or the columns are in a different order you can deal with this by supplying to the COPY command a list of column names that matches the input file after the table name. For example:
COPY usa.census2010 (state, total, male....) FROM ....
- Execute the command by clicking the Execute button (lighting bolt icon) just above the SQL code box. You should receive a message that the query returned successfully with 52 rows affected.
If you encounter a "permission denied" error, it means the "postgres" database login doesn't have permission to read the csv file where it is currently located. Try copying it to a sub-directory belonging to the "Public" user (e.g., 'C:\Users\Public\Public Documents') or to a location that has no permission restrictions (e.g., 'C:\temp'). You could also reset the permissions on the folder that stores the CSV file as outlined in this stackoverflow thread:
- Confirm that the data loaded properly using the method you used for the states table.
E. Write queries in pgAdmin
- Click on Tools > Query Tool again to open a new query tab.
- In the SQL box enter the following query to identify the states where most of the population uses the term 'Soda' when referring to soft drinks:
SELECT name, sub_region FROM states WHERE sub_region = 'Soda';
- Run the query by clicking the Execute (lighting bolt) button on the toolbar. You should receive an error message that the relation "states" does not exist.
- The reason for this error has to do with pgAdmin's search path. Among other things the search path determines which schema(s) will be scanned when tables are specified using unqualified names (e.g., like we just did with "states"). There are two solutions to this problem. The first is to qualify all table names with their parent schema. For example:
SELECT name, sub_region FROM usa.states WHERE sub_region = 'Soda';The second solution is to reset pgAdmin's search path so that the schema you're using is part of that path. By default, pgAdmin searches only the public schema. We will take this second approach since it allows us to omit the schema qualifier.
- So, highlight the text of your query and cut it out of the editor window. We'll be pasting it back in momentarily.
- Enter the following statement into the SQL Editor:
SET search_path TO usa, public;
- Run this query by clicking the Execute button. You should receive a message that the query returned successfully, though you should expect no tabular output from a query like this. pgAdmin will now look for unqualified tables first in the usa schema, then in the public schema. We include the public schema because the search path is used not just for searching for tables but also for functions. When we move on to spatial queries we'll need to have access to some of the functions available in the public schema.
If you are curious you can run the following query to find out what the search path is set to.
- You can now retry the query that you cut out. You should see a list of 17 states in the Output pane at the bottom of the window.
You may still be receiving an error if you left the table's name set to States rather than states during the import process; pgAdmin converts all table/column names to lower-case prior to execution by default. Thus, even if your FROM clause reads "FROM States", it will be evaluated as "FROM states". And if your table is named States, pgAdmin won't find a matching table. To override this case conversion, you can put the table/column name in double quotes like this:
SELECT name, sub_region FROM "States" WHERE sub_region = 'Soda';
To avoid having to qualify your table/column names in this way, it's best to use lower case in your naming.
- While on the subject of case you may have noticed that my examples place all SQL keywords in upper case and table/column names in lower case. This is a convention that is followed by many SQL developers because it makes it easy to tell at a glance which parts of the statement are SQL keywords and which are schema elements. This is just a convention not a requirement so you should feel free to deviate from it if you prefer. For example this query will produce the same results:
select name, sub_region from states where sub_region = 'Soda';
Now that you have a feel for how Postgres works go on to the next page to practice writing queries in pgAdmin.