Now that you've gotten your feet wet with an enterprise geodatabase, it's time to dig into some of the details of managing vector and raster data in that environment. In this lesson, you'll learn how to add new feature classes and raster data sets to an enterprise geodatabase and about steps that should be taken after the creation of new datasets. You'll also see how geodatabase data is stored within SQL Server.
If you have any questions now or at any point during this week, please feel free to post them to the Lesson 7 Discussion Forum.
Lesson 7 is one week in length. See the Canvas Calendar for specific due dates. To finish this lesson, you must complete the activities listed below. You may find it useful to print this page out first so that you can follow along with the directions.
In the previous lesson, we saw how to import vector data into new geodatabase feature classes as part of our experimentation with logins, users, and roles. Much of this section of Lesson 7 will be review as we import more data, though the process is outlined in greater detail here.
We are going to transfer four shapefile datasets from your computer to your Amazon cloud enterprise geodatabase instance. The shapefiles are data you worked with back in Lessons 3 and 4: the states, counties, and us_cities shapefiles that we used in Lesson 3, and also the cities shapefile that you used in Project 4 in the Jen and Barry's site selection exercise. I gave you copies of them in the DataFromLessons3and4.zip archive that you downloaded in the Checklist section, above.
A common workflow for organizations that are migrating their data to an enterprise geodatabase is to import data that already exist in other formats. Let's walk through that process.
Recall from the last lesson that users created using the Create Database User tool are able to load data and create new feature classes from scratch, and that it's considered a best practice to limit these capabilities to administrative staff. We used SQL Server Management Studio to create the users to whom we didn't want to grant data-loading ability.
Speaking of creating new tables from scratch, let's take a look at that workflow in the next section.
Always remember to Stop your EC2 Instance when you finish or when you take a long break.
In this section of the lesson, we'll look at creating a new feature class and populating it using the Append tool. To illustrate the process, imagine you're again working for Jen and Barry and need a new cities feature class containing the following fields:
| Name | Data type |
| --- | --- |
| population | Long Integer |
| total_crim | Long Integer |
| crime_inde | Double |
| university | Short Integer |
New features can be added to the feature class using Pro's editing tools, which were covered in detail in GEOG 484. Another way to populate a feature class is by using the Append tool.
As its name implies, the Append tool is used to append features held in feature classes/shapefiles to another existing feature class. Let's use it to append the features in our Jen and Barry's cities shapefile to the empty cities feature class we just added to our geodatabase.
The next part of the dialog is concerned with whether the fields in the input dataset match the fields in the target dataset. The default must match option checks to see if the fields match (in name and data type) and will not allow the append operation to occur if there is a mismatch. The Use the field map option allows for differences between the datasets.
Because our cities feature class doesn't have all of the fields found in the cities shapefile, select the Use the field map option. A couple of notes on the Append tool that you should keep in mind:
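To make the field map idea concrete, here is a minimal sketch in plain Python (not arcpy) of what field mapping does during an append: only attributes that exist in the target schema are carried over, and extra source fields are dropped. The sample row and the extra NAME/AREA fields are hypothetical.

```python
# Illustrative sketch (plain Python, not arcpy) of the Append tool's
# field map behavior: fields present in the target schema are copied;
# source fields with no match are dropped. Data values are made up.
target_fields = ["population", "total_crim", "crime_inde", "university"]

source_rows = [
    {"NAME": "Berryville", "population": 18500, "total_crim": 42,
     "crime_inde": 0.02, "university": 1, "AREA": 12.7},
]

def map_row(row, target_fields):
    """Keep only the attributes that exist in the target feature class."""
    return {f: row[f] for f in target_fields if f in row}

appended = [map_row(r, target_fields) for r in source_rows]
print(appended[0])
```

Note how NAME and AREA never reach the target; a real field map can also rename fields or merge several source fields into one target field.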
Always remember to Stop your Instance when you finish or when you take a long break.
Esri recommends the following after loading data into a geodatabase feature class:
The second item above, which we covered last lesson, is the only one that is absolutely critical. The first item, which we'll discuss in a moment, can greatly improve performance, especially as the size of the feature class increases. Metadata, covered in GEOG 484, is often overlooked, but can save a lot of headaches for anyone who has questions about the data. Geodatabase behavior functionality, covered in GEOG 484 and in Lesson 5 in this course, offers useful ways to improve the efficiency and accuracy of data maintenance workflows.
To this list, I would add the implementation of attribute and spatial indexes to improve performance. This page of the lesson will focus on database statistics and indexes.
Relational database packages like SQL Server provide users with the ability to calculate basic statistics on their tables, such as the common values and data distribution in each column. These statistics are stored in system tables that are utilized by the DBMS to determine the best way to carry out queries, thereby improving performance. As a table's data changes over time, the statistics will become out of date and less helpful in optimizing performance. This is why Esri recommends running the Analyze tool after major edits are made to a feature class. Let's run the tool on the states feature class we imported earlier.
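As a toy illustration (not SQL Server's actual internals), the sketch below computes the sort of per-column statistics a DBMS keeps, distinct-value counts and value frequencies, which help the query planner estimate how selective a WHERE clause will be. The sample rows are hypothetical.

```python
# A toy illustration of the column statistics a DBMS gathers for its
# query planner: how many distinct values a column has, and which
# values dominate. Sample rows are hypothetical.
from collections import Counter

state_rows = [
    {"name": "Pennsylvania", "region": "Northeast"},
    {"name": "Ohio", "region": "Midwest"},
    {"name": "New York", "region": "Northeast"},
]

def column_stats(rows, column):
    """Summarize one column: distinct count and most common value."""
    freq = Counter(r[column] for r in rows)
    return {"distinct": len(freq), "most_common": freq.most_common(1)[0]}

stats = column_stats(state_rows, "region")
print(stats)  # {'distinct': 2, 'most_common': ('Northeast', 2)}
```

When the table changes and these numbers go stale, the planner's cost estimates degrade, which is exactly why Esri recommends re-running Analyze after major edits.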
At the bottom of the dialog are checkboxes that control which tables associated with the selected datasets should be analyzed (base, delta and/or archive). The base table (sometimes referred to as the business table) is essentially what you see when you open the feature class's attribute table. The delta table stores changes made to the base data in a versioned feature class, while the archive table stores data enabling database users to retrieve historical states of a feature class. We'll look at these topics in the next lesson. For now, you can just leave all three boxes checked. No harm is done if the feature classes don't have delta or archive tables.
Click Run to execute the tool. Keep in mind that running Analyze may have no perceptible effect on small datasets like the ones we're dealing with here, but it might result in significant performance gains on larger datasets.
Attribute indexes are another mechanism used in relational databases to improve performance, particularly in the execution of queries. Developing better indexing algorithms is one of the more popular research topics in the computer science field. A comprehensive review of indexing schemes is outside the scope of this course. But at the very least, you should understand that one of the more common schemes works much like the index of a book.
If you're looking for discussion of a particular topic in a book, you don't skim through each page of the book beginning with page one. You look up the topic in the index, which tells you the pages where you can conduct a much narrower search for your topic. A database index often works in much the same way. Given a WHERE clause like "WHERE city = 'Philadelphia'", the index helps the DBMS begin its search at a particular row of the table rather than at row one.
Some points to keep in mind regarding indexes:
To see how attribute indexes are built in ArcGIS, let's create one on the name column in the us_cities feature class.
I won't bother to have you do a test query before and after because I doubt we'd see much difference in performance with such a small table. Just keep this capability in mind if you find that your queries are taking a long time to execute.
While attribute indexes improve the performance of attribute queries, spatial indexes are used to improve the performance of spatial queries. Esri geodatabases support three different methods of spatial indexing: grid, R-tree, and B-tree. The grid method is analogous to the map index found in road atlases. A grid of equal-sized cells is laid over the feature class, and each row and column of the grid is assigned an identifier. Geometries in the feature class are compared to this grid, and a list of grid cells intersected by each geometry is produced. These geometry-grid cell intersections are stored in a table. In the example below, feature 101 intersects three grid cells, while feature 102 is completely within a single cell.
| FID | GX | GY |
| --- | --- | --- |
| 101 | 5 | 9 |
| 101 | 5 | 10 |
| 101 | 6 | 9 |
| 102 | 4 | 8 |
Index tables like this are used to enable GIS software to answer spatial questions without having to look at each geometry in the feature class. For example, imagine selecting features from our us_cities feature class that are within the state of Pennsylvania. The software will first look up the grid cells intersected by Pennsylvania. It can then throw out all of the us_cities points that don't intersect those same grid cells. It only needs to test for containment on points that share grid cells with the Pennsylvania polygon. This testing of only the close features is much more efficient than testing all features.
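The cell-assignment step can be sketched in Python: given a feature's bounding box and a cell size, compute the (GX, GY) pairs of every grid cell the box touches, in the spirit of the FID/GX/GY table above. The coordinates and cell size here are made up for illustration.

```python
# A sketch of the grid method: which equal-sized cells does a feature's
# bounding box touch? Coordinates and the cell size are hypothetical.
def grid_cells(xmin, ymin, xmax, ymax, cell=10.0):
    """Return (gx, gy) pairs for all cells a bounding box intersects."""
    cells = []
    gx = int(xmin // cell)
    while gx * cell <= xmax:
        gy = int(ymin // cell)
        while gy * cell <= ymax:
            cells.append((gx, gy))
            gy += 1
        gx += 1
    return cells

# A feature spanning two grid columns and two grid rows:
print(grid_cells(55, 92, 64, 101))  # [(5, 9), (5, 10), (6, 9), (6, 10)]

# A small feature wholly within one cell, like feature 102 above:
print(grid_cells(41, 82, 44, 88))   # [(4, 8)]
```

One row per (feature, cell) pair is what lands in the index table, which is why large features that cross many cells generate many rows.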
It is possible to define up to three of these spatial grids per feature class. Multiple grids with different resolutions can capture the extent of features more efficiently, especially when the feature class contains features that vary greatly in their extent (i.e., some small features and some large).
The grid method is employed by Esri file geodatabases and Oracle-based ArcSDE geodatabases that store geometry using the Esri ST_Geometry type. ArcGIS calculates a default grid that typically provides a high level of performance. This page in the Esri documentation (An overview of spatial indexes in the geodatabase) [4] provides further information on spatial indexes, including when you might want to rebuild one.
SQL Server geodatabase administrators have two options available for storing geometries: the Microsoft geometry and Microsoft geography data types, which are similar in concept to the geometry and geography spatial data types we saw in PostGIS. The default storage method when using SQL Server is Microsoft geometry. (More on how spatial indexing works for geometry and geography types can be found below.) This can be changed when creating a feature class by selecting Use configuration keyword on the last panel of the New Feature Class wizard. For example, if you have data covering a large spatial extent and want to use SQL Server's spatial functions to calculate spherical lengths and areas on the SQL command line, then storing the data using the geography type might be the way to go. Further information on these storage options can be found in the documentation (Configuration keywords for enterprise geodatabases) [5].
Another spatial indexing method employed in ArcGIS is the R-tree, which uses a set of irregularly sized rectangles (R stands for rectangle) to group together nearby objects. This (File: R-tree.svg) figure [6] helps to illustrate how an R-tree works. The red rectangles (labeled R8-R19) are the bounding boxes around some set of features (lines or polygons). The blue rectangles (R3-R7) are an aggregation of those features into groups, and the black rectangles (R1-R2) are a higher level of aggregation.
The basic idea of a search is the same: if the search geometry falls within R1, then the software knows it can disregard the features within bounding boxes R15-R19 and instead focus on R8-R14. After that first check is completed, the blue level of the tree might be used to narrow the search further.
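The pruning idea can be sketched with a simplified two-level hierarchy of rectangles: check the top-level boxes first, and descend only into subtrees whose bounding box contains the query point. The rectangle names and coordinates below are hypothetical, not taken from the figure.

```python
# A simplified two-level search in the spirit of an R-tree: top-level
# rectangles are tested first, and whole subtrees are pruned when their
# bounding box misses the query point. All boxes are hypothetical.
def contains(box, pt):
    (xmin, ymin, xmax, ymax), (x, y) = box, pt
    return xmin <= x <= xmax and ymin <= y <= ymax

# Each top-level node covers a list of child bounding boxes.
tree = {
    "R1": {"box": (0, 0, 50, 50), "children": {"R3": (5, 5, 20, 20),
                                               "R4": (25, 30, 45, 48)}},
    "R2": {"box": (60, 0, 100, 50), "children": {"R6": (65, 5, 80, 20)}},
}

def search(tree, pt):
    hits = []
    for name, node in tree.items():
        if not contains(node["box"], pt):
            continue  # prune: this entire subtree can be skipped
        for child, box in node["children"].items():
            if contains(box, pt):
                hits.append(child)
    return hits

print(search(tree, (10, 10)))  # ['R3']
```

A real R-tree may be many levels deep and must also handle overlapping node rectangles, but the prune-then-descend logic is the same.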
R-tree indexes are used in Oracle geodatabases that utilize the SDO_Geometry type. They are automatically created and managed by ArcGIS, and while it is possible to delete and re-create an R-tree index, it's not clear that doing so would improve performance. If you're having performance issues in a geodatabase that uses R-tree indexing, you may want to dig further into the documentation and/or contact Esri customer support.
SQL Server-based geodatabases that implement the geometry or geography spatial type are spatially indexed using a B-tree method [7]. (As noted above, the geometry spatial type is the default in SQL Server.) This is an indexing method commonly used for non-spatial data, but in this context modified by Microsoft to handle spatial indexing as well. Like the R-tree method, this modified B-tree method employs a rectangular grid for locating features.
Finally, Postgres-based geodatabases are spatially indexed using a Generalized Search Tree (GiST) approach [8]. This indexing method was developed as an alternative to the older B-tree and R-tree methods for irregular data structures (such as GIS data). It realizes performance gains by breaking data up into groupings, like objects that are within, objects that overlap, objects to one side, etc.
Now that you've learned about some of the settings to consider when loading data into an enterprise geodatabase, let's look in SQL Server Management Studio to see how feature classes are stored. Reminder: Stop your Instances when you finish or when you take a long break.
Let's take a look at how the data we've been working with in ArcGIS Pro is stored in SQL Server.
ArcSDE relies on a number of tables behind the scenes. Many of these so-called repository tables are owned by the dbo superuser.
It's not really important that you remember much about these repository tables. However, hopefully, you now have a bit of an appreciation for what's going on behind the scenes and will see the tables as a bit less of a mystery.
Always remember to Stop your Instance when you finish or when you take a long break.
Esri offers a number of different options for managing raster data in an enterprise geodatabase. One area that receives a good deal of attention is managing collections of adjacent raster data sets (e.g., aerial photos). The options for dealing with such collections range from working with the raster data sets individually to merging them together into one large data set. In between is something Esri calls a mosaic dataset, which attempts to provide the best of both worlds: the ability to work with multiple raster data sets as one combined layer or to break them up into their individual components. We'll talk about a couple of these approaches in this section of the lesson.
Let's see how to bring a raster data set into a geodatabase.
The "_1" part of these table names comes from the rastercolumn_id value assigned to the raster, found in the SDE_raster_columns repository table. If the earthatnight raster instead had a rastercolumn_id of 2, its pixel data would be stored in SDE_blk_2, for example.
Raster datasets can hold either integer or floating-point data. They can also be composed of multiple bands. If you have a single-band integer raster dataset, a value attribute table (VAT) can be built that stores the number of cells associated with each integer value. The earthatnight raster holds integer values, but it is composed of three bands. If it were a single-band integer raster dataset, we would see an SDE_vat_1 table in addition to the other tables.
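What a VAT stores is easy to illustrate: for a single-band integer raster, it is essentially a count of cells per value. The tiny raster below is made up for the sake of the sketch.

```python
# A sketch of what a value attribute table (VAT) records for a
# single-band integer raster: the cell count for each distinct value.
from collections import Counter

band = [
    [1, 1, 2],
    [2, 2, 3],
]  # a tiny hypothetical single-band integer raster

vat = Counter(v for row in band for v in row)
print(sorted(vat.items()))  # [(1, 2), (2, 3), (3, 1)]
```

With the VAT in hand, a question like "how many cells are land-use class 2?" is a table lookup rather than a scan of every pixel.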
As mentioned at the beginning of this section, it is common in GIS to deal with collections of adjacent raster datasets. Esri's mosaic dataset can be used to treat such collections as a single unit while still having the ability to work with the individual files as needed. Let's create a mosaic dataset to manage some elevation raster data sets for the State College, PA, area (found in the doqs folder from the Lesson7_data download).
The nearest neighbor method is best for discrete raster datasets (like a land use or soils grid) and for scanned maps. Bilinear interpolation and cubic convolution are better suited for continuous raster datasets (like an elevation grid) and for satellite imagery and aerial photography. Bilinear interpolation is faster at pyramid creation time than cubic convolution, but on the flip side, cubic convolution typically produces the most visually pleasing output.
The pixel data created by the pyramid building process can be compressed to reduce storage requirements and improve performance. Higher levels of compression can be achieved with different methods, though care should be taken to match the data's use to an appropriate compression method.
The LZ77 method is referred to as a loss-less compression method because it results in no degradation of the input data. It should be used when the highest accuracy possible is required. The other method, JPEG, can produce a significantly higher level of compression, though at the expense of some degradation of the input data. Thus, it is referred to as a lossy compression method. The JPEG method can be used in situations when the highest level of spatial accuracy is not really necessary.
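"Lossless" can be demonstrated concretely with Python's zlib module, whose DEFLATE algorithm is built on LZ77: compressing and then decompressing returns the input byte for byte, unlike a lossy scheme such as JPEG. The byte values below stand in for repetitive raster data and are made up.

```python
# Demonstrating lossless compression with zlib (DEFLATE is LZ77-based):
# the round trip restores the input exactly, byte for byte. The "pixel"
# bytes are hypothetical, chosen to be repetitive like real raster data.
import zlib

pixels = bytes([10, 10, 10, 10, 42, 42, 42, 7] * 100)

compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)

assert restored == pixels  # no degradation whatsoever: lossless
print(len(pixels), "->", len(compressed))
```

A JPEG round trip, by contrast, would give back pixel values close to, but not identical to, the originals, which is the trade it makes for its higher compression ratios.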
A bit of degradation in the raster quality in the pyramid data is acceptable, so let's go with the JPEG compression option.
Choose JPEG from the Compression type dropdown list, and accept the default Quality value of 75.
In the Raster Statistics part of the dialog, you'll see a couple of "skip factor" options, one for the x dimension and one for the y. These values specify how many rows/columns to skip when computing statistics on the raster. The default skip factor value is 1 for each dimension, which means that cell values are retrieved for every other row and every other column. This decreases the time required to calculate the statistics, though it also decreases the accuracy of the statistics. In most cases, a skip value of 1 should produce statistics that are "good enough".
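Skip-factor sampling can be sketched as follows, using the lesson's interpretation that a skip factor of n skips n rows (or columns) between sampled cells; the 4x4 raster values are made up.

```python
# A sketch of skip-factor sampling for raster statistics, following the
# lesson's description: skip factor n skips n rows/columns between
# sampled cells, trading accuracy for speed. Raster values are made up.
def sampled_mean(raster, skip_x=1, skip_y=1):
    """Mean of the sampled cells; skip of 0 means sample every cell."""
    step_x, step_y = skip_x + 1, skip_y + 1
    values = [row[j]
              for row in raster[::step_y]
              for j in range(0, len(row), step_x)]
    return sum(values) / len(values)

raster = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]

# Skip factor 1: every other row and column (cells 1, 3, 9, 11).
print(sampled_mean(raster))        # 6.0
# Skip factor 0: every cell.
print(sampled_mean(raster, 0, 0))  # 8.5
```

The gap between the two means (6.0 vs. 8.5 here) is the accuracy cost of sampling; on large, spatially smooth rasters, that gap is usually small enough to be worth the time savings.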
The Statistics ignore value is a value or list of values that should not be included in the statistics calculations. For example, if you used a value like -9999 for missing data, you would want to specify that as the ignore value to avoid generating distorted statistics.
Accept all of the defaults in the Raster Statistics part of the dialog. Click OK to close the Environment Settings window.
Always remember to Stop your Instance when you finish or when you take a long break.
For Project 7, you are going to revisit the historical maps of Charlottesville, VA, that you may have worked with at the end of GEOG 484. In the data download for Lesson 7, you were given a folder containing 4 scanned maps of Charlottesville, circa 1920. You were also given a shapefile of buildings digitized from those scanned maps. Your task for this project is to:
This project is one week in length. Please refer to the Canvas Calendar for the due date.
Links
[1] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/DataFromLessons3and4.zip
[2] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/Lesson7_data.zip
[3] https://www.e-education.psu.edu/geog865/node/247
[4] https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/an-overview-of-spatial-indexes-in-the-geodatabase.htm
[5] https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/configuration-keywords-for-enterprise-geodatabases.htm#GUID-B588092F-E687-40A8-8661-E7E3AA0DC13C
[6] http://en.wikipedia.org/wiki/File:R-tree.svg
[7] https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-indexes-overview?view=sql-server-2017
[8] https://access.crunchydata.com/documentation/postgis/2.2.7/using_postgis_dbmanagement.html#gist_indexes
[9] https://msdn.microsoft.com/en-us/library/bb933790.aspx
[10] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/earthatnight.zip