Teaching enterprise geodatabase concepts online has historically been a challenge because it's been difficult to provide students with a "playground" to get hands-on experience. Fortunately, today it is much easier to give students hands-on experience with enterprise geodatabases through virtual machines hosted by Amazon. Over the next three weeks, you will use this Amazon cloud-based solution to set up and experiment with your own remote enterprise geodatabase server.
Objectives
At the successful completion of this lesson, students should be able to:
If you have any questions now or at any point during this week, please feel free to post them to the Lesson 6 Discussion Forum.
Lesson 6 is one week in length. See the Canvas Calendar for specific due dates. To finish this lesson, you must complete the activities listed below. You may find it useful to print this page out first so that you can follow along with the directions.
Using the Amazon service mentioned above, Esri makes it possible to set up cloud-based installations of ArcGIS Enterprise, software required for implementing an enterprise geodatabase. Because our Cloud and Server GIS course also guides students through putting together an instance of ArcGIS Enterprise, the instructions below are borrowed from that course. While our focus in this course will be on the enterprise geodatabase that we'll implement on the instance with ArcGIS Enterprise, a side benefit is that you'll also be able to experiment with ArcGIS Enterprise functionality if you like.
Go to Amazon Web Services [1] and click on "Create an AWS Account."
If prompted to create a Business or Personal account, choose Personal. Please make careful note of the password you select when setting up your account, you will be needing it. It is characteristic of Amazon Web Services that things work the way they are supposed to, but you don't get a lot of hand-holding. So, if you lose your password, I'm not sure it would be easy to recover it, and you will need to fill out a form with personal information.
Second, you will need to provide payment information, including a credit card number. If you are careful and follow the course instructions about explicitly stopping your instance (virtual machine) when you are not using it, you should be able to complete the coursework while incurring charges of $20-$40. The current step of signing up is free, but you should be aware that you will start being charged immediately upon starting to use AWS services. As part of this step, you'll be asked to select a support plan. The Basic (Free) support is all you need for this class.
Third, there is an identity confirmation step during which you will receive a phone call and enter a code. This ensures you are a human. Amazon does not use the phone number provided here for other purposes.
You can monitor your billing status by clicking your user name at the top-right of the AWS screen and choosing "My Billing Dashboard." On the right side of the billing dashboard, you should see a "Month-to-Date Spend by Service" section, from which you can view details about what you're being charged for in the current month or any other month.
If any of these directions are confusing or inaccurate, please post a question or comment to the Lesson 6 discussion forum.
In a few moments, we'll see that two of the settings involved in launching a new instance in EC2 are the VPC and key pair. A VPC (Virtual Private Cloud) is sort of your own special space carved out of Amazon's cloud. Instances in a VPC can see each other and your own network fairly easily, but they're not immediately accessible from elsewhere without some extra work on your part. That's a good thing for security.
A key pair is another security measure that will come into play when you log in to your instance for the first time. You will be logging in to your instance as a user named Administrator. The password for the Administrator user will be encrypted by AWS. The procedure for getting that password so that you can log in to your instance involves 1) creating a key pair (one key held by Amazon and another key given to you) in the AWS Management Console, 2) providing the name of that key pair when launching the instance, and 3) using the key pair after the instance has been created to decrypt the password.
Note: If you've taken our Cloud and Server GIS course (GEOG 865), you will have already created a VPC and key pair. You may skip over the steps in this section (or do them again if you like) and pick up with section C below.
Esri provides two ways to deploy ArcGIS in AWS: using Amazon's CloudFormation service and Amazon's AWS Management Console. For our purposes, the AWS Management Console is the best option, so we will lead you through the launching of an instance via that route. However, if you decide to deploy ArcGIS in the cloud as part of your job, you may want to explore the CloudFormation option as well. Instructions for both can be found in Esri's documentation [4] [http://server.arcgis.com/en/server/latest/cloud/amazon/use-aws-management-console-with-arcgis-server.htm]. Note that this link opens the Management Console instructions; instructions for the CloudFormation method can be found through the navigation headings on the left side of the page.
The basic idea behind what we're about to do is that Amazon has made it possible for vendors like Esri to create machine images (configurations of operating system, software, data, etc.) that can serve as blueprints for the making of child instances. Esri has created several of these AMIs (Amazon Machine Images): one that runs ArcGIS Enterprise on the Linux OS Ubuntu with Postgres, one that runs ArcGIS Enterprise on Windows with SQL Server, etc. Third parties (like us) can discover and "subscribe" to these AMIs through the AWS Marketplace.
There were a lot of steps involved in launching this instance and some of it may have seemed confusing, but the good news is you should not need to go through these steps again. Stopping and re-starting your instance now that it's been created is a much simpler process.
Whether starting or stopping your instance, you'll want to be viewing the Instances page in the AWS Management Console [8].
When you've finished your coursework for the day, you can Stop your instance as follows:
When you want to Start your instance:
Note: The biggest benefit to having a consistent IP address associated with an ArcGIS Enterprise/Server instance comes when you're using it to develop map/feature/geoprocessing services to be consumed by an app or a third party. A constantly changing IP address would render such services practically unusable. We won't be authoring such services in this course, though you can check out our Cloud & Server GIS course [9] if you're interested in learning how to do that. In this course, the benefit to the Elastic IP will be avoiding the need to locate the instance's new Public DNS address each time you want to re-start it and connect to it through remote desktop. If you'd rather not bother with associating an Elastic IP with your instance, then you're welcome to skip that step and instead look up the instance's new Public DNS and connect through that address.
If you just completed section B above, then your Enterprise Geodatabase instance is currently running, and we're going to work with it in the next section. If you are going to continue on, you don't need to stop your instance now. But be sure to stop it when you're ready to quit working.
Now that your site has been created, you can get ready to log in to the instance and start working with your software.
Your instance needs to be running, so if you did Stop it at the end of the previous section, open the AWS Management Console and Start it again.
Recall that when launching the instance, you created a new Security Group using the default Create new based on seller settings option. This set up the instance so that it would accept http and https connections from any IP address. We now want to log in to the instance using the Windows Remote Desktop Connection app, but to do that, we need to add a rule to the security group allowing that sort of traffic.
Next, choose My IP from the Source dropdown list and click Save rules.
Important: You've just specified that your instance should accept remote desktop connections from your current IP address, and you'll shortly make your first remote desktop connection. It's possible that your IP address will change over the rest of the term. For example, your Internet service provider might use dynamic IP address assignment. Or you might be working on a laptop in a different location. If that's the case, then you'll need to come back and edit your RDP rule to accept connections from whatever your new IP address happens to be at that time. Alternatively, you could also choose to set the Source to Anywhere - IPv4, which would allow any IP address to attempt an RDP connection, but eliminate the need to update the RDP rule whenever your connecting-from IP address changes. For a low-stakes instance such as the one you're using in this class, you may decide it's worth the risk. (Anyone wanting access to your instance would still need to supply your password. More on that below.)
Now that the instance is ready to accept remote desktop connections, there are two bits of information we'll need to make a connection: the instance's IP address and the password of the Administrator account.
Copy and Paste your Elastic IP address into a simple text editor like Notepad.
In the Windows Security dialog, log in with the following credentials:
User name: Administrator
Password: the password you decrypted in the AWS Management Console
Click OK.
You'll probably receive a warning that "the identity of the remote computer cannot be verified." Go ahead and answer Yes, that you want to connect anyway.
You should see the desktop of your remote instance open up.
As a security precaution, it's usually not a good idea to go around browsing the web from your production server machine. To do so is to invite malware intrusions onto one of your most sensitive computers. The operating system on your instance, Windows Server 2016, enforces this by blocking Internet Explorer from accessing most sites. This is called IE Enhanced Security Configuration (ESC). IE ESC gets burdensome when you're using the server solely for development or testing purposes, like we are. To smooth out the workflows in this course, you'll disable IE ESC right now and leave it off for the duration of the course.
Amazon gave you a pretty strong password for this instance, but it's not one you're liable to remember easily. You should change the administrator password to something you'll remember.
Figure 6.1: Password Security Setting Rules and Restrictions
As with the resetting of the instance password, these licensing steps need only be performed once after launching your instance.
Esri supports the implementation of enterprise geodatabases using a number of relational database management packages (e.g., Oracle, SQL Server, Postgres). To expose you to another RDBMS that's commonly used in the industry, I'm going to ask you to install SQL Server Express. (This is a free, lite version of Microsoft's SQL Server package. For everything we'll do in this class, the two SQL Server packages operate the same.)
With that, we're ready to begin playing with our enterprise geodatabases. First, you'll read a bit about ArcSDE, a technology that enables the ArcGIS products to work with data stored in an RDBMS.
ArcSDE is software that enables ArcGIS applications to store, manage and retrieve data in a RDBMS. The “Arc” comes from the ubiquitous naming convention used by Esri for their products. The SDE part stands for Spatial Database Engine.
ArcSDE is sometimes described as middleware, a layer of software that sits between Esri’s ArcGIS products and RDBMS software and manages data exchanges between them. It makes it possible for GIS data users to utilize their data without need for special knowledge of the underlying RDBMS.
As of ArcGIS 10, ArcSDE supports the commercial RDBMS’s Oracle, SQL Server, DB2, and Informix, and the free and open-source PostgreSQL.
In the last lesson, you worked with file geodatabases, a format that Esri recommends for small, single-user projects. Esri often refers to geodatabases stored in an RDBMS and accessed using ArcSDE as multiuser geodatabases because they are better able to support access by more than one user. The advantages of a multiuser geodatabase include:
Esri enables users in the Amazon cloud to run ArcGIS Enterprise on either the Ubuntu operating system or Windows. While there would be some benefit to seeing how Postgres operates as an Esri enterprise geodatabase, we're going to proceed with SQL Server on Windows, as SQL Server is more commonly used in the industry as a geodatabase DBMS. That said, the concepts involved in administering an enterprise geodatabase are similar regardless of the RDBMS used, so what you learn here will be transferrable to other RDBMS’s.
One of the first things you might want to do after launching an enterprise geodatabase is set up login roles and privileges for those roles. That will be the focus of the next section of the lesson.
Before diving into enterprise geodatabase concepts, it's important to understand some basics of working with SQL Server, the DBMS that we'll be using on our Amazon cloud instances. (While we're actually using SQL Server Express, I will refer to the software as simply SQL Server, since the two operate essentially the same for our purposes in this class.) Access to SQL Server itself is granted through logins. A login allows someone to authenticate with SQL Server and answers the question, "Who is connecting?" Access to an individual database (of which there could be several for a given SQL Server instance) is granted through users. Whereas logins are concerned with authentication, user accounts provide authorization to perform different tasks with a database. They answer the question, "What can this person do in the database?" As we'll see, logins must be mapped to database user accounts in order to work with database objects. This blog post [10] provides further information.
Related to logins and users is the concept of roles. A role provides a way to group similar users together so that permissions can be granted more easily. Database administrators can define their own desired roles and make permission changes for those roles as they see fit. SQL Server also comes with a set of fixed server and database roles whose permissions cannot be changed. Among these is the sysadmin role, which, as you might guess, has permissions that grant full control over the server.
All databases found on a SQL Server instance have a special user called DBO (short for database owner). Any member of the sysadmin fixed server role who uses a database is mapped to the DBO user, and any object created by any member of the sysadmin fixed server role belongs to DBO automatically.
An important consideration when implementing an enterprise geodatabase in SQL Server is who will be the owner of the geodatabase -- the DBO user or the SDE user. While the DBO user is the person who administers the database at the SQL Server level (e.g., creating new users and roles), the SDE user is the person who administers the database within ArcGIS (e.g., updating database statistics and compressing the database). Esri's documentation includes a page that discusses the pros and cons of each [11] [https://pro.arcgis.com/en/pro-app/help/data/geodatabases/manage-sql-server/comparison-geodatabase-owners-sqlserver.htm]. Generally speaking, if the SQL Server database administrator and the geodatabase administrator are the same person, then having the DBO user own the geodatabase is sensible. If, on the other hand, those two roles are filled by different people, then having the SDE user own the geodatabase is probably more advisable. In this course, we're going to work with a DBO-owned geodatabase.
In the next section, you will be back working, via your Remote Desktop Connection, on your instance of the Amazon Machine Image (AMI) that holds our enterprise geodatabase, etc.
If you STOP-ed your instance on purpose, or if your Remote Desktop Connection gets interrupted, you will need to re-START your instance via the EC2 Console.
And, if the IP address of the machine you're connecting with has changed since your last connection, remember that you may need to edit the RDP rule to allow connection from your new IP address.
Before we move on to setting up geodatabase users and roles, there are a couple of authentication settings we will change to make our lives a bit easier in this safe learning environment.
The first will be to set up SQL Server to allow mixed-mode authentication. By default, SQL Server only allows authentication through operating system logins. However, it is also possible to authenticate via user names and passwords stored in the database. In a real-world implementation, you may find it advantageous to utilize Windows authentication. But since that involves a bit of extra work on the instances we've just created, we'll go with SQL Server authentication.
The second setting we'll make will be to relax the default password standards on the Windows Server operating system that filter down to SQL Server. This will allow us to avoid the need to create lengthy, complex passwords.
We'll now be able to authenticate using simple, easy-to-remember passwords. Obviously, in a real-world implementation, you would want to think twice before lessening your database security like this.
Esri's ArcGIS Server/Enterprise AMIs have gone back and forth, sometimes including database software and a pre-configured enterprise geodatabase, sometimes not. This version's AMI does not -- we had to install the RDBMS ourselves -- so let's now create our first database.
We'll start our work with the egdb database by connecting to it as the DBO user.
Before adding some data to the geodatabase, let's discuss the importance of users and roles in a geodatabase context and lay out the scenario for the rest of our work in this section.
Database roles enable the database administrator to improve security by assigning users or groups of users various levels of access to the database tables. Commonly used roles in an enterprise geodatabase include viewer, editor, etc. As the names of these roles imply, one user group might be limited to read-only access to data while another group might be allowed to both read and edit the data. While we're only getting to this topic now, you should keep in mind that similar strategies can also be applied to the kind of Postgres/PostGIS databases we talked about in previous lessons.
For illustration purposes, imagine you work in the U.S. federal government and that you are administering a geodatabase that will host data from a number of different departments: state boundaries from the Census Bureau, major highways from the Department of Transportation, rivers from the Department of the Interior, etc. You want each department to have stewardship over their data and to have just read-only access to other departments' data.
Click here to download to your machine the Lesson 6 data [13]. In the steps below, you will then copy said us_data.zip archive file from your local machine to your remote instance. The Downloads (or Documents) folder on your remote instance is a logical place to paste and unzip the data.
Note that in our enterprise database environment, vector data is stored as feature classes. So, later, the Shapefile datasets that I gave you will be uploaded and converted to feature classes.
One aspect of the data loading/creation process that's worth keeping in mind is that when a feature class or feature dataset is created in the geodatabase, its name in the ArcGIS applications has two parts:
<owner>.<feature class/feature dataset>
Some organizations set up users specifically to handle data loading/data creation so that the owner listed in the two-part name clearly conveys the department that maintains the data. Following on this practice, let's add a user to the database who will control the loading of the Census data.
The Create Database User tool is one of several geodatabase administration tools available through the ArcGIS Pro GUI.
With the census user created, let's work as that user to load the state and city data you downloaded above. Because feature datasets are frequently used to house feature classes, let's create one of those first.
An important point to note about the process we just went through is that users created via the Create Database User tool have the ability to load/create data. It's generally considered a best practice to restrict this ability to a small number of trusted administrators to avoid cluttering the database with unwanted data. We'll now see that following this best practice requires adding lower-permission users through a different process.
Returning to our scenario, the users in the departments are as follows:
Department | editors | viewers |
---|---|---|
Census | Moe | Larry, Curly |
Transportation | Lisa | Bart, Homer |
Interior | Stan | Cartman, Kenny |
To add these users, we'll need to return to the SQL Server Management Studio.
In order for users to have the ability to view and/or edit the data in a geodatabase, the data owner must grant those privileges. That can be done on a user basis (using an ArcGIS tool), but the process can be done more efficiently by assigning users to roles and assigning privileges to the roles. In this section, you'll create a viewer role and an editor role in the egdb database.
One important note on roles: while the privileges must be granted by the data owner (the census and transportation users in our scenario), the roles must be created by someone with sysadmin privileges. That is the DBO user in our implementation.
Finally, we'll now work through the data owner connections to assign the proper privileges to go with the roles just created.
Note:
This concludes the hands-on activities you'll be doing with your geodatabase instance in this lesson. Feel free to do some experimenting with your instance. When done, close the remote desktop connection to your instance, and most importantly, remember to stop your instance through the EC2 Console to avoid racking up unnecessary charges.
Go on to the next page to see this week's graded activity.
With that, we've finished our tutorial on creating a SQL Server geodatabase, adding users, loading some data, and assigning data privileges to the users. You weren't specifically instructed to add the Department of the Interior users and import the us_hydro shapefile, but you're welcome to do so if you're looking for additional practice.
In the Adding Users and Data to Your Geodatabase part of the lesson, it discusses the various 'roles' that people who are involved with an enterprise geodatabase might have.
For Project 6, I'd like you to spend some time reflecting on the data users in your organization. If you already utilize an enterprise geodatabase, tell me a bit about your system, who uses it, through what login roles, and with what privileges. In preparation for this write-up, you may want to talk to your geodatabase administrator if that person is not you.
If your organization isn't using an enterprise geodatabase, tell me a bit about your data workflows and users, and spend some time describing the login roles and privileges you think would best meet your organization's needs if you were to implement one.
It's not necessary for you to divulge the identities of co-workers, but you may find that your write-up flows better if you use at least first names. You can always use fictitious names if you have concerns about identifying people by their real names.
Some of the job responsibilities frequently associated with geodatabase usage that you may want to consider include:
Note: If you feel you're unable to report on a current or former workplace, I will also accept an analysis of any organization that has published information on its enterprise geodatabase implementation. Such reports can be found by doing an online search for GIS + "master plan" or "strategic plan."
This project is one week in length. Please refer to the Canvas Calendar for the due date.
Links
[1] https://aws.amazon.com/
[2] https://console.aws.amazon.com/
[3] https://console.aws.amazon.com/ec2/
[4] http://server.arcgis.com/en/server/latest/cloud/amazon/use-aws-management-console-with-arcgis-server.htm
[5] https://aws.amazon.com/marketplace/pp/prodview-rh32a6tw3ju4a?sr=0-3&ref_=beagle&applicationId=AWSMPContessa
[6] http://aws.amazon.com/ec2/pricing/on-demand/
[7] https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Subnets.html
[8] https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html
[9] https://www.e-education.psu.edu/geog865/
[10] https://blogs.msdn.microsoft.com/lcris/2007/03/23/basic-sql-server-security-concepts-logins-users-and-principals/
[11] https://pro.arcgis.com/en/pro-app/help/data/geodatabases/manage-sql-server/comparison-geodatabase-owners-sqlserver.htm
[12] https://support.esri.com/en/technical-article/000022869
[13] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/us_data.zip
[14] https://pro.arcgis.com/en/pro-app/help/data/geodatabases/manage-sql-server/user-accounts-groups.htm