Cloud computing is, in some ways, a new term for an old thing: keeping data and computational processes in some central location (historically a mainframe) and accessing them with a less-powerful client; in other words, a client-server architecture. What is new is that we get to take advantage of the Internet. Using the Internet as the transport mechanism means that client-server computing can be used in more situations, and more robustly, than was previously possible. If this topic interests you, you might consider looking into our Cloud and Server GIS course.
Cloud computing services and projects I am currently keeping an eye on include Amazon's Elastic Compute Cloud (EC2), Google App Engine, Microsoft Azure, and Hadoop.
Amazon's EC2 is in some ways the most straightforward proposition: you get to create server instances, either Linux or Windows. Using one is basically just like remoting into any other computer. The magic, though, is that you can easily set up, copy, and throw away these instances in a way that would not be practical with physical computers.
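To make "set up, copy, and throw away" concrete, here is a toy, in-memory model of a server's lifecycle. It makes no calls to AWS; the names `Instance`, `launch`, and `clone` are hypothetical, and the three states are a simplified subset of the states EC2 reports for an instance.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical ID generator standing in for the IDs a real cloud provider assigns.
_next_id = itertools.count(1)


@dataclass
class Instance:
    """Toy model of a cloud server instance; it makes no calls to AWS."""
    image: str                  # the machine image this server was launched from
    state: str = "running"      # one of "running", "stopped", "terminated"
    instance_id: str = field(default_factory=lambda: f"i-{next(_next_id):04d}")

    def stop(self) -> None:
        """Pause a running server; it can be started again later."""
        if self.state != "running":
            raise ValueError(f"cannot stop an instance that is {self.state}")
        self.state = "stopped"

    def start(self) -> None:
        """Resume a stopped server."""
        if self.state != "stopped":
            raise ValueError(f"cannot start an instance that is {self.state}")
        self.state = "running"

    def terminate(self) -> None:
        """Throw the server away; in this model there is no way back."""
        if self.state == "terminated":
            raise ValueError("instance is already terminated")
        self.state = "terminated"


def launch(image: str) -> Instance:
    """Set up a fresh server from a machine image."""
    return Instance(image=image)


def clone(source: Instance) -> Instance:
    """'Copy' a server by launching another one from the same image."""
    return launch(source.image)


# Usage: set up one server, copy it three times, then throw the copies away.
web = launch("gis-web-stack")
copies = [clone(web) for _ in range(3)]
for server in copies:
    server.terminate()
```

The point of the model is that a copy is just another launch from the same image, which is why spinning up (and discarding) identical servers is cheap in the cloud and expensive with hardware.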
For me, the real "ah-hah" moment for cloud computing in general, and Amazon's EC2 in particular, came when I saw a presentation by Karsten Vennemann (a GIS analyst and consultant who lives in Seattle) at the 2009 Washington URISA conference. Karsten had set up web mapping sites for the 2008 Obama campaign. The campaign needed to generate lots of maps of voters, but it needed to set the sites up quickly and without spending a lot of money, and it didn't need to keep anything after the campaign was over. So, spending many thousands of dollars per machine for many machines, plus tens of thousands for software, was not an option.
Karsten used a complete Open Source stack (Linux, PostGIS, Apache HTTP Server, MapServer, OpenLayers) to eliminate license issues. We will be discussing Open Source GIS later in this course. He hosted the sites on Amazon's EC2 service, where you can set up as many server "instances" as you like, and they don't cost much (10 cents an hour if you are just dinking around). As a result, he was able to provide sites for twenty battleground states, with great performance and at little cost.
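As a back-of-the-envelope check on "little cost", here is the arithmetic under two assumptions that are not details of Karsten's actual setup: one instance per battleground state, each running around the clock for a 30-day month at the 10-cents-an-hour rate. It counts compute time only; storage and data transfer are billed separately.

```python
def fleet_cost(instances: int, dollars_per_hour: float, hours: float) -> float:
    """Total compute cost for a fleet of identical instances."""
    return instances * dollars_per_hour * hours


HOURS_PER_DAY = 24
DAYS = 30  # assumed billing period

# Assumption: one instance per battleground state, running 24/7.
monthly = fleet_cost(instances=20, dollars_per_hour=0.10, hours=HOURS_PER_DAY * DAYS)
print(f"${monthly:,.2f} per month")
```

That works out to about $1,440 for a month of twenty always-on servers, compared with many thousands of dollars for a single physical machine.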
ArcGIS Server on EC2
Esri has released what it calls the ArcGIS Server Cloud Bundle. GIS blogger extraordinaire James Fee had this to say, in part:
"What is good about this? First off I’m glad to see Esri finally start to publicly address demand for ArcGIS Server on Amazon EC2. They’ve broken their traditional maintenance based approached to licensing which is something I think we are all very happy about and they’ve automated the process with an AMI (Amazon Machine Image) ready to go.
What is still lacking? While this is a step into ArcGIS as SaaS, it still requires you to go through your local Esri office. This will mean that large Esri customers will get great breaks and those who are smaller or new will pay list prices. The cloud is supposed to bring equity, but the traditional sales model of Esri plays favorites. Windows-only instance of this AMI is also problematic. The cost of a large Windows instance of EC2 is going to offset all the benefits of a 1-year license. Of course, Esri doesn’t support Fedora or CentOS, so, until they do, most are probably going to not scale up ArcGIS in Amazon.
I see nothing about Esri helping with backing up these EC2 instances and how that is going to work. These EC2 instances can crash (hello ArcGIS!) and just disappear. If that happens you lose EVERYTHING. Basically, this is a GIS infrastructure play and it is up to the user of their AMI to handle this. That said, one ArcGIS license isn’t enough to do redundancy (though maybe the terms allow this). Basically, you are paying to use a single, slow (compared to typical servers) hardware with no methods to back up your services. YIKES!"
After looking through the rest of Fee's blog post and the comments on it, the consensus seems to be that Esri is warming up to cloud computing but isn't jumping in with both feet (yet).
If you wish to try ArcGIS Server on Amazon EC2 yourself, we can provide you with a "learner's copy" that runs on the EC2 service. Using EC2 may seem a bit scary, but it is actually pretty easy once you get going. There is no charge for the software; however, you must pay your own hosting fees. These are quite reasonable as long as you don't leave your instance running when you are not using it (I pay about $5 a month this way).
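The advice to switch the instance off comes down to hourly billing: you pay for the hours the server runs, not for the hours in the month. In this sketch both the hourly rate and the hours of use are made-up placeholders, not quoted EC2 prices; substitute the current price for the instance type you choose.

```python
def monthly_cost(dollars_per_hour: float, hours_running: float) -> float:
    """Compute charge for one instance billed only for the hours it runs."""
    return dollars_per_hour * hours_running


RATE = 0.40                  # hypothetical hourly rate, not a quoted EC2 price
HOURS_IN_MONTH = 24 * 30     # assumes a 30-day month

left_on = monthly_cost(RATE, HOURS_IN_MONTH)
# Assumption: the server is switched on only for about 15 hours of study a month.
on_when_needed = monthly_cost(RATE, 15)

print(f"left on all month: ${left_on:.2f}")
print(f"on only when used: ${on_when_needed:.2f}")
```

Whatever the actual rate, leaving the server on multiplies the bill by the ratio of hours in the month to hours actually used (48 times, with these assumed numbers).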
You can read a good overview of Esri's requirements and options. And the video "Using ArcGIS for Server in the Amazon Cloud" provides a (long) overview of how ArcGIS Server and Amazon EC2 work together.
Google App Engine
Google App Engine allows you to write programs that leverage Google's rather tremendous infrastructure. The good news is that you can store your data in BigTable, the database-like system where Google keeps all of its data, like all your e-mail, everything you buy online, every website you visit. Haha! Just kidding!
The bad news is that you have to keep your data in BigTable. So, if you wrote a program that writes text files to disk, it won't work in App Engine. Also, it currently supports only Python and Java. On the upside, accounts are free (I have one, although I only got as far as a "Hello, World" kind of thing) as long as you use no more than 500 MB of storage and no more CPU and bandwidth than roughly 5 million page views a month would need, which is a fair amount.
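To put "a fair amount" in perspective, here is the free page-view quota spread evenly over a 30-day month (an assumption). Real traffic is bursty, so peak load would run well above this average.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

page_views_per_month = 5_000_000        # the free-quota figure quoted above
average_per_second = page_views_per_month / SECONDS_PER_MONTH
views_per_day = page_views_per_month / 30

print(f"about {average_per_second:.1f} page views per second, around the clock")
print(f"about {views_per_day:,.0f} page views per day")
```

Roughly two page views every second, all month long, is more than most small GIS web applications will ever see.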
So, Google App Engine looks interesting, but the inability to use relational databases or local files means it will take a while for an ecosystem to develop around it.
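The shift away from relational tables and local files is easier to picture with the data model in hand. Google's published Bigtable design describes a sparse, sorted map indexed by row key, column key, and timestamp. The toy class below (a hypothetical `ToyBigtable` built from plain Python dictionaries, not App Engine's actual API) shows the practical consequence: you fetch data by key or scan a range of sorted keys, rather than writing SQL joins or opening files on disk.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class ToyBigtable:
    """In-memory sketch of a sorted map indexed by (row, column, timestamp)."""

    def __init__(self) -> None:
        # row key -> column key -> timestamp -> value
        self._rows: dict[str, dict[str, dict[int, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )
        self._sorted_keys: list[str] = []  # row keys kept in sorted order

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell; older versions are kept alongside it."""
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
        self._rows[row][column][timestamp] = value

    def get(self, row: str, column: str) -> bytes | None:
        """Return the most recent version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, end: str) -> list[str]:
        """Return row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, end)
        return self._sorted_keys[lo:hi]


# Usage: row keys chosen so that related rows sort next to each other.
table = ToyBigtable()
table.put("WA-King", "stats:voters", 1, b"100")
table.put("WA-King", "stats:voters", 2, b"120")
table.put("OR-Lane", "stats:voters", 1, b"80")
table.put("WA-Pierce", "stats:voters", 1, b"90")
print(table.get("WA-King", "stats:voters"))
print(table.scan("WA", "WB"))
```

Because there are no joins, designing the row keys so that related records sort together does much of the work a relational schema would otherwise do.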
Microsoft Azure
Azure is Microsoft's platform for cloud computing. Microsoft has made some good moves here: they provide a RESTful API and support open data standards like Atom. At the same time, they have worked to make it easy for .NET programmers to get up and running.
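Atom is plain XML, so anything that can make an HTTP request and parse XML can consume a service that returns it. As a minimal sketch, here is a made-up Atom feed (not an actual Azure response) read with Python's standard `xml.etree.ElementTree`; the feed contents and the `entry_titles` helper are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed, shaped like what a RESTful endpoint might return.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Map layers</title>
  <id>urn:example:layers</id>
  <updated>2010-01-01T00:00:00Z</updated>
  <entry>
    <title>Precincts</title>
    <id>urn:example:layers:1</id>
    <updated>2010-01-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Polling places</title>
    <id>urn:example:layers:2</id>
    <updated>2010-01-02T00:00:00Z</updated>
  </entry>
</feed>"""


def entry_titles(feed_xml: str) -> list[str]:
    """Return the title of every <entry> in an Atom feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [
        entry.findtext("atom:title", namespaces=ATOM_NS)
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


print(entry_titles(SAMPLE_FEED))
```

Nothing here is specific to Microsoft's tools, which is the appeal of an open format: the same few lines would read an Atom feed from any service.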
Considering how deeply Esri has committed to the Windows ecosystem, it's no surprise to see some recent progress toward running ArcGIS Server on Microsoft Windows Azure.
Apache Hadoop
Apache Hadoop is an Open Source project that allows you to create your own cloud-like services. So, if you want to set up shop as a cloud computing provider, you might go out and buy a few thousand computers and start installing Hadoop! Somewhat confusingly, you can also run Hadoop on Amazon EC2 or download a VMware Hadoop image. Hadoop interests me because you can set it up on your own hardware (if you have spare machines). Why should Google, Amazon, etc., have all the fun? Hadoop is to Google kind of like Linux is to Microsoft (although Hadoop is not nearly as mature as Linux). I think the future of Big Data applications (which surely includes GIS) will increasingly depend on Hadoop or similar clustering strategies. Vendors like Esri have started to pay attention to Hadoop, even going as far as to make some tools for Hadoop available on GitHub.
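Hadoop's core programming model is MapReduce: a map step turns each input record into key/value pairs, the framework groups the pairs by key, and a reduce step combines each group. Below is a single-process Python sketch of that model applied to a GIS-flavored task, counting points per grid cell. It does not use Hadoop's API; the function names and the 1-degree cell size are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from math import floor
from typing import Iterable, Iterator


def map_point(lon: float, lat: float, cell_size: float) -> Iterator[tuple[tuple[int, int], int]]:
    """Map step: emit a (grid cell, 1) pair for one point."""
    cell = (floor(lon / cell_size), floor(lat / cell_size))
    yield cell, 1


def reduce_counts(cell: tuple[int, int], counts: Iterable[int]) -> tuple[tuple[int, int], int]:
    """Reduce step: combine all the values emitted for one grid cell."""
    return cell, sum(counts)


def count_points_per_cell(
    points: Iterable[tuple[float, float]], cell_size: float = 1.0
) -> dict[tuple[int, int], int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    grouped: dict[tuple[int, int], list[int]] = defaultdict(list)
    for lon, lat in points:
        for cell, value in map_point(lon, lat, cell_size):
            grouped[cell].append(value)  # the "shuffle": group values by key
    return dict(reduce_counts(cell, values) for cell, values in grouped.items())


points = [(-122.3, 47.6), (-122.9, 47.1), (-117.4, 47.7)]  # (lon, lat) pairs
print(count_points_per_cell(points))
```

In Hadoop the map and reduce steps run in parallel across many machines and the framework handles the grouping between them, but the logic you write for each step can stay about this small.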