Time to experiment with Amazon EC2 and S3. This site http://www.gis-xaml.com is using an Amazon EC2 instance with a complete open source GIS stack running on Ubuntu Gutsy.
- Ubuntu Gutsy
- Java 1.6.0
- Tomcat 6.016
- MySQL 5.0.45
- PostgreSQL 8.2.6
- PostGIS 1.2.1
- GEOS 2.2.3-CAPI-1.1.1
- Proj 4.5.0
- GeoServer 1.6
Running an Apache2 service with a jk_mod connector to tomcat lets me run the examples of xaml xbap files with their associated java servlet utilities for pulling up GetCapabilities trees on various OWS services. This is an interesting example of combining open source and WPF. In the NasaNeo example Java is used to create the 3D terrain models from JPL srtm (Ctrl+click) and drape with BMNG all served as WPF xaml to take advantage of native client bindings. NasaNeo example
I originally attempted to start with a public ami based on fedora core 6. I found loading the stack difficult with hard to find RPMs and difficult installation issues. I finally ran into a wall with the PostgreSQL/PostGIS install. In order to load I needed a complete gcc make package to compile from sources. It did not seem worth the trouble. At that point I switched to an Ubuntu 7.10 Gutsy ami.
Ubuntu based on debian is somewhat different in its directory layout from the fedora base. However, Ubuntu apt-get was much better maintained than the fedora core yum installs. This may be due to using the older fedora 6 rather than a fedora 8 or 9, but there did not appear to be any useable public ami images available on the AWS EC2 for the newer fedoras. In contrast to fedora on Ubuntu installing a recent version of PostgreSQL/PostGIS was a simple matter:
apt-get install postgresql-8.2-postgis postgis
In this case I was using the basic small 32 bit instance ami with 1.7Gb memory and 160Gb storage at $0.10/hour. The performance was very comparable to some dedicated servers we are running, perhaps even a bit better since the Ubuntu service is setup using an Apache2 jk_mod to tomcat while the dedicated servers simply use tomcat.
There are some issues to watch for on the small ami instances. The storage is 160Gb but the partition allots just 10Gb to root and the balance to a /mnt point. This means the default installations of mysql and postgresql will have data directories on the smaller 10Gb partition. Amazon has done this to limit ec2-bundle-vol to a 10GB max. ec2-bundle-volume is used to store an image to S3 which is where the whole utility computing gets interesting.
Once an ami stack has been installed it is bundled and stored on S3, that ami is then registered with AWS. Now you have the ability to replicate the image on as many instances as desired. This allows very fast scaling or failover with minimal effort. The only caveat of course is in dynamic data. Unless provision is made to replicate mysql and postgresql data to multiple instances or S3, any changes can be lost with the loss of an instance. This does not appear to occur terribly often but then again the AWS is still Beta. Also important to note, the DNS domain pointed to an existing instance will also be lost with the loss of your instance. Bringing up a new instance requires a change to the DNS entry as well (several hours), since each instance creates its own unique amazon domain name. There appear to be some work arounds for this requiring more extensive knowledge of DNS servers.
In my case the data sources are fairly static. I ended up changing the datadir pointers to /mnt locations. Since these are not bundled in the volume creation, I handled them separately. Once the data required was loaded I ran a tar on the /mnt/directory and copied the .tar files each to its own S3 bucket. The files are quite large so this is not a nice way to treat backups of dynamic data resources.
Next week I have a chance to experiment with a more comprehensive solution from Elastra. Their beta version promises to solve these issues by wrapping Postgresql/postgis on the ec2 instance with a layer that uses S3 as the actual datadir. I am curious how this is done but assume for performance the indices remain local to an instance while the data resides on S3. I will be interested to see what performance is possible with this product.
Another interesting area to explore is Amazon’s recently introduced SimpleDB. This is not a standard sql database but a type of hierarchical object stack over on S3 that can be queried from EC2 instances. This is geared toward non typed text storage which is fairly common in website building. It will be interesting to adapt this to geospatial data to see what can be done. One idea is to store bounding box attributes in the SimpleDB and create some type of JTS tool for indexing on the ec2 instance. The local spatial index would handle the lookup which is then fed to the SimpleDB query tools for retrieving data. I imagine the biggest bottleneck in this scenario would be the cost of text conversion to double and its inverse.
Utility computing has an exciting future in the geospatial realm – thank you Amazon and Zen.