Visualizing Large Data Sets with Bing Maps Web Apps

Fig 1 - MapD Twitter Map 80M tweets Oct 19 - Oct 30

Visualizing large data sets with maps is an ongoing concern these days. Just ask the NSA, or note this federal vehicle tracking initiative reported at the LA Times. Or, this SPD mesh network for tracking any MAC address wandering by.

“There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all of the time. But at any rate they could plug in your wire whenever they wanted to.”

George Orwell, 1984

On a less intrusive note, large data visualization is also of interest to anyone dealing with BI or just fascinated with massive public data sets such as the Twitter universe. Web maps are the way to go for public distribution, and all web apps face the same set of issues when dealing with large data sets:

1. Latency of data storage queries, typically SQL
2. Latency of services for mediating queries and data between the UI and storage.
3. Latency of the internet
4. Latency of client side rendering

All web map javascript APIs have these same issues whether it’s Google, MapQuest, Nokia Here, or Bing Maps. This is a Bing Maps centric perspective on large data mapping, because Bing Maps has been the focus of my experience for the last year or two.

Web Mapping Limitations

Bing Maps Ajax v7 is Microsoft’s javascript API for web mapping applications. It offers typical Point (Pushpin), Polyline, and Polygon vector rendering in the client over three tile base map styles: Road, Aerial, and AerialWithLabels. Additional overlay extensions are also available, such as traffic. As with all the major web map apis, vector advantages include client side event functions at the shape object level.

Although this is a powerful mapping API, rendering performance degrades with the number of vector entities in an overlay. Zoom and pan navigation performs smoothly on a typical client up to a couple of thousand points or a few hundred complex polylines and polygons. Beyond these limits, other approaches are needed for visualizing geographic data sets. This client side limit is necessarily fuzzy, as there is a wide variety of client hardware out there in user land, from older desktops and mobile phones to powerful gaming rigs.
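For reference, a minimal Bing Maps Ajax v7 sketch of vector pushpins with a shape level click event looks something like this (the map div id and the credentials key are placeholders):

    var map = new Microsoft.Maps.Map(document.getElementById('mapDiv'), {
        credentials: 'YOUR-BING-MAPS-KEY',            // placeholder key
        mapTypeId: Microsoft.Maps.MapTypeId.road
    });

    // add a vector pushpin with a client side click handler attached to the shape
    function addPoint(lat, lon, name) {
        var pin = new Microsoft.Maps.Pushpin(new Microsoft.Maps.Location(lat, lon));
        Microsoft.Maps.Events.addHandler(pin, 'click', function () {
            alert(name);                              // shape object level event
        });
        map.entities.push(pin);
    }

    addPoint(39.0, -104.8, 'sample point');

Each pushpin added this way is an interactive shape object, which is exactly why rendering cost climbs with the entity count.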

Large data Visualization Approaches

1) Tile Pyramid – The Bing Maps Ajax v7 API offers a tileLayer resource that handles overlays of tile pyramids using a quadkey nomenclature. Data resources are precompiled into sets of small images called a tile pyramid, which can then be used in the client map as a slippy tile overlay. This is the same slippy tile approach used for serving the base Road, Aerial, and AerialWithLabels maps, and it is common to all web map brands.

Fig 2 - example of quadkey png image names for a tile pyramid
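As a reference for the quadkey naming shown in Fig 2, the conversion from tile x, y, and zoom to a quadkey follows the standard Bing Maps interleaving, and a pre-built pyramid can be attached through the tileLayer resource. A minimal sketch (the tile URL host is a placeholder):

    // convert tile x,y at a zoom level into a Bing Maps quadkey string
    function tileXYToQuadKey(x, y, zoom) {
        var quadKey = '';
        for (var i = zoom; i > 0; i--) {
            var digit = 0, mask = 1 << (i - 1);
            if ((x & mask) !== 0) digit += 1;
            if ((y & mask) !== 0) digit += 2;
            quadKey += digit;
        }
        return quadKey;
    }

    // overlay a pre-processed tile pyramid; {quadkey} is filled in by the API
    var tileSource = new Microsoft.Maps.TileSource({
        uriConstructor: 'http://example.com/tiles/{quadkey}.png'   // placeholder host
    });
    var tileLayer = new Microsoft.Maps.TileLayer({ mercator: tileSource, opacity: 0.7 });
    map.entities.push(tileLayer);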

Pro: Fast performance

  • Server side latency is eliminated by pre-processing tile pyramids
  • Internet streaming is reduced to a limited set of png or jpg tile images
  • Client side rendering is reduced to a small set of images in the overlay

Con: Static data – tile pyramids are pre-processed

  • data cannot be real time
  • Permutations limited – storage and time limitations apply to queries that have large numbers of permutations
  • Storage capacity – tile pyramids require large storage resources when provided for worldwide extents and full 20 zoom level depth

2) Dynamic tiles – This is a variation of the tile pyramid that creates tiles on demand at the service layer. A common approach is to provide dynamic tile creation with SQL or file based caching. Once a tile has been requested, it is available for subsequent queries directly as an image. This allows lower levels of the tile pyramid to be populated only on demand, reducing the amount of storage required.
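The service side of a dynamic tile cache reduces to a few lines as well. This is only a hypothetical illustration (Node.js here, not the .NET service layer actually used), with renderTile standing in for whatever draws a quadkey’s png from SQL:

    var http = require('http'), fs = require('fs'), path = require('path');

    // stand-in for the real renderer: query SQL for the quadkey extent and draw a png
    function renderTile(quadkey, callback) {
        callback(new Buffer(0));                         // placeholder empty image
    }

    function sendPng(res, png) {
        res.writeHead(200, { 'Content-Type': 'image/png' });
        res.end(png);
    }

    // serve from the file cache when present, otherwise render once and cache
    function serveTile(quadkey, res) {
        var cached = path.join('tilecache', quadkey + '.png');   // cache dir assumed to exist
        fs.readFile(cached, function (err, png) {
            if (!err) return sendPng(res, png);          // cache hit, no rendering
            renderTile(quadkey, function (png) {         // cache miss, render on demand
                fs.writeFile(cached, png, function () {});
                sendPng(res, png);
            });
        });
    }

    http.createServer(function (req, res) {
        serveTile(req.url.replace(/\D/g, ''), res);      // e.g. /tiles/021030.png
    }).listen(8080);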


Pro:

  • Can handle a larger number of query permutations
  • Server side latency is reduced by caching tile pyramid images (only the first request requires generating the image)
  • Internet streaming is reduced to a limited set of png tile images
  • Client side rendering is reduced to a small set of images in the overlay


Con:

  • Static data – dynamic data must still be refreshed in the cache
  • Tile creation performance is limited by server capability and can be a problem with public facing high usage websites.

3) Hybrid – This approach splits the zoom level depth into at least two sections. The lowest levels, with the largest extents, contain the majority of a data set’s features and are provided as a static tile pyramid. The higher zoom levels, comprising smaller extents with fewer points, can use the data as vectors (a client side switching sketch follows the pro/con lists below). A variation of the hybrid approach adds a middle level populated by a dynamic tile service.

Fig 3 – Hybrid architecture


Pro:

  • Fast performance – although not as fast as a pure static tile pyramid, it offers good performance through the entire zoom depth.
  • Allows fully event driven vectors at higher zoom levels on the bottom end of the pyramid.


Con:

  • Static data at larger extents and lower zoom levels
  • Event driven objects are only available at the bottom end of the pyramid
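A minimal sketch of the client side switch mentioned above, assuming the static pyramid covers zoom levels up to 10 and reusing tileLayer from the earlier tile layer sketch (loadVectors is a hypothetical service call that pushes Pushpins for the current viewport):

    var SWITCH_LEVEL = 10;                               // assumed split point in the pyramid
    var vectorLayer = new Microsoft.Maps.EntityCollection();
    map.entities.push(vectorLayer);

    function loadVectors(bounds, layer) {
        // stand-in: fetch point data for the bounds from a service and add Pushpins to layer
    }

    Microsoft.Maps.Events.addHandler(map, 'viewchangeend', function () {
        if (map.getZoom() <= SWITCH_LEVEL) {
            tileLayer.setOptions({ visible: true });     // static tiles at wide extents
            vectorLayer.clear();
        } else {
            tileLayer.setOptions({ visible: false });    // event driven vectors up close
            loadVectors(map.getBounds(), vectorLayer);
        }
    });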

sample site and demo video

tile layer sample

Fig 4 - Example of a tileLayer view - point data for earthquakes and Mile Markers

Fig 5 - Example of same data at a higher zoom using vector data display

4) Heatmap
Heatmaps refer to the use of color gradient or opacity overlays to display data density. The advantage of heatmaps is the data reduction performed by the aggregating algorithm. To determine the color/opacity of a data set at a location, the data is first aggregated by either a polygon or a grid cell. The sum of the data in a given grid cell is then mapped to the color gradient for that cell. If heatmaps are rendered client side, they perform well only up to the latency limits of server side queries, internet bandwidth, and local rendering.

Fig 6 - Example of heatmap canvas over Bing Maps rendered client side
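A client side sketch of the grid aggregation behind a view like Fig 6, assuming the points are already in hand as [lat, lon] pairs and an html5 canvas sits positioned over the map div:

    // aggregate point counts into screen space grid cells and paint opacity squares
    function drawHeat(map, canvas, points, cellSize) {
        var ctx = canvas.getContext('2d');
        ctx.clearRect(0, 0, canvas.width, canvas.height);
        var counts = {}, max = 0;
        points.forEach(function (p) {
            var px = map.tryLocationToPixel(
                new Microsoft.Maps.Location(p[0], p[1]),
                Microsoft.Maps.PixelReference.control);
            if (!px) return;
            var key = Math.floor(px.x / cellSize) + '_' + Math.floor(px.y / cellSize);
            counts[key] = (counts[key] || 0) + 1;
            if (counts[key] > max) max = counts[key];
        });
        for (var key in counts) {
            var xy = key.split('_');
            ctx.fillStyle = 'rgba(255,0,0,' + (counts[key] / max) + ')';  // opacity = density
            ctx.fillRect(xy[0] * cellSize, xy[1] * cellSize, cellSize, cellSize);
        }
    }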

Grid Pyramids – Server side gridding
Hybrid server side gridding offers significant performance advantages when coupled with pre-processed grid cells. One gridding technique processes a SQL data resource into a quadkey structure. Each grid cell is identified by its unique quadkey and contains the data aggregate for that cell. A grid quadkey sort by length identifies all of the grid aggregates at a specific quadtree level. This lets the client efficiently download the grid aggregates for each zoom level and render them locally in an html5 canvas over the top of a Bing Maps view. Since all grid levels are precompiled, cell resolution can be adjusted by zoom level.
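On the client, picking out the aggregates for the current view is then just a matter of quadkey string length, since the length of a quadkey is its quadtree level. A small sketch, assuming the pre-compiled aggregates arrive as an object keyed by quadkey:

    // keep only the grid cells whose quadkey length matches the current zoom level
    function aggregatesForZoom(allCells, zoom) {
        var cells = {};
        for (var quadkey in allCells) {
            if (quadkey.length === zoom) cells[quadkey] = allCells[quadkey];
        }
        return cells;      // e.g. { '02313': 147, '02310': 9, ... } aggregate per cell
    }

The selected cells can then be painted with the same canvas routine used for the client side heatmap above.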


Pro:

  • Efficient display of very large data sets at wide extents
  • Can be coupled with vector displays at higher zoom levels for event driven objects

Con: gridding is pre-processed

  • real time data cannot be displayed
  • storage and time limitations apply to queries that have large numbers of permutations

Fig 7 – Grid Pyramid screen shot of UI showing opacity heatmap of Botnet infected computers

5) Thematic
Thematic maps use spatial regions such as states or zipcodes to aggregate data into color coded polygons. Data is aggregated for each region and color coded to show its value. A hierarchy of polygons lets zoom levels switch to more detailed regions at closer zooms. An example hierarchy might be Country, State, County, Sales Territory, Zipcode, Census Block.
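The color coding itself is a simple quantized lookup. A sketch, assuming the per-region aggregates are already computed and using placeholder break points and colors:

    // map an aggregated value onto one of five color buckets
    var breaks = [0.2, 0.4, 0.6, 0.8];                               // assumed quantized breaks
    var colors = ['#ffffcc', '#c2e699', '#78c679', '#31a354', '#006837'];

    function colorFor(value, minVal, maxVal) {
        var t = (value - minVal) / (maxVal - minVal);                // normalize 0..1
        for (var i = 0; i < breaks.length; i++) {
            if (t <= breaks[i]) return colors[i];
        }
        return colors[colors.length - 1];
    }

The returned color then styles the region polygon at the appropriate level of the hierarchy.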


Pro:

  • Large data resources are aggregated into meaningful geographic regions.
  • Analysis is often easier using color ranges for symbolizing data variation


Con:

  • Rendering client side is limited to a few hundred polygons
  • Very large data sets require pre-processing data aggregates by region

Fig 8 - thematic map displaying data aggregated over 210 DMA regions using a quantized percentile range

6) Future trends
Big Data visualization is an important topic as the web continues to generate massive amounts of data useful for analysis. There are a couple of technologies on the horizon that will help with visualization of very large data resources.

A. Leverage of client side GPU

Here is an example of WebGL using CanvasLayer (Firefox, Chrome, and IE11 only; it cannot be viewed in IE10).

This sample shows the speed of pan and zoom rendering for 30,000 random points, which would overwhelm typical js shape rendering. Data performance is good up to about 500,000 points, per Brendan Kenny. Complex shapes need to be built up from triangle primitives. Tessellation rates for polygon generation approach 1,000,000 triangles per 1000ms using libtess. Once tessellated, the immediate mode graphics pipeline can navigate at up to 60fps. Sample code is available on github.
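This is not the CanvasLayer sample itself, just a bare-bones illustration of the point rendering part: buffer the points once, then let the GPU redraw them on every pan or zoom frame (a canvas element with id glCanvas is assumed):

    var gl = document.getElementById('glCanvas').getContext('webgl');

    var vsSrc = 'attribute vec2 pos;' +
        'void main() { gl_Position = vec4(pos, 0.0, 1.0); gl_PointSize = 2.0; }';
    var fsSrc = 'precision mediump float;' +
        'void main() { gl_FragColor = vec4(1.0, 0.3, 0.0, 0.8); }';

    function compile(type, src) {
        var s = gl.createShader(type);
        gl.shaderSource(s, src);
        gl.compileShader(s);
        return s;
    }

    var prog = gl.createProgram();
    gl.attachShader(prog, compile(gl.VERTEX_SHADER, vsSrc));
    gl.attachShader(prog, compile(gl.FRAGMENT_SHADER, fsSrc));
    gl.linkProgram(prog);
    gl.useProgram(prog);

    // 30,000 random points in clip space, uploaded to the GPU once
    var pts = new Float32Array(60000);
    for (var i = 0; i < pts.length; i++) pts[i] = Math.random() * 2 - 1;
    gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
    gl.bufferData(gl.ARRAY_BUFFER, pts, gl.STATIC_DRAW);

    var loc = gl.getAttribLocation(prog, 'pos');
    gl.enableVertexAttribArray(loc);
    gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);

    gl.clear(gl.COLOR_BUFFER_BIT);
    gl.drawArrays(gl.POINTS, 0, 30000);   // one draw call for all points

In a real map overlay the vertex shader would also apply the current map translation and scale, which is what keeps navigation at interactive frame rates.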

This performance is achieved by leveraging the client GPU. Because immediate mode graphics is a powerful animation engine, time animations can be used to uncover data patterns and anomalies, as well as to make some really impressive dynamic maps like this Uber sample. Unfortunately all the upstream latency remains: collecting the data from storage and sending it across the wire. Since we’re talking about larger sets of data, this latency is more pronounced. Once data initialization finishes, client side performance is amazing. Just don’t go back to the server for new data very often.


Pro:

  • Good client side navigation performance up to about 500,000 points


Con:

  • requires a webgl enabled browser
  • requires GPU on the client hardware
  • subject to latency issues of server query and internet streaming
  • WebGL tessellation triangle primitives make display of polylines and polygons complex

Fig 9 – test webGL 30,000 random generated points (requires WebGL enabled browser – Firefox, Chrome, IE11)

Note: IE11 added WebGL capability which is a big boost for the web. There are still some glitches, however, and gl_PointSize in shader is broken for simple points like this sample.

Fig 10 – Very interesting WebGL animations of shipping GPS tracks using WebGL Canvas –courtesy Brendan Kenny

B. Leverage of server side GPU
MapD – Todd Mostak has developed a GPU based spatial query system called MapD (Massively Parallel Database)

MapD Synopsis:
  • MapD is a new database in development at MIT, created by Todd Mostak.
  • MapD stands for “massively parallel database.”
  • The system uses graphics processing units (GPUs) to parallelize computations. Some statistical algorithms run 70 times faster compared to CPU-based systems like MapReduce.
  • A MapD server costs around $5,000 and runs on the same power as five light bulbs.
  • MapD runs at between 1.4 and 1.5 teraflops, roughly equal to the fastest supercomputer in 2000.
  • uses SQL to query data.
  • Mostak intends to take the system open source sometime in the next year.
  • Bing Test:

    Bing Test shows an example of tweet points over Bing Maps and illustrates the performance boost from the MapD query engine. Each zoom or pan results in a GetMap request to the MapD engine, which queries millions of tweet point records (81 million tweets Oct 19 – Oct 30) and generates a viewport png image for display over the Bing Map. The server side query latency is amazing considering the population size of the data. Here are a couple of screen capture videos to give you an idea of the higher fps rates:
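    In client terms the pattern is simple: on each viewchangeend, ask the engine for a fresh viewport image and swap it into an img element positioned over the map. A rough sketch; the GetMap URL and its parameters are placeholders, not the actual MapD interface:

        var overlay = document.getElementById('viewportOverlay');    // absolutely positioned <img>

        Microsoft.Maps.Events.addHandler(map, 'viewchangeend', function () {
            var b = map.getBounds();                                  // current viewport extent
            overlay.src = 'http://example.com/GetMap' +               // placeholder endpoint
                '?n=' + b.getNorth() + '&s=' + b.getSouth() +
                '&e=' + b.getEast() + '&w=' + b.getWest() +
                '&width=' + map.getWidth() + '&height=' + map.getHeight();
        });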


    Interestingly, IE and FireFox handle cache in such a way that animations up to 100fps are possible. I can set a play interval as low as 10ms and the player appears to do nothing. However, 24hr x 12 days = 288 images are all downloaded in just a few seconds. Consequently, the next time through the play range the images come from cache and the animation is very smooth. Chrome handles local cache differently in Windows 8 and won’t grab from cache the second time. In the demo case the sample runs at 500ms or 2fps, which is kind of jumpy, but at least it works in Windows 8 Chrome with an ordinary internet download speed of 8Mbps.
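    The caching behavior boils down to preloading the frame images once and then cycling the src. A sketch of that player loop, with the hour by hour frame URLs as placeholders:

        var frames = [], frame = 0;
        for (var i = 0; i < 288; i++) {                  // 24hr x 12 days of hourly frames
            var img = new Image();
            img.src = 'http://example.com/frames/' + i + '.png';      // placeholder URLs
            frames.push(img);                            // first pass fills the browser cache
        }

        setInterval(function () {                        // later passes play from cache
            document.getElementById('animOverlay').src = frames[frame].src;
            frame = (frame + 1) % frames.length;
        }, 500);                                         // 500ms (2fps); lower once cached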

    Demo site for MapD:


    Pro:

    • Server side performance up to 70x
    • Internet stream latency reduced to just the viewport image overlay
    • Client side rendering as a single image overlay is fast


    Con:

    • Source code not released, and there may be proprietary license restrictions
    • Most web servers do not include GPU or GPU clusters – especially cloud instances

    Note: Amazon AWS offers GPU Clusters, but they are not cheap.

    Cluster GPU Quadruple Extra Large – 22 GiB memory, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla "Fermi" M2050 GPUs, 1690 GB of local instance storage, 64-bit platform, 10 Gigabit Ethernet ($2.10 per hour)

    NVidia Tesla M2050 – 448 CUDA Cores per GPU and up to 515 Gigaflops of double-precision peak performance in each GPU!

    Fig 11 - Demo displaying public MapD engine tweet data over Bing Maps

    C. Spatial Hadoop
    Spatial Hadoop applies the parallelism of Hadoop clusters to spatial problems using the MapReduce technique made famous by Google. In the Hadoop world a problem space is distributed across multiple CPUs or servers. Spatial Hadoop adds a nice collection of spatial objects and indices. Although Azure Hadoop supports .NET, there doesn’t seem to be a spatial Hadoop in the works for .NET projects. Apparently MapD as open source would leapfrog Hadoop clusters, at least in performance per dollar.

    D. In-Memory database (SQL Server 2014 Hekaton in preview release) – Microsoft plans to enhance the next version of SQL Server with in-memory options. The SQL Server 2014 in-memory options allow high speed queries for very large data sets when deployed to high memory capacity servers.

    Current SQL Server In-Memory OLTP CTP2

    Creating Tables
    Specifying that the table is a memory-optimized table is done using the MEMORY_OPTIMIZED = ON clause. A memory-optimized table can only have columns of these supported datatypes:

    • Bit
    • All integer types: tinyint, smallint, int, bigint
    • All money types: money, smallmoney
    • All floating types: float, real
    • date/time types: datetime, smalldatetime, datetime2, date, time
    • numeric and decimal types
    • All non-LOB string types: char(n), varchar(n), nchar(n), nvarchar(n), sysname
    • Non-LOB binary types: binary(n), varbinary(n)
    • Uniqueidentifier

    Since geometry and geography data types are not supported in the SQL Server 2014 in-memory release, spatial data queries will be limited to point (lat,lon) float/real data columns. It has been previously noted that for point data, float/real columns have equivalent or even better search performance than points in a geography or geometry form. In-memory optimizations would then apply primarily to spatial point sets rather than polygon sets.

    Natively Compiled Stored Procedures – “The best execution performance is obtained when using natively compiled stored procedures with memory-optimized tables. However, there are limitations on the Transact-SQL language constructs that are allowed inside a natively compiled stored procedure, compared to the rich feature set available with interpreted code. In addition, natively compiled stored procedures can only access memory-optimized tables and cannot reference disk-based tables.”

    SQL Server 2014 natively compiled stored procedures will not include any spatial functions. This means optimizations at this level will also be limited to float/real lat,lon column data sets.

    For fully spatialized in-memory capability we’ll probably have to wait for SQL Server 2015 or 2016.


    Pro:

    • Reduces server side latency for spatial queries
    • Enhances performance of image based server side techniques
      • Dynamic Tile pyramids
      • images (similar to MapD)
      • Heatmap grid clustering
      • Thematic aggregation


    Con:

    • Requires special high memory capacity servers
    • It’s still unclear what performance enhancements can be expected from spatially enabled tables

    E. Hybrids

    The trends point to a hybrid solution in the future which addresses the server side query bottleneck as well as client side navigation rendering bottleneck.

    Server side –
    a. In-Memory spatial DB
    b. Or GPU based parallelized queries

    Client side – GPU enhanced with some version of WebGL type functionality that makes use of the client GPU


    Techniques are available today that can accommodate large data resources in Bing Maps. Trends indicate that near future technology will really increase performance and flexibility. Perhaps the sweet spot for Big Data map visualization over the next few years will look like a MapD or GPU Hadoop engine on the server communicating with a WebGL UI over 1 Gbps fiber internet.

    Orwell feared that we would become a captive audience. Huxley feared the truth would be drowned in a sea of irrelevance.

    Amusing Ourselves to Death, Neil Postman

    Of course, in America, we have to have the best of both worlds. Here’s my small contribution to irrelevance:

    Fig 12 - Heatmap animation of Twitter from MapD over Bing Maps (100fps)

    Neo vs Paleo Geography

    Fig 1 – Paleo, Neo – what about Jurassic Geography?

    I gather that there is some twittering about neo versus paleo geography. See Peter Batty’s blog entry or James Fee’s blog. I myself don’t Twitter, but in general I’m happy for Peter’s paleo accommodation of the non-twitterers, repeating the conversation in a blog entry. Peter has also updated comments with a new post questioning, “Are we now in a post neogeography era?” The dreaded paradigm shifts are coming fast and furiously.

    I am not really able to comment on neo vs paleo as I myself fall even further back into “Jurassic Geography.” Looking at connotations we have this accounting:

    neo – 1990 to present; new, recent, different, Obama, Keynesian, Apple, Google Earth, Cloud, Java C# RubyRails Twitter

    paleo – as in paleolithic, 2.8m – 10k years ago; old, prehistoric, ancient, early, primitive, Nixon, Supply Side, Microsoft, Windows Desktop, ESRI Arc???, C C++ Javascript telephone

    Obviously the “paleo” label is not carried with quite the honor of “neo.” It’s reminiscent of the Galen / Myers-Brigg personality typology characterized as Lion, Otter, Beaver, and Golden Retriever. What do you want to be? Obviously not the beaver, but there has to be a significant part of the world in that category, like it or not. After all what would lions eat for dinner without a few of us beavers? Likewise there is a significant branch of paleo in the GIS kingdom.

    However, in the pre-paleolithic era there are still a few of us left, falling into the “long tail” of the Jurassic. So carrying on down the connotation stream here is the Jurassic geography equivalent:

    jurassic – 206m – 144m years ago; dinosaurs, fossils, pre-paleolithic, Hoover, laissez faire, IBM Big Iron, Assembly Cobol, open source

    Wait “Open Source” – Jurassic Geography? How did that get in there? The notoriously frugal days of Hoover never made it into the paleolithic era’s “Supply Side” economy. It’s Keynesian economics all over the neo world, so Jurassic geography is the frugal end of the spectrum and how can you get more frugal than free! Obviously Open Source is as Jurassic as they come in Geography circles.

    As I’ve recently been in a gig hunting mode, I’ve been having quite a few in depth conversations about GIS stacks. As a small businessman outside the corporate halls of paleo geography, I’ve had few occasions to get an in depth education on the corporate pricing world. So I spent the past couple of days looking into it.

    Let’s start at the bottom of the stack. Here is some retail pricing on a few popular GIS databases:

    • Oracle Standard Spatial $17,500 + $3850 annual
    • Oracle Enterprise Locator $47,500 + $10,450 annual
    • SQL Server 2008 Web edition ~ $3500
    • PostgreSQL/PostGIS $0.00

    If you’re a Jurassic geographer, which do you choose? Probably not Oracle Enterprise Locator. If you’re Paleo you look at that and think, “Man, I am the negotiator! We don’t pay any of that retail stuff for the masses.” Neo? Well, how would I know how a neo thinks?

    Next take a look at the middle tier:

    • ESRI ArcGIS Server standard workgroup license
        Minimum $5,000 for 2 cores + $1,250 2-core annual
        Additional cores $2,500/core + $625/core annual
    • ESRI ArcGIS hosted application server license
        Minimum $40,000 for 4 cores + $10,000 4-core annual
        Additional cores $10,000/core + $2,500/core annual
    • OWS GeoServer or MapServer minimum $0 + annual $0
      But this is GIS, isn’t it? We want some real analytic tools, not just a few hundred spatial functions in the JTS Topology Suite. OK, better throw in a few QGIS or GRASS installs and add a few $0s to the desktop production. Oh, and cores, we need some: “make that a 16-core 64-bit please” – plus $0.

    I think you catch the Jurassic drift here. How about client side.

    • ESRI Silverlight – free, well sort of, if you’re a developer, NGO, educational, or non-profit; otherwise take a look at that ArcGIS license back a few lines.
    • Google API – it’s Neo isn’t it? $10k per annum for commercial use; maybe it’s Paleo after all.
    • Virtual / Bing Maps api – $8k per annum, transaction based, and in typical license obfuscation fashion impossible to predict what the final cost will be. Paleo: “Just send me the invoice.”
    • OpenLayers is a javascript api client layer too, just solidly Jurassic at $0
    • Silverlight – well, it can be Jurassic; try DeepEarth over at codeplex, or MapControl from Microsoft with the Bing image service turned off and OSM on.

    It’s been an interesting education. Here is the ideal Jurassic GIS stack:
    Amazon EC2 Windows instance + PostGIS database + GeoServer OWS + IIS Silverlight MapControl client
    The cost: starts at $100/mo(1 processor 1.7Gb 32bit) up to $800/mo(4 processor 15Gb 64bit)

    So what does a Jurassic geographer get in this stack?

    Amazon Cloud based virtual server, S3 Backup, AMI image replication, Elastic IP, AWS console, choice of OS, cores, memory, and drive space. Ability to scale in a big way with Elastic load balancing, auto scaling, and CloudWatch monitoring. Performance options like CloudFront edge service or something experimental like Elastic MapReduce Hadoop clusters.

    PostgreSQL/PostGIS – Standards compliant SQL server with GIST spatial indexing on OGC “Simple Features for SQL” specification compliant geometry with extended support for 3DZ, 3DM and 4D coordinates. A full set of roughly 175 geometry, management, and spatial functions. It supports almost all projections. All this and performance? maybe a little vague but not shabby:

    “PostGIS users have compared performance with proprietary databases on massive spatial data sets and PostGIS comes out on top.”

    Middle Tier:
    Geoserver – standards compliant OWS service for WMS, WFS, WCS.
    Data sources: Shapefile, Shapefile Directory, PostGIS, external WFS, ArcSDE, GML, MySQL, Oracle, Oracle NG, SQL Server, VPF
    Export formats: WFS GML, KML, SVG, PDF, GeoRSS, Png, Jpeg, Geotiff, OGR Output – MapInfo Tab and MID/MIF, Shp, CSV, GeoJSON …
    OGC standard SLD styling, built in gwc tile caching – seeded or as needed, managed connection pools, RESTful configuration api, and ACEGI integrated security.

    WCS adds :

    1. ArcGrid – Arc Grid Coverage Format
    2. ImageMosaic – Image mosaicking plugin
    3. WorldImage – A raster file accompanied by a spatial data file
    4. Gtopo30 – Gtopo30 Coverage Format
    5. GeoTIFF – Tagged Image File Format with Geographic information
    “GeoServer is the reference implementation of the Open Geospatial Consortium (OGC) Web Feature Service (WFS) and Web Coverage Service (WCS) standards, as well as a high performance certified compliant Web Map Service (WMS). “

    Browser client viewer:
    Take your pick here’s a few:

    Well in these economic times Jurassic may in fact meet Neo. The GIS world isn’t flat and Jurassic going right eventually meets Neo going left, sorry Paleos. Will Obama economics turn out to be Hooverian in the end? Who knows, but here’s a proposition for the Paleos:

    Let me do a GIS distribution audit. If I can make a Jurassic GIS Stack do what the Paleo stack is currently providing, you get to keep those annual Paleo fees from here to recovery. How about it?


    Open up that data, Cloud Data

     James Fee looks at AWS data and here is the Tiger .shp snapshot James mentions: Amazon TIGER snapshot
    More details here: Tom MacWright

    Too bad it is only Linux/Unix since I’d prefer to attach to a Windows EC2. TIGER is there as raw data files ready to attach to your choice of Linux EC2. As is Census data galore.

    But why not look further?  It’s interesting to think about other spatial data out in the Cloud.

    Jeffrey Johnson adds a comment to Spatially Adjusted about OSM with the question – what form, a pg_dump or a pg database? This moves a little beyond raw Amazon public data sets.

    Would it be possible to provide an EBS volume with data already preloaded to PostGIS? A user could then attach the EBS ready to use. Adding a middle tier WMS/WFS like GeoServer or MapServer can tie together multiple PG sources, assuming you want to add other pg databases.

    Jeffrey mentions one caveat about the 5GB S3 limit. Does this mark the high end of a snapshot requiring modularized splitting of OSM data? Doesn’t sound like S3 will be much help in the long run if OSM continues expansion.

    What about OpenAerial? Got to have more room for OpenAerial and someday OpenTerrain(LiDAR)!
    EBS – volumes from 1 GB to 1 TB. Do you need the snapshot (only 5GB) to start a new EBS? Can this accommodate OpenAerial tiles, OpenLiDAR X3D GeoElevationGrid LOD. Of course we want mix and match deployment in the Cloud.

    Would it be possible for Amazon to just host the whole shebang? What do you think, Werner?

    Put it out there as an example of an Auto Scaling, Elastic Load Balancing OSM, OpenAerial tile pyramids as CloudFront Cache, OpenTerrain X3D GeoElevationGrid LOD stacks. OSM servers are small potatoes in comparison. I don’t think Amazon wants to be the Open Source Google, but with Google and Microsoft pushing into the Cloud game maybe Amazon could push back a little in the map end.

    I can see GeoServer sitting in the middle of all this data delight handing out OSM to a tile client where it is stacked on OpenAerial, and draped onto OpenTerrain. Go Cloud, Go!

    Watching the Cloud

    Amazon announces some welcome additions to their AWS tools.

    The promise of auto scaling has been a large part of the Cloud since its inception. There have been 3rd party tools in the Amazon Cloud for a while, but unfortunately they involve some cost in complexity. Amazon’s new tools are a welcome addition. At present they are Beta command line APIs, but it’s reasonable to assume they will be incorporated into the AWS Console at some point.

    There are three api tool kits in this recent round of announcements:
        Cloud Watch
        Auto Scaling
        Elastic Load Balancing

    Cloud Watch tracks a set of parameters for each running instance monitored:

    Auto Scaling then lets an administrator set triggers for adjusting the number of EC2 instances in an instance pool to reflect demand. Triggers are based on the monitored parameters.

    Elastic Load Balancing provides round robin web call distribution across a set of identical web instances. In addition to ease of administration it keeps an eye on the health of instances in the pool and auto routes traffic around any problem instances that show up.

    Here is the summary from Amazon’s website:

    • Amazon CloudWatch – Amazon CloudWatch is a web service that provides monitoring for AWS cloud resources, starting with Amazon EC2. It provides you with visibility into resource utilization, operational performance, and overall demand patterns—including metrics such as CPU utilization, disk reads and writes, and network traffic. To use Amazon CloudWatch, simply select the Amazon EC2 instances that you’d like to monitor; within minutes, Amazon CloudWatch will begin aggregating and storing monitoring data that can be accessed using web service APIs or Command Line Tools.
      Fees: $0.015 per hour for each Amazon EC2 instance monitored which amounts to $10.95 per month for a single instance of any type.
    • Auto Scaling – Auto Scaling allows you to automatically scale your Amazon EC2 capacity up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you’re using scales up seamlessly during demand spikes to maintain performance, and scales down automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees.
      Fees: free to Amazon CloudWatch customers. Of course the usual fees for additional EC2 instances apply.
    • Elastic Load Balancing – Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. You can enable Elastic Load Balancing within a single Availability Zone or across multiple zones for even more consistent application performance. Amazon CloudWatch can be used to capture a specific Elastic Load Balancer’s operational metrics, such as request count and request latency, at no additional cost beyond Elastic Load Balancing fees.
      Fees: $0.025 per hour for each Elastic Load Balancer, plus $0.008 per GB of data transferred through an Elastic Load Balancer

    The api tools can be downloaded here: Developer Tools. Installation involves simply downloading a zip file and unzipping to a convenient subdirectory, where it can be referenced in a setup batch similar to this:
        @echo off
        set EC2_HOME=C:\EC2-Test
        set MONITORING_HOME=C:\EC2-Test\Monitor
        set AUTO_SCALING_HOME=C:\EC2-Test\AutoScaling
        set JAVA_HOME="C:\Program Files\Java\jdk1.6.0_03"
        set ELB_HOME=C:\EC2-Test\LoadBalance
        set PATH=%PATH%;%EC2_HOME%\bin;%EC2_HOME%\Monitor\bin;%EC2_HOME%\AutoScaling\bin;%EC2_HOME%\LoadBalance\bin
        set EC2_PRIVATE_KEY=C:\EC2-Test\PrivateKey.pem
        set EC2_CERT=C:\EC2-Test\509Certificate.pem

    The only problem I ran into was making sure I properly quoted my JAVA_HOME directory to handle the space in its directory name.

    Here is a quick look at the –help for each of the apis:
    Cloud Watch

    mon --help
    Command Name Description
    ———— ———–
    mon-get-stats Returns metric data
    mon-list-metrics Returns a list of the metrics
    version Prints the version of the CLI tool and the API.

    Auto Scaling

    as --help
    Command Name Description
    ———— ———–
    as-create-auto-scaling-group Create a new auto scaling group
    as-create-launch-config Create a new launch config
    as-create-or-update-trigger Creates a new trigger or updates an existing trigger.
    as-delete-auto-scaling-group Delete the specified auto scaling group
    as-delete-launch-config Delete the specified launch configuration
    as-delete-trigger Delete a trigger.
    as-describe-auto-scaling-groups Describes the specified auto scaling group(s)
    as-describe-launch-configs Describe the specified launch configurations
    as-describe-scaling-activities Describe a set of activiti…ties belonging to a group.
    as-describe-triggers Describes a trigger including its internal state.
    as-set-desired-capacity Set the desired capacity of the auto scaling group
    as-terminate-instance-in-auto-scaling-group Terminate a given instance.
    as-update-auto-scaling-group Update specified auto scaling group

    Elastic Load Balancing

    elb --help
    Command Name Description
    ———— ———–
    elb-configure-healthcheck Configure the parameters f…tered with a LoadBalancer.
    elb-create-lb Create a new LoadBalancer
    elb-delete-lb Deletes an existing LoadBalancer
    elb-deregister-instances-from-lb Deregisters Instances from a LoadBalancer
    elb-describe-instance-health Describes the state of Instances
    elb-describe-lbs Describes the state and properties of LoadBalancers
    elb-disable-zones-for-lb Remove Availability Zones from an LoadBalancer
    elb-enable-zones-for-lb Add Availability Zones to existing LoadBalancer
    elb-register-instances-with-lb Registers Instances to a LoadBalancer


    These new tools round out the AWS Cloud offering with some welcome monitor and control capability. Amazon has to keep up in the ‘ease of use’ area to stay ahead of other Cloud vendors. These continue the trend of adding complexity to cost calculations, but Amazon’s fees seem reasonable compared to dedicated and co-located services. Including tools for auto scaling and load balancing helps administrators keep a handle on costs for maintaining instance pools. Once these tools are incorporated into the AWS Console, administration could fade into development / deployment phases of a project with only minimal ongoing maintenance.

    There seems to be some room for additional graphing, charting, and spreadsheet capability for monitor results. These could conceivably be included as part of the AWS Console version of the current cmd tools.

    The lure is increasing availability assurance with decreasing administration costs.

    MapTiles, Pyramids, and DeepEarth

    Fig 1 – DeepEarth Interface to an Amazon S3 tile layer

    Tile Pyramids

    While working with some DeepEarth map interfaces I stumbled across a really useful tool. Gdal2tiles is a cool python script that uses gdal to create tile pyramids from an image. This is the end result of a Google Summer of Code project and we have Klokan Petr Pridal to thank. In addition to creating the tile pyramid it can create a set of kml SuperOverlay files so that the resulting tile set can be served on top of Google. TMS tiles are also useful for other Tile interfaces like OpenLayers Flash and Silverlight DeepEarth.

    As an experiment I downloaded the El Paso County Colorado NAIP from the USDA Geospatial Gateway site. This resulted in a .sid image of about 1.5Gb. Unfortunately the newer NAIP is not complete for El Paso County so I used the 2005 version at 1m resolution.

    Next I downloaded the maptiler python wrapper. It is helpful to have the latest version of gdal2tiles, which for Windows is easiest to get from the FWTools download page:

    C:\NAIP>gdal2tiles --help
    Usage: gdal2tiles.py [options] input_file(s) [output]
    --version                 show program's version number and exit
    -h, --help                show this help message and exit
    -p PROFILE, --profile=PROFILE
                              Tile cutting profile (mercator, geodetic, raster) -
                              default 'mercator' (Google Maps compatible)
    -r RESAMPLING, --resampling=RESAMPLING
                              Resampling method (average, near, bilinear, cubic,
                              cubicspline, lanczos, antialias) - default 'average'
    -s SRS, --s_srs=SRS       The spatial reference system used for the source input
    -z ZOOM, --zoom=ZOOM      Zoom levels to render (format: '2-5' or '10').
    -v, --verbose             Print status messages to stdout

    KML (Google Earth) options:
    Options for generated Google Earth SuperOverlay metadata

    -k, --force-kml           Generate KML for Google Earth - default for 'geodetic'
                              profile and 'raster' in EPSG:4326. For a dataset with
                              different projection use with caution!
    -n, --no-kml              Avoid automatic generation of KML files for EPSG:4326
    -u URL, --url=URL         URL address where the generated tiles are going to be

    Web viewer options:
    Options for generated HTML viewers a la Google Maps

    -w WEBVIEWER, --webviewer=WEBVIEWER
                              Web viewer to generate (all, google, openlayers, none) -
                              default 'all'
    -t TITLE, --title=TITLE   Title of the map
    -c COPYRIGHT, --copyright=COPYRIGHT
                              Copyright for the map
    -g GOOGLEKEY, --googlekey=GOOGLEKEY
                              Google Maps API key from

    -y YAHOOKEY, --yahookey=YAHOOKEY
                              Yahoo Application ID from

    I didn’t explore all of these options, but used maptiler’s nice gui .py interface that queries for parameters to feed into the gdal2tiles pyramiding tool. The NAIP data is in UTM coordinates, EPSG:26913 for El Paso, CO. This means that to be useful in Google’s kml or VE, the image tiles needed to be warped to EPSG:900913. Lots of computation there.

    . . . . . .

    30 hours later, boy I really need to use that 64bit High cpu EC2 instance …..

    I was troubled at first to see that gdal2tiles was only using one of my system’s cores, until I found Maptiler cluster. Klokan Petr Pridal is already moving on to address the time issue and exploring EC2 cluster possibilities. In theory a cluster of 8core EC2 instances could build tile pyramids in minutes rather than hours.

    After completing, here is what I have: '../ElPaso2005' with a completed tile pyramid in the subdirectory tree, levels 8 to 16 (435 subdirectories and 46,228 tiles). Each tile is a 256×256 jpg that meets the OSGeo Tile Map Service (TMS) specification. The coordinates default to EPSG:900913/EPSG:3785 Mercator, which matches the Google projection. gdal2tiles evidently calculates the optimal depth for my image resolution in a world wide image. It then appears to do the complete tile division at the lowest level, in this particular case 16. Subsequent tile levels are simply mosaics of four tiles from the next level down, so it is a relatively fast process to produce all the tiles above the bottom level. Total disk size for the jpeg files is only 1.7Gb, which is very close to the original MrSID size.

    Interfaces – DeepEarth, OpenLayers, Google Earth

    OK nice, I have a local static tile set and I have some auto generated Google SuperOverlay and OpenLayers Flash interfaces, but since I’ve been playing with DeepEarth recently I next went to the DeepEarth codeplex project to grab the latest 1.0 release. This project has come a long way since I last played with it. I liked the interface from the beginning, but was stymied by the VirtualEarth TileProvider that was pulling hundreds of tiles from VE. A cost analysis showed this was going to be too expensive for common use. Now, however, the project has a number of other TileProviders such as OSM, Yahoo, and WMS, though not yet Google. (Google restricts direct use of their tile servers.) Google would share the VE disadvantage of cost, but way less, since the cost is per page view rather than per tile downloaded. Of course, really cool is being able to develop any xaml + TileProvider you want.

    The project also includes an example showing how to set up a local tile set. The example uses 256×256 tiles but not in the OSGeo TMS directory structure. Here is an example using this DeepEarth local example TileProvider: DeepEarth BlueMarble. You can see the tiles spin in from a local directory store on the server. The resolution is not all that great, but a full resolution BlueMarble isn’t that hard to get from BitTorrent. The alternative selection, “Blue Marble Web,” serves full 500m resolution tiles hosted on Amazon S3. The Amazon S3 bucket is a flat structure; in other words the buckets don’t have an internal directory tree, which is why the tiles are not stored in a TMS directory tree.

    The DeepEarth local TileProvider was easily adapted to suit the TMS directory, so I could then directly pull in my El Paso tiles and show them with a DeepEarth interface. However, if I wished to take advantage of the high availability, low latency S3 storage, I would need to flatten the tile tree. In S3, subdirectory trees are hidden inside buckets as file names. In the S3 case the tile names include the zoom level as a prefix: the 5 in the 5-r8-c24 nomenclature is the zoom level, while the row is 8 and the column is 24 at that level. TMS would encode this in a tree like ../bluemarble/5/24/8.jpg: the zoom level is a subdirectory, the column is a subdirectory name, and the row is the name of the tile. The beauty of the TileProvider class in DeepEarth is that minor modifications can adapt it to either of these encoding approaches.
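    The TileProvider tweak really comes down to the URL builder. A javascript sketch of the two naming conventions side by side (bucket and path names are illustrative, and the actual DeepEarth TileProvider is Silverlight rather than javascript):

        // flat Amazon S3 naming:  5-r8-c24.jpg   (zoom 5, row 8, column 24)
        function s3TileUrl(zoom, row, col) {
            return 'http://example-bucket.s3.amazonaws.com/' + zoom + '-r' + row + '-c' + col + '.jpg';
        }

        // TMS directory naming:  ../bluemarble/5/24/8.jpg   (zoom/column/row.jpg)
        function tmsTileUrl(zoom, row, col) {
            return '../bluemarble/' + zoom + '/' + col + '/' + row + '.jpg';
        }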

    Performance and reliability are a lot nicer with Amazon S3 delivery, especially under heavy use. Once in S3, a map could also be promoted to a CloudFront edge cache without much difficulty. I imagine that would only make sense for the upper, heavily used zoom levels, say 1-5 for BlueMarble. Once down in the level 6 part of the pyramid, the number of tiles starts escalating dramatically and repetitive hit rates are less likely.

    Zoom Level – Tile Count

    1 – 1
    2 – 4
    3 – 16
    4 – 64
    5 – 256
    6 – 1,024
    7 – 4,096
    8 – 16,384
    9 – 65,536
    10 – 262,144

    (each zoom level holds four times the tiles of the level above, i.e. 4^(level-1) tiles at a given level)

    I can see a common scenario moving toward this approach:

    1. Base layer tile sets from OSM, OpenAerial, VE, (not GE) etc
    2. Client specific base tile pyramids as an opacity overlay
    3. Dynamic layers as WMS from a middle tier WMS/WCS server like Geoserver
    4. Dynamic vectors coming from a spatial RDBMS like PostGIS behind Geoserver

    In this architecture, map data is reused from open sources wherever possible and dynamic custom layers come out of OGC spec servers. Adding static client specific overlays to tile storage provides the faster performance helpful for DeepZoom and Flash enabled interfaces.


    Geospatial data is growing up and specialized data sources are multiplying. Now there are several background basemaps to choose from: Virtual Earth, Yahoo, OpenStreetMap, Open Aerial. Google Earth is also available if you stick with the Google API in a Google container. The more sophisticated backgrounds require a commercial license, but there are also open sources that provide a great deal of capability. Projects like Flash OpenLayers and Silverlight DeepEarth are improving the user experience substantially while giving developers a great deal of freedom.

    A new Silverlight VE Map control is evidently on the way from Microsoft in the next couple of weeks. It is Silverlight so has some of the coolness factor of DeepEarth. I should be able to write some more on it next week.

    Some Samples:

    Fig 2 – DeepEarth TileProvider interface to an OpenStreetMap

    Fig3 – OpenLayer interface example created by MapTiles

    Fig 4 – Google Earth interface example created by MapTiles

    I still like the approach of using a kml servlet to pull quads on demand from the seamless DRG service on Terraserver. Terraserver keeps chugging along.

    Fig – Google Earth interface with DRG quads from Terraserver

    Fig 5 – OpenLayer with Virtual Earth background example created by MapTiles

    Rumblings in the DataBase world – Cloud Data

    Fig 1 – GridSQL EnterpriseDB

    This quote from Larry Ellison triggered some thoughts about DataBases in the Cloud.

    “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. I can’t think of anything that isn’t cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
    Larry Ellison Oracle CEO

    Here Larry Ellison is making a point that Cloud Computing is confusing, in his mind, because it doesn’t change anything Oracle is already doing. Of course he misses the whole point of Cloud Computing, which is all about Where not What. The question is not ‘What Cloud Computing does’, but ‘Where it is done’, and “where it is done” is in global distributed data centers. Of course Oracle has RDBMS server instances available in the Cloud using Amazon AMI templates. There are also PostgreSQL, MySQL, and MS SQL server instances. So anything we do now, as far as databases, can also be done in the Cloud, but that is only scratching the surface.

    This blog caught my attention:
    Is Global Database Warming happening?

    It points out that for the first time in a couple of decades we may be facing a sea change in the venerable RDBMS world.

    If you look at Amazon’s AWS constellation, one recent service sticks out: SimpleDB. This is a database and it isn’t a database. It certainly is not anything like the RDBMS we have come to know. It was introduced in the AWS sphere to address some problems. One, traditional RDBMS are cumbersome to set up and use; two, RDBMS doesn’t fit all that well in distributed partitioned systems.

    Werner Vogels of Amazon posted a very interesting article, Eventually Consistent, that discusses the CAP theorem in relation to very large distributed data systems:
    “…CAP theorem, which states that of three properties of shared-data systems (data consistency, system availability, and tolerance to network partition) only two can be achieved at any given time.”

    Werner is pointing out that large partitioned data systems ipso facto include partition tolerance, leaving only ‘consistency’ and ‘availability’ as options in the CAP Heisenberg Uncertainty principle. The RDBMS we currently live with are optimized for ‘consistency.’ Amazon’s SimpleDB chooses the ‘availability’ property over ‘consistency,’ and eases our pain with “eventual consistency.”

    Back to RDBMS in the Cloud. The normal way of doing DB business is one server, or virtual server, and one RDBMS. More instances and cluster configurations end up using various fancy replication schemes. Optimizing for consistency incorporates blocking schemes at some level: DB, table, or row. A write is guaranteed to register in any subsequent read. SimpleDB isn’t like that. It is more like DNS, where a write eventually comes up but not necessarily immediately.

    The reason for tapping “availability” over “consistency” is that consistent schemes are brittle and break across partitions. Set a row block, or block of any kind, that is connected across a decoupled SOA, and you are going to have a break eventually. So in Werner’s eyes let’s aim for “Eventually Consistent” to avoid “Eventually Broken.”

    Now if you read about Microsoft’s Azure eventually you run across SDS.
    “Microsoft SQL Data Services (SDS) offers highly scalable and Internet-facing distributed database services in the cloud for storing and processing relational queries.” Sounds familiar.

    I don’t know what all this means to Microsoft, but it certainly appears that SDS is addressing a different kind of database. Is Microsoft already seeing the handwriting on the wall for Cloud Database configurations? Nice to know SDS is somewhere out there in the misty future.

    Boston GIS’s Obe made a point that, “All the cloud has to do is keep the relational model but make the storage less localized and relevant,” and points to GridSQL and/or Oracle RAC.

    Perhaps we are not seeing a sea change in RDBMS after all, just a repositioning of the traditional database to address the needs of globally partitioned systems. Goodbye replication and hello Grid DB, but I still wonder if Werner isn’t on to something with his recognition that you can’t have both “Availability” and “Consistency” in the new world order. Will GridSQL, RAC, and SDS be “Eventually Consistent” or “Eventually Broken”?

    Perhaps the experts can comment.

    Amazon's Cloud Console released as Beta

    Fig 1 – AWS Console in Chrome

    New announcement on Amazon’s AWS Management console now in beta

    I’ve really liked using Elasticfox and S3fox

    But these tools require using Firefox as your browser. Now that Chrome has been released I divide my time between IE, FireFox, and Chrome so it is nice to manage AWS from any browser.

    We are still waiting for some other additions listed in the “Coming Soon” column:

    • Tagging - Label and group Amazon EC2 resources with your own custom metadata to make it easier to identify and manage your instances, volumes, and other EC2 resources.
    • Monitoring, Load Balancing and Auto-scaling – View real-time monitoring of operational metrics within Amazon EC2, configure load balancing and auto-scaling rules through a web-based UI.
    • Amazon S3 Support – Create and delete Amazon S3 buckets, upload and download objects through your browser, edit permissions, set log data, and manage URLs.
    • Amazon SimpleDB Support – Construct SimpleDB queries through a point-and-click query expression builder and explore your data through a graphical dataset viewer.
    • Amazon SQS Support – Manage your SQS queues, add and retrieve messages from you queues, test and build your applications with help from the AWS Management Console.
    • CloudFront Support – Setup and administer content delivery distributions on Amazon CloudFront using a simple web-based tool on the AWS Management Console.

    The real deal will be Monitoring, Load Balancing and Auto-scaling

    Some of these tools have been available in GoGrid for a while, notably Load Balancing.

    Microsoft Azure describes similar capabilities as part of their Azure Cloud, but this is still in the future too. The Azure universe is in CTP, but it looks like Microsoft is very interested in co-opting the Cloud, which I guess means that Amazon AWS and Google Apps are successfully changing things.

    EC2 Peace of Mind

    Elastic Block Store, EBS, is a useful extension of the AWS services offered by Amazon’s cloud platform. EBS provides a way to add storage in an EC2 instance independent mode. In other words, storage doesn’t have to be tied to the instance storage, but can exist independently as an external volume. The big deal here is the persistence of your data, even if an instance happens to get killed for some reason. Additional backup security comes from the ability to make snapshots of an EBS volume and store them on the S3 service in S3 buckets.


    The cost is not a great burden:

    EBS Volumes

    · $0.15 per GB-month of provisioned storage

    · $0.10 per 1 million I/O requests



    Amazon EBS Snapshots to Amazon S3

    · $0.15 per GB-month of data stored

    · $0.01 per 1,000 PUT requests (when saving a snapshot)

    · $0.01 per 10,000 GET requests (when loading a snapshot)


    The basic approach for creating an EBS volume:

    1. start an instance

    2. create an EBS volume

    3. attach the volume to the instance

    4. partition and format the volume

    5. add data and services to the instance and its attached volume

    6. bundle and register the instance as an AMI stored in S3

    7. create a snapshot of the EBS volume


    After this is complete there is a peace of mind knowing that the instance can be reconstructed from backup services.


    Restore follows this path if the EBS volume is intact:

    1. start a new instance from the AMI bundled previously

    2. just attach the volume to the new instance

    3. repoint the DNS to this new instance server


    Restore follows this path if the EBS volume is also trashed:

    1. start a new instance from the AMI bundled previously

    2. create a new volume from the S3 snapshot

    3. attach this new volume to the new instance

    4. repoint the DNS to this new instance server


    I have an Open Source GIS stack loaded on a windows ec2 instance and decided it was time to make a conversion to the security of an EBS volume.

    The AWS details are here:


    First make sure the latest api_tools are installed – EC2 API version 2008-08-08:


    Choose the availability zone that matches the zone of the instance you wish to use.

    C:\EC2>ec2-create-volume --size 50 --availability-zone us-east-1b

    VOLUME vol-******** 50 us-east-1b creating 2008-11-04T15:38:20+0000


    Once the volume is created it will be noted as “available.”

    C:\EC2>ec2-describe-volumes vol-********

    VOLUME vol-******** 50 us-east-1b available 2008-11-04T15:38:20+0000


    Now the volume can be attached to the instance you had in mind.

    C:\EC2>ec2-attach-volume vol-******** -i i-******** -d xvdf

    ATTACHMENT vol-******** i-******** xvdf attaching 2008-11-04T15:41:48+0000


    Once the volume is attached, it’s time to ‘remote desktop’ to the windows instance.

    Open the Disk Management tool:

    Start / Administrative Tools / Computer Management / Storage / Disk Management


    You should then see the attached EBS volume and be able to add it to the instance with appropriate partition and format.

    Partition info:

    Partition walk thru:


    Once this is done you have an additional drive available referenced to the external EBS volume. In my example the E: drive.


    Fig 1 – Example of Disk Manager on an EC2 windows instance showing an EBS volume


    Once you have a useable EBS, how would you go about making it useful to the GIS stack?

    In my stack I am using:





    This means I would like to move all of the PostgreSQL data, tomcat webapps, and the geoserver data to the new EBS volume. Then it will be available for snapshot backup.


    Postgresql data:

    Changing the PostgreSQL data location involves a change to the registry. Stop the postgresql service, then change the registry ImagePath, move the C:\Program Files\PostgreSQL\8.3\data subdirectory to its new EBS location, E:\postgresql_data, and finally restart the service.

    Run regedit:

    "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\pgsql-some version" – in the ImagePath, change the -D option to point to a location on your new EBS volume. Here is the wiki entry with details: PostgreSQL Wiki for changing PGDATA.
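    For illustration only (the service name, install path, and version vary by setup), the ImagePath value ends up looking something like this after the -D change:

        "C:\Program Files\PostgreSQL\8.3\bin\pg_ctl.exe" runservice -N "pgsql-8.3" -D "E:\postgresql_data"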


    Tomcat webapps:

    Open %TOMCAT_HOME%/conf/server.xml with a text editor. There should be an entry similar to this:

    <Host name="localhost" appBase="E:/tomcat_webapp"
          unpackWARs="true" autoDeploy="true"
          xmlValidation="false" xmlNamespaceAware="false">

    Here you can see that I’ve changed the appBase to point to a subdirectory on my E: drive, the EBS volume. Copy the existing webapp subdirectory to this EBS subdirectory.


    Geoserver data:

    Go to the geoserver webapp’s WEB-INF/web.xml and make sure that the GEOSERVER_DATA_DIR points at a location on the EBS volume. Remember to make the change to the web.xml found in the tomcat webapp directory on the new EBS volume. Copy the geoserver data to its new EBS subdirectory.






    Now the data for PostgreSQL/PostGIS, Apache Tomcat, and Geoserver will be accruing on an external volume, safe from sudden EC2 instance death. Of course now that EC2 is no longer beta and the SLA agreement is available this should be a rare occurrence.


    Now to make things even safer lets run a snapshot:

    C:\EC2>ec2-create-snapshot vol-********

    SNAPSHOT snap-******** vol-******** pending 2008-11-04T22:14:30+0000


    C:\EC2>ec2-describe-snapshots snap-********

    SNAPSHOT snap-******** vol-******** completed 2008-11-04T22:14:30+0000


    At this point a snapshot of my volume is stored to S3 where I can use it to create a new volume for use in another instance. I can use the snapshot if I’m creating multi instance clusters or if I need to restore my instance.


    Of course it would also be wise to make an AMI bundle to reflect the changes made to the basic instance, directory pointers regedits etc. Here is the ami bundle guide for windows instances:  AMI Bundle for windows info

    You will first need to prepare a bucket on S3 to receive the AMI bundle.  S3 Info



    C:\EC2 >ec2-bundle-instance i-******** -b ec2-windows-bucket -p ec2-windows_image -o <Amazon EC2 Key ID> -w <private access Key>

    BUNDLE bun-******** i-******** norm-ec2-windows ec2-windows_image 2008-11-05T15:28:15+0000 2008-11-05T15:28:15+0000 pending

    C:\EC2 >ec2-describe-bundle-tasks

    BUNDLE bun-******** i-******** ec2-windows ec2-windows_image 2008-11-05T15:28:15+0000 2008-11-05T16:07:08+0000 complete

    C:\EC2 >ec2-register ec2-windows/ec2-windows_image.manifest.xml

    IMAGE ami-********



    Amazon cloud is now out of beta and comes with independent storage volumes and snapshot capability useful for backup and scaling functions. GIS open source stacks can make use of these options without a huge effort.