Rumblings in the DataBase world – Cloud Data

Fig 1 – GridSQL EnterpriseDB

This quote from Larry Ellison triggered some thoughts about databases in the Cloud.

“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. I can’t think of anything that isn’t cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
Larry Ellison, Oracle CEO

Here Larry Ellison is making the point that Cloud Computing is confusing, in his mind, because it doesn’t change anything Oracle is already doing. Of course he misses the whole point of Cloud Computing, which is all about Where, not What. The question is not ‘What does Cloud Computing do?’ but ‘Where is it done?’, and “where it is done” is in globally distributed data centers. Oracle already has RDBMS server instances available in the Cloud as Amazon Machine Images (AMIs), and there are also PostgreSQL, MySQL, and MS SQL Server instances. So anything we do now with databases can also be done in the Cloud, but that is only scratching the surface.

This blog caught my attention:
Is Global Database Warming happening?

It points out that for the first time in a couple of decades we may be facing a sea change in the venerable RDBMS world.

If you look at Amazon’s AWS constellation, one recent service sticks out: SimpleDB. This is a database and it isn’t a database; it certainly is not anything like the RDBMS we have come to know. It was introduced in the AWS sphere to address two problems: first, a traditional RDBMS is cumbersome to set up and use; second, an RDBMS doesn’t fit all that well in distributed, partitioned systems.
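To make the “database that isn’t a database” point concrete, here is a minimal in-memory sketch of SimpleDB’s data model as I understand it: a domain holds items, each item holds attributes, and an attribute can carry several string values, with no schema and no joins. The `Domain` class and its method names are hypothetical; they loosely echo SimpleDB’s PutAttributes, GetAttributes and Select operations but are not Amazon’s API.

```python
from collections import defaultdict


class Domain:
    """Hypothetical stand-in for a SimpleDB domain (not Amazon's API).

    item name -> attribute name -> set of string values; no schema, no joins.
    """

    def __init__(self, name):
        self.name = name
        self.items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item, attrs, replace=False):
        """Add values to an item's attributes; replace=True drops old values first."""
        for key, value in attrs.items():
            if replace:
                self.items[item][key] = set()
            self.items[item][key].add(str(value))  # everything is stored as a string

    def get_attributes(self, item):
        """Return {attribute: sorted list of values} for one item."""
        return {k: sorted(v) for k, v in self.items.get(item, {}).items()}

    def select(self, attribute, value):
        """Names of items whose attribute holds the given value (string match)."""
        return sorted(name for name, attrs in self.items.items()
                      if str(value) in attrs.get(attribute, set()))


cities = Domain("cities")
cities.put_attributes("denver", {"state": "CO", "tag": "mountain"})
cities.put_attributes("denver", {"tag": "capital"})  # second value, no ALTER TABLE
cities.put_attributes("boston", {"state": "MA", "tag": "capital"})
print(cities.get_attributes("denver"))  # {'state': ['CO'], 'tag': ['capital', 'mountain']}
print(cities.select("tag", "capital"))  # ['boston', 'denver']
```

Adding a second `tag` value to `denver` needs no schema change, which is exactly the flexibility (and the lack of relational guarantees) that sets SimpleDB apart from an RDBMS.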

Werner Vogels of Amazon posted a very interesting article, “Eventually Consistent,” that discusses the CAP theorem in relation to very large distributed data systems:
“… CAP theorem, which states that of three properties of shared-data systems (data consistency, system availability, and tolerance to network partition) only two can be achieved at any given time.”

Werner is pointing out that large partitioned data systems must, ipso facto, tolerate partitions, which leaves a choice between ‘consistency’ and ‘availability’: a kind of CAP version of the Heisenberg Uncertainty Principle. The RDBMSs we currently live with are optimized for ‘consistency.’ Amazon’s SimpleDB chooses ‘availability’ over ‘consistency’ and eases our pain with “eventual consistency.”
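Here is a toy illustration of that choice, under simplifying assumptions: two replicas, a `partitioned` flag standing in for a network split, and a hypothetical `TwoReplicaStore` class. In “CP” mode the store refuses a write it cannot replicate; in “AP” mode it accepts the write locally and lets the other replica keep serving a stale value.

```python
class TwoReplicaStore:
    """Hypothetical two-replica key-value store illustrating the CAP choice.

    While `partitioned` is True the replicas cannot reach each other.
    mode="CP": refuse writes that cannot be replicated (give up availability).
    mode="AP": accept writes on the local replica only (give up consistency).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False

    def write(self, at, key, value):
        """Write through replica `at`; return True if the write was accepted."""
        if not self.partitioned:
            for data in self.replicas.values():  # synchronous replication
                data[key] = value
            return True
        if self.mode == "CP":
            return False                          # stay consistent, be unavailable
        self.replicas[at][key] = value            # stay available, diverge
        return True

    def read(self, at, key):
        return self.replicas[at].get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("a", "price", 10)
    store.partitioned = True                      # the network splits
    accepted = store.write("a", "price", 12)
    print(mode, accepted, store.read("a", "price"), store.read("b", "price"))
# CP False 10 10
# AP True 12 10
```

Neither mode is wrong; the partition simply forces the choice Werner describes, and the AP store still needs some repair mechanism before its replicas agree again.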

Back to RDBMS in the Cloud. The normal way of doing DB business is one server, or virtual server, and one RDBMS. Adding instances or cluster configurations ends up relying on various fancy replication schemes. Optimizing for consistency means some form of locking at the database, table, or row level, so that a write is guaranteed to show up in any subsequent read. SimpleDB isn’t like that. It is more like DNS, where a write eventually shows up everywhere, but not necessarily immediately.
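A rough simulation of that DNS-like behavior, assuming a fixed propagation delay measured in simulated ticks and last-writer-wins conflict resolution driven by a single global version counter (a simplification real distributed systems don’t get for free). The class and its parameters are illustrative only, not SimpleDB’s internals.

```python
import heapq


class EventuallyConsistentStore:
    """Hypothetical sketch: writes land on one replica immediately and reach
    the others after `delay` simulated ticks, like a DNS change propagating.
    Conflicts are settled by last-writer-wins on a single global version
    counter, a simplification real distributed systems do not get for free."""

    def __init__(self, n_replicas, delay):
        self.replicas = [{} for _ in range(n_replicas)]  # key -> (version, value)
        self.delay = delay
        self.now = 0          # simulated clock, advanced by tick()
        self.version = 0      # global write counter used for last-writer-wins
        self.in_flight = []   # heap of (deliver_at, version, replica, key, value)

    def _apply(self, r, key, version, value):
        current = self.replicas[r].get(key)
        if current is None or version > current[0]:  # newer write wins
            self.replicas[r][key] = (version, value)

    def write(self, r, key, value):
        self.version += 1
        self._apply(r, key, self.version, value)      # visible locally at once
        for other in range(len(self.replicas)):
            if other != r:                             # queue async propagation
                heapq.heappush(self.in_flight,
                               (self.now + self.delay, self.version, other, key, value))

    def read(self, r, key):
        entry = self.replicas[r].get(key)
        return None if entry is None else entry[1]

    def tick(self, steps=1):
        """Advance the clock and deliver every update that is now due."""
        self.now += steps
        while self.in_flight and self.in_flight[0][0] <= self.now:
            _, version, r, key, value = heapq.heappop(self.in_flight)
            self._apply(r, key, version, value)


store = EventuallyConsistentStore(n_replicas=3, delay=2)
store.write(0, "www.example.com", "192.0.2.1")
print([store.read(r, "www.example.com") for r in range(3)])
# ['192.0.2.1', None, None]
store.tick(2)
print([store.read(r, "www.example.com") for r in range(3)])
# ['192.0.2.1', '192.0.2.1', '192.0.2.1']
```

Between the write and the second tick, a reader at replica 1 or 2 sees no record at all; that window is what the “eventual” in eventual consistency refers to.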

The reason for choosing “availability” over “consistency” is that consistent schemes are brittle and break across partitions. Hold a row lock, or a lock of any kind, across a decoupled SOA, and sooner or later something will break. So, in Werner’s view, better to aim for “Eventually Consistent” than to end up “Eventually Broken.”

If you read about Microsoft’s Azure, you eventually run across SDS:
“Microsoft SQL Data Services (SDS) offers highly scalable and Internet-facing distributed database services in the cloud for storing and processing relational queries.” Sounds familiar.

I don’t know what all this means to Microsoft, but it certainly appears that SDS is addressing a different kind of database. Is Microsoft already seeing the handwriting on the wall for Cloud Database configurations? Nice to know SDS is somewhere out there in the misty future.

Boston GIS’s Obe made the point that “All the cloud has to do is keep the relational model but make the storage less localized and relevant,” and points to GridSQL and/or Oracle RAC.
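Obe’s suggestion, keeping the relational model while spreading the storage out, is roughly what a shared-nothing grid database does: each row lives on one node, picked by hashing a partitioning key. Below is a generic sketch of hash partitioning with a hypothetical `PartitionedTable` class; it is not GridSQL’s code, and Oracle RAC takes a different, shared-disk approach.

```python
import zlib


class PartitionedTable:
    """Hypothetical shared-nothing table: every row lives on exactly one node,
    chosen by hashing its partitioning key (a generic sketch, not GridSQL)."""

    def __init__(self, n_nodes, key_column):
        self.nodes = [[] for _ in range(n_nodes)]  # each list stands in for one server
        self.key_column = key_column

    def node_for(self, key):
        # crc32 gives the same answer on every run, unlike Python's salted hash()
        return zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)

    def insert(self, row):
        self.nodes[self.node_for(row[self.key_column])].append(row)

    def get(self, key):
        """Lookup by partitioning key touches a single node."""
        node = self.nodes[self.node_for(key)]
        return [row for row in node if row[self.key_column] == key]

    def select(self, predicate):
        """Any other query fans out to every node and merges the results."""
        return [row for node in self.nodes for row in node if predicate(row)]


orders = PartitionedTable(n_nodes=4, key_column="order_id")
for i in range(1, 9):
    orders.insert({"order_id": i, "total": i * 10})
print(orders.get(5))                                   # [{'order_id': 5, 'total': 50}]
print(len(orders.select(lambda r: r["total"] > 40)))   # 4
```

The split between the single-node `get` and the fan-out `select` is where the trade-off bites: queries that ignore the partitioning key, and transactions that span nodes, need cross-node coordination, which brings Werner’s consistency-versus-availability question right back.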

Perhaps we are not seeing a sea change in RDBMS after all, just a repositioning of the traditional database to address the needs of globally partitioned systems. Goodbye replication and hello Grid DB, but I still wonder if Werner isn’t on to something with his recognition that you can’t have both “Availability” and “Consistency” in the new world order. Will GridSQL, RAC, and SDS be “Eventually Consistent” or “Eventually Broken?”

Perhaps the experts can comment.

Amazon's Cloud Console released as Beta

Fig 1 – AWS Console in Chrome

Amazon has announced that its AWS Management Console is now in beta.

I’ve really liked using Elasticfox and S3Fox, but these tools require Firefox as your browser. Now that Chrome has been released, I divide my time between IE, Firefox, and Chrome, so it is nice to be able to manage AWS from any browser.

We are still waiting for some other additions listed in the “Coming Soon” column:

  • Tagging – Label and group Amazon EC2 resources with your own custom metadata to make it easier to identify and manage your instances, volumes, and other EC2 resources.
  • Monitoring, Load Balancing and Auto-scaling – View real-time monitoring of operational metrics within Amazon EC2, configure load balancing and auto-scaling rules through a web-based UI.
  • Amazon S3 Support – Create and delete Amazon S3 buckets, upload and download objects through your browser, edit permissions, set log data, and manage URLs.
  • Amazon SimpleDB Support – Construct SimpleDB queries through a point-and-click query expression builder and explore your data through a graphical dataset viewer.
  • Amazon SQS Support – Manage your SQS queues, add and retrieve messages from your queues, test and build your applications with help from the AWS Management Console.
  • CloudFront Support – Set up and administer content delivery distributions on Amazon CloudFront using a simple web-based tool on the AWS Management Console.

The real deal will be Monitoring, Load Balancing, and Auto-scaling.
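Auto-scaling boils down to a rule that turns monitoring metrics into a desired instance count. Here is a minimal sketch of a threshold rule; the `ScalingRule` name, the CPU thresholds, and the size limits are all assumptions for illustration, not anything from Amazon’s announcement.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical threshold-based auto-scaling rule (not Amazon's API)."""
    scale_up_above: float = 70.0    # average CPU percent that triggers a new instance
    scale_down_below: float = 30.0  # average CPU percent that retires an instance
    min_size: int = 1
    max_size: int = 10

    def desired_size(self, current_size, cpu_samples):
        """Return the instance count the group should move toward."""
        if not cpu_samples:
            return current_size                 # no monitoring data, no change
        avg = sum(cpu_samples) / len(cpu_samples)
        if avg > self.scale_up_above:
            target = current_size + 1
        elif avg < self.scale_down_below:
            target = current_size - 1
        else:
            target = current_size
        return max(self.min_size, min(self.max_size, target))  # clamp to limits


rule = ScalingRule()
print(rule.desired_size(2, [85.0, 90.0, 78.0]))  # 3
print(rule.desired_size(2, [12.0, 20.0]))        # 1
```

Monitoring feeds the samples, the rule picks a size, and a load balancer spreads traffic over whatever instances exist, which is why the three items travel together.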

Some of these tools, notably Load Balancing, have been available in GoGrid for a while.

Microsoft describes similar capabilities as part of its Azure cloud, but these are still in the future too. Azure is still a CTP (Community Technology Preview), but it looks like Microsoft is very interested in co-opting the Cloud, which I take to mean that Amazon AWS and Google Apps are successfully changing things.