Rumblings in the DataBase world – Cloud Data

Fig 1 – GridSQL EnterpriseDB

This quote from Larry Ellison triggered some thoughts about DataBases in the Cloud.

“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. I can’t think of anything that isn’t cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
Larry Ellison, Oracle CEO

Here Larry Ellison is making the point that Cloud Computing is confusing, in his mind, because it doesn’t change anything Oracle is already doing. Of course he misses the whole point of Cloud Computing, which is all about Where, not What. The question is not ‘what Cloud Computing does’ but ‘where it is done’, and where it is done is in globally distributed data centers. Of course Oracle has RDBMS server instances available in the Cloud using Amazon AMI templates. There are also PostgreSQL, MySQL, and MS SQL Server instances. So anything we do now with databases can also be done in the Cloud, but that is only scratching the surface.

This blog caught my attention:
Is Global Database Warming happening?

It points out that for the first time in a couple of decades we may be facing a sea change in the venerable RDBMS world.

If you look at Amazon’s AWS constellation, one recent service sticks out: SimpleDB. This is a database and it isn’t a database. It certainly is not anything like the RDBMS we have come to know. It was introduced in the AWS sphere to address two problems. First, traditional RDBMS are cumbersome to set up and use. Second, RDBMS don’t fit all that well in distributed, partitioned systems.

Werner Vogels of Amazon posted a very interesting article, Eventually Consistent, that discusses the CAP theorem in relation to very large distributed data systems:
“… CAP theorem, which states that of three properties of shared-data systems (data consistency, system availability, and tolerance to network partition) only two can be achieved at any given time.”

Werner is pointing out that large partitioned data systems ipso facto have to tolerate network partitions, leaving only ‘consistency’ and ‘availability’ to trade against each other in what amounts to a Heisenberg uncertainty principle for CAP. The RDBMS we currently live with are optimized for ‘consistency.’ Amazon’s SimpleDB chooses the ‘availability’ property over ‘consistency,’ and eases our pain with “eventual consistency.”
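To make the trade-off concrete, here is a minimal toy sketch of two replicas that get cut off from each other. Everything in it (the TwoNodeStore class, the "CP" and "AP" labels, the heal() method) is hypothetical and invented for illustration; it is not how SimpleDB or any RDBMS is actually implemented.

```python
class TwoNodeStore:
    """Toy store with two replicas, a and b. All names here are hypothetical."""

    def __init__(self, mode):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.a = {}               # replica that receives client writes
        self.b = {}               # replica on the far side of the network link
        self.partitioned = False  # True while a and b cannot talk to each other
        self.pending = []         # writes accepted by a but not yet copied to b

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than let replicas diverge.
                raise RuntimeError("store unavailable during partition")
            # Availability first: accept the write locally and reconcile later.
            self.a[key] = value
            self.pending.append((key, value))
            return
        # No partition: both replicas are updated together.
        self.a[key] = value
        self.b[key] = value

    def read_from_b(self, key):
        return self.b.get(key)

    def heal(self):
        """The link comes back: replay the queued writes onto replica b."""
        self.partitioned = False
        for key, value in self.pending:
            self.b[key] = value
        self.pending.clear()
```

In "CP" mode a write during the partition fails outright. In "AP" mode the write succeeds, a read from replica b returns nothing until heal() runs, and after that both replicas agree again, which is the "eventual" part.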

Back to RDBMS in the Cloud. The normal way of doing DB business is one server, or virtual server, and one RDBMS. More instances and cluster configurations end up using various fancy replication schemes. Optimizing for consistency means incorporating locking schemes at some level: database, table, or row. A committed write is guaranteed to show up in any subsequent read. SimpleDB isn’t like that. It is more like DNS, where a write eventually shows up everywhere but not necessarily immediately.
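The DNS-like behavior can be sketched as a write that is acknowledged at once but reaches each replica only when its queued update is delivered. This is a toy model under assumed names (LaggingStore, deliver), not SimpleDB's actual API or replication protocol.

```python
from collections import deque


class LaggingStore:
    """Toy eventually consistent store. Class and method names are hypothetical."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        # One queue of undelivered updates per replica, standing in for network lag.
        self.queues = [deque() for _ in range(n_replicas)]

    def write(self, key, value):
        # Acknowledge immediately; the update only reaches replicas on delivery.
        for queue in self.queues:
            queue.append((key, value))

    def read(self, replica, key):
        # A read sees only what this particular replica has applied so far.
        return self.replicas[replica].get(key)

    def deliver(self, replica):
        """Apply every queued update to one replica, in write order."""
        queue = self.queues[replica]
        while queue:
            key, value = queue.popleft()
            self.replicas[replica][key] = value


store = LaggingStore()
store.write("color", "blue")
stale = store.read(0, "color")   # replica 0 has not received the update yet
store.deliver(0)
fresh = store.read(0, "color")   # replica 0 now reflects the write
```

A read right after the write can miss it, and two replicas can briefly disagree. Once every queue has been drained, all replicas return the same value. A lock-based RDBMS avoids the stale read by making the reader wait instead.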

The reason for picking “availability” over “consistency” is that consistent schemes are brittle and break across partitions. Set a row lock, or a lock of any kind, that spans a decoupled SOA, and you are going to have a break eventually. So in Werner’s eyes, let’s aim for “Eventually Consistent” to avoid “Eventually Broken.”

Now, if you read about Microsoft’s Azure, eventually you run across SDS.
“Microsoft SQL Data Services (SDS) offers highly scalable and Internet-facing distributed database services in the cloud for storing and processing relational queries.” Sounds familiar.

I don’t know what all this means to Microsoft, but it certainly appears that SDS is addressing a different kind of database. Is Microsoft already seeing the handwriting on the wall for Cloud Database configurations? Nice to know SDS is somewhere out there in the misty future.

Boston GIS’s Obe made the point that “All the cloud has to do is keep the relational model but make the storage less localized and relevant,” and points to GridSQL and/or Oracle RAC.

Perhaps we are not seeing a sea change in RDBMS after all, just a repositioning of the traditional database to address the needs of globally partitioned systems. Goodbye replication and hello Grid DB, but I still wonder if Werner isn’t on to something with his recognition that you can’t have both “Availability” and “Consistency” in the new world order. Will GridSQL, RAC, and SDS be “Eventually Consistent” or “Eventually Broken?”

Perhaps the experts can comment.
