Last month, I had several one-on-one conversations with CIOs, software executives, and cloud experts about database-as-a-service (DBaaS). My goal was to understand the challenges and benefits of DBaaS from a customer perspective and to identify future trends in this hot emerging space.
Performance and Latency Issues with Stand-alone DBaaS
As I synthesized the various perspectives in these conversations, one thing became pretty clear: many of these users see no real value in stand-alone cloud databases. One CIO of a leading SaaS company summarized this sentiment:
“Most of us are focused on whole solutions, not pieces and tools. When you develop an app in the cloud, there is already a DB in there. When you go to deploy it, you’ve already made your choice.”
Stand-alone DBaaSs also face a significant technical hurdle: latency between the database tier and the application tier.
If an application is running on one cloud, say Amazon, and the database is running on, say, Database.com, performance issues will inevitably surface. Clearly, it makes sense to stick to one vendor that provides both the application and database platforms for data-intensive mission-critical applications.
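To see why cross-cloud latency adds up, here is a rough back-of-the-envelope sketch. The query count and both round-trip times are hypothetical values picked for illustration, not measurements of any vendor, and the model only counts network wait for queries issued one after another.

```python
def db_wait_ms(sequential_queries, round_trip_ms):
    """Network wait for one request whose queries run one after another."""
    return sequential_queries * round_trip_ms

# All three values below are hypothetical, chosen only for illustration.
QUERIES_PER_REQUEST = 20    # sequential database calls per page request
SAME_CLOUD_RTT_MS = 0.5     # app tier and database in the same cloud
CROSS_CLOUD_RTT_MS = 30.0   # app tier and database in different clouds

same_cloud_wait = db_wait_ms(QUERIES_PER_REQUEST, SAME_CLOUD_RTT_MS)
cross_cloud_wait = db_wait_ms(QUERIES_PER_REQUEST, CROSS_CLOUD_RTT_MS)

print(f"same cloud:  {same_cloud_wait:.0f} ms waiting on the network")
print(f"cross cloud: {cross_cloud_wait:.0f} ms waiting on the network")
```

Under these assumed numbers, the same request spends 10 ms versus 600 ms waiting on the network, before the database does any actual work.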
Another key issue that emerged was the difficulty of migrating large data, complex schemas, and interconnected datasets to the external cloud.
“Is there a need for a hosted database service? It makes a lot of sense to me for modern applications where the applications are written from the ground up with that kind of architecture. It does not make sense for classic enterprise applications with a lot of data behind the firewall.” – VP Products, cloud startup
Because application and data needs grow exponentially, bandwidth issues have remained with us for the past 20 years. Let’s do the math for transfer times today: according to the AWS import/export site, which processes storage media to move data into the cloud, transferring one terabyte of data over a T1 line takes 82 days and, over a T3 line, takes a full three days. You are better off FedEx’ing your data on storage devices overnight!
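The transfer-time figures can be roughly reproduced with a few lines of arithmetic. This is a minimal sketch: the line rates are the nominal T1 and T3 speeds, while the 80 percent link utilization and the binary definition of a terabyte are my assumptions, chosen because they land close to the quoted numbers.

```python
# Nominal T-carrier line rates in bits per second.
LINE_RATES_BPS = {"T1": 1.544e6, "T3": 44.736e6}

ONE_TERABYTE = 2 ** 40   # assumption: binary terabyte, in bytes
SECONDS_PER_DAY = 86_400

def transfer_days(size_bytes, line_rate_bps, utilization=0.8):
    """Days needed to push size_bytes through a link.

    utilization is the assumed fraction of the line rate that is
    actually available for the transfer (0.8 is an assumption).
    """
    effective_bps = line_rate_bps * utilization
    return size_bytes * 8 / effective_bps / SECONDS_PER_DAY

for name, rate in LINE_RATES_BPS.items():
    print(f"{name}: {transfer_days(ONE_TERABYTE, rate):.1f} days")
```

With these assumptions the T1 result comes out at a little over 82 days and the T3 result at just under three days, which is why shipping disks overnight wins.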
I believe DBaaS is the next frontier in the evolution of the cloud, but the cloud still has a big, glaring hole in data handling. Vendors still have a lot of work to do in researching which data technologies make the most sense in the cloud. A lot of investment and engineering work is required to support elastic transactions, automatic read/write scale-out, and data sharding/partitioning.
Historical Context and Future DBaaS Trends
The 1990s was the decade of the database for the enterprise: system of record, highly reliable, mission-critical, transactional, etc. The goal was high availability and reliability, which was achieved through clustering, centralized data management, and high-powered servers and compute services connected to that data cluster with high-speed SANs.
Oracle is the king of this world, and they have done a phenomenal job of it.
For a long time, this was the right architecture for data-intensive applications, with the high-powered CPUs (compute) staying close to the data to achieve performance and latency goals. The location of the data pretty much dictated the physical location of the compute.
“You can move virtual machines around pretty easily. You can’t move around 27 terabytes of data very easily. So I think what is going to happen is big data will become the new center of gravity that the virtual machines will move to.” – CMO, cloud storage startup
This transactional, data-centric, system-of-record enterprise model has been well established and proven for 20 years. However, it is no longer adequate in the new world of business and social networks driven by today’s economics of global commerce.
Modern Applications Require Different Data Handling Approaches
In response to the changing economics of global commerce and the rise of social networks, modern applications are designed to be inherently collaborative, Web oriented, and data rich. The old database model cannot support these applications that are geographically distributed, highly scalable, and elastic.
Consider this: the one billion tweets posted on Twitter every week are not stored in a traditional database.
The scale of Facebook is no less staggering: more than 250 million users log in to Facebook every day. These users interact with more than 900 million objects (pages, groups, events, etc.). The Facebook population is highly distributed around the world (70% reside outside the United States). The traditional centralized database model is woefully inadequate to handle this scale and geographical distribution. Therefore, Facebook uses a unique global data-caching technology to achieve this scale.
It’s a valid argument that enterprise apps don’t need to deal with this kind of scale. But the power of this new application paradigm is not lost on the enterprise.
Enterprise applications focus on driving the business forward and delivering value back to the business, but traditional applications are not keeping pace with the growing need for business agility in a highly competitive global marketplace. End users increasingly demand easy-to-use, anywhere, anytime access to business applications through diverse mobile devices.
Enterprise applications, therefore, will begin to look more like Facebook – user-friendly, Web-oriented, and collaborative.
And data management must evolve to service this need.
The Promise of Database as a Service
Managing traditional databases is a pain. It involves a number of complex, time-consuming, expensive tasks including installation, configuration and tuning, upgrades, replication, clustering, load balancing, backups, and so on.
The promise of the cloud, of course, is that you don’t worry about many of these database management activities: the cloud vendor handles many of them for you. Customers like the idea because it’s easy and it does the job and, more importantly, they can focus their resources and time on their business.
Customers have the option of hosting their SQL Server or Oracle server on Amazon. If they have an existing SQL DB and want to move it over to Amazon, they just need to import the database, put their application on it, change the connection string, and they are good to go. A lot of the replication, backup, and other management is handled by Amazon. But this is just hosting databases in the cloud, not true cloud services. Customers still have access to the physical DB server in the cloud, still need a DBA, and still do all the traditional configuration, tuning, upgrades, etc.
If we think about DBaaS in terms of the defining characteristics of cloud computing, we come up with the following wish list of what an ideal DBaaS should support:
1. Elasticity (scale-out and scale-in)
- The database can grow and shrink automatically based on both read and write loads, with no downtime and without any changes required in the application.
2. Self-healing and Failover
- The database can automatically identify and isolate failures and heal itself from a server failure without the application needing to be aware of the failure.
- The database can handle multiple replicas of the data and can tolerate failures in networks, hardware, and software.
3. Pay Per Actual Use
- Pay per actual usage (e.g., based on the size of the DB) vs. pay per server instance size.
- Any number of databases can run on any set of instances. Any given instance may be dedicated to a single database or shared between databases in any arrangement.
Several such database services with some or many of the above characteristics have emerged in the past few years. Amazon has the Relational Database Service (RDS) based on MySQL (with Oracle support coming soon). Microsoft has its SQL Azure service, and Salesforce recently introduced its Database.com service.
Exciting new startups such as Xeround, NimbusDB, FathomDB, ScaleBase, and Akiban are building cloud “native” databases from the ground up, each with different capabilities and points of differentiation.
VMware has its GemFire data-management framework, which is a core part of its vFabric Cloud platform. GemFire has many of the essential characteristics outlined in the wish list above, including dynamic scalability, seamless failure handling and recovery, distributed data management, replication, partitioning, and data-aware routing.
Data Size and Data Type Challenge
International Data Corporation (IDC) estimates that unstructured data (video, audio, pictures, e-mails, log files, Web pages, IM conversations, etc.) will grow more than 60 percent per year, compared to traditional data, which is growing 20 percent annually. This data explosion is not limited to consumer Web applications; enterprises are facing a similar, unprecedented explosion in data volumes.
These unprecedented data flows created an entirely new category: Big Data. Traditional databases or even data warehouses are ill-equipped to store and retrieve this data in a timely manner. MapReduce, Hadoop and NoSQL emerged to capture, store, and retrieve Big Data much more efficiently than traditional transactional databases.
Traditional transactional databases are also not very good at analyzing and mining intelligence from oceans of data. This spawned an entire industry around business intelligence and analytics.
All of this just made the life of a CIO more complex. More tools. More technologies. More decisions. Wouldn’t it be nice if there were a unified solution that seamlessly handled structured, unstructured, and real-time data with transactional and analytics capabilities built into it? What do you think?
I welcome your comments, opinions, and questions. Drop me a line at firstname.lastname@example.org or post your comment here on the blog.