International Data Corporation (IDC) estimates that unstructured data (video, audio, pictures, e-mails, log files, Web pages, IM conversations, etc.) will grow more than 60 percent per year, compared to 20 percent annually for traditional data. This data explosion is not limited to consumer Web applications; enterprises are facing a similarly unprecedented explosion in data volumes. In the enterprise context, storage needs are generally more unpredictable than compute needs (e.g., you don’t double your employees in a year).
Enterprise storage needs are growing at 50-60 percent a year. A recent IBM study reported that the demand for storage has gone up from 150 exabytes (an exabyte is 1,000 petabytes, or a million terabytes) in 2005 to 1,200 exabytes this year. Customers need a better and more cost-effective way to manage that explosion in data.
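As a rough sanity check on those figures, the short sketch below asks how many years of compound growth at 50 or 60 percent it takes to multiply demand eightfold (150 to 1,200 exabytes). The function name is illustrative, and steady year-over-year compounding is an assumption; real growth is unlikely to be that smooth.

```python
import math

def years_to_multiply(factor, annual_growth):
    """Years of steady compound growth needed to grow by `factor`.

    annual_growth is a fraction, e.g. 0.5 for 50 percent a year.
    Assumes the same growth rate every year (a simplification).
    """
    return math.log(factor) / math.log(1 + annual_growth)

if __name__ == "__main__":
    factor = 1200 / 150  # eightfold increase quoted from the IBM study
    for rate in (0.5, 0.6):
        years = years_to_multiply(factor, rate)
        print(f"{rate:.0%} a year: about {years:.1f} years to grow {factor:.0f}x")
```

Under that assumption, an eightfold increase takes roughly four and a half to five years, which is at least in the same range as the 50-60 percent annual growth rate cited above.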
These unprecedented data flows created an entirely new category: Big Data. Traditional databases or even data warehouses are ill-equipped to store and retrieve this data in a timely manner. MapReduce, Hadoop, and NoSQL emerged to capture, store, and retrieve Big Data much more efficiently than traditional transactional databases.
Traditional transactional databases also are not very good at analyzing and mining intelligence from oceans of data. This spawned an entire industry around business intelligence and analytics.
All of this has just made the life of a CIO more complex. More tools. More technologies. More decisions. Are cloud-based analytics and databases an answer to this issue?
In my previous article on Database as a Service (DBaaS), I examined the challenges and benefits of DBaaS from a customer perspective and identified future trends in this hot emerging space.
Because applications and data needs have grown exponentially, bandwidth issues have remained with us for the past 20 years. Let’s do the math for transfer times today: according to the AWS Import/Export site (the service that processes physical storage media to move data into the cloud), transferring one terabyte of data takes 82 days over a T1 line and three days over a T3 line. You are better off FedEx’ing your data on storage devices overnight!
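The arithmetic behind those transfer times can be sketched in a few lines. The T1 and T3 line rates (1.544 and 44.736 megabits per second) are standard; the 80 percent usable-bandwidth figure and the binary definition of a terabyte are assumptions made here because they land close to the quoted numbers, not details taken from this article.

```python
# Rough transfer-time estimate for pushing data over a leased line.
T1_BPS = 1.544e6           # nominal T1 line rate, bits per second
T3_BPS = 44.736e6          # nominal T3 line rate, bits per second
TERABYTE_BYTES = 2 ** 40   # assumption: binary terabyte (1,024 GB)
UTILIZATION = 0.8          # assumption: 80% of the line is usable

def transfer_days(size_bytes, line_bps, utilization=UTILIZATION):
    """Days needed to send size_bytes over a line of line_bps bits/second."""
    seconds = size_bytes * 8 / (line_bps * utilization)
    return seconds / 86_400  # seconds per day

if __name__ == "__main__":
    for name, rate in (("T1", T1_BPS), ("T3", T3_BPS)):
        days = transfer_days(TERABYTE_BYTES, rate)
        print(f"1 TB over {name}: about {days:.1f} days")
```

With those assumptions the estimate comes out at a little over 82 days for a T1 line and just under three days for a T3 line, in line with the figures above.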
But the way data is transferred over a WAN has rapidly advanced, with techniques such as de-duplication, compression, edge caching, traffic shaping, and sending only incremental changes. If you send a terabyte of data over a WAN, it takes some time the first time. But if you then send only the changes, it’s pretty quick.
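To make the "send only the changes" idea concrete, here is a minimal sketch of one way to do it: split the data into fixed-size chunks, hash each chunk, and resend only the chunks whose hash differs from the copy already in the cloud. The function names and the 4 KB chunk size are illustrative assumptions, and commercial WAN optimization products use more sophisticated schemes (for example, variable-size chunking).

```python
import hashlib

def chunk_digests(data, chunk_size=4096):
    """SHA-256 digest of each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def changed_chunks(old, new, chunk_size=4096):
    """Return (index, bytes) for every chunk of `new` that differs from `old`.

    Simplification: chunks removed from the end of `old` are not reported,
    and an insertion shifts every later chunk, so all of them count as changed.
    """
    old_digests = chunk_digests(old, chunk_size)
    new_digests = chunk_digests(new, chunk_size)
    changed = []
    for idx, digest in enumerate(new_digests):
        if idx >= len(old_digests) or old_digests[idx] != digest:
            changed.append((idx, new[idx * chunk_size:(idx + 1) * chunk_size]))
    return changed

if __name__ == "__main__":
    old = bytes(4096 * 4)            # four chunks already in the cloud
    edited = bytearray(old)
    edited[2 * 4096 + 10] = 1        # modify one byte in the third chunk
    delta = changed_chunks(old, bytes(edited))
    sent = sum(len(chunk) for _, chunk in delta)
    print(f"chunks to resend: {[i for i, _ in delta]}, bytes sent: {sent} of {len(old)}")
```

In this toy case a one-byte edit means resending a single chunk rather than the whole file, which is the effect the paragraph above describes.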
Security is improving rapidly as well. With homomorphic encryption, a company could, for example: “encrypt its entire database of e-mails and upload it to a cloud. Then it could use the cloud-stored data as desired to search the database to understand how its workers collaborate. The results would be downloaded and decrypted without ever exposing the details of a single e-mail.”
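The core idea, computing on data while it stays encrypted, can be illustrated with a much simpler scheme than the one the quote implies. The sketch below is a toy version of the Paillier cryptosystem, which supports only addition on ciphertexts; searching encrypted e-mail would require fully homomorphic encryption, which is far more involved. The tiny primes and function names are illustrative assumptions and the code is not secure for real use.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    n, priv = keygen()
    total = add_encrypted(n, encrypt(n, 120), encrypt(n, 45))
    # The party doing the addition never sees 120 or 45 in the clear.
    print(decrypt(n, priv, total))
```

Here a cloud service could add the two encrypted values without holding the private key; only the data owner can decrypt the sum.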
Because the cloud can store vast amounts of data, content in the cloud becomes the center of gravity: you go through the pain of sending it out there once, and after that it’s pretty painless to send the changes. How painful the initial transfer is depends on the size of what you are moving. It also depends on geography, since some regions have much higher WAN bandwidth than others. We’ve seen customers in Asia and Europe with 225-megabit connections.
Once you have the data in the cloud, you can run your analytics there because you have plenty of elastic compute to throw at the number crunching.
“If you look at business intelligence or analytic data, what you want to have is movement – you have a large amount of data and then you do a phenomenal amount of compute needed for a short amount of time. That is much more suited to an elastic compute cloud.” – CMO Cloud Startup
All this sounds great; but in practice, companies are not exactly jumping to the cloud for the aforementioned security and performance reasons. Typically the data on which companies perform analytics tends to be strategic and competitive. In many industries, the biggest issue will continue to be regulatory compliance. Even if the data in the cloud is protected, many end users are increasingly accessing these applications and data on mobile devices, which are basically insecure.
Cloud or no cloud, most of today’s analytics solutions are, unfortunately, bolt-ons to existing ERP systems. Backward-looking reporting is useful to some degree, but busy managers and end users still lack real-time, actionable information. As a result, many of these solutions offer questionable value and are a harder sell to users who are very cautious about IT spending in tough economic times.
In the meantime, the rapid emergence of social media has driven the development of social business analytics tools. However, these tools have the same issue: they do not provide actionable intelligence or true analytical capabilities, such as correlation analysis or mining the significance of Twitter or Facebook messages. For the most part, companies are using these tools for company branding and for listening to customer sentiment. The real value will come when these tools can provide early warning signals of emerging issues or opportunities.
In the end, analytics solutions in the enterprise and the social media areas have to prove their business value and ROI. The cloud delivery option will be attractive but the solutions must enable intelligent and actionable decision making.
Kamesh Pemmaraju heads cloud computing research for Sand Hill Group. Follow him on Twitter @kpemmaraju.