
Why Big Data Actually is the New Cloud — and for Good Reason

By Murthy Mathiprakasam, October 19, 2014

I know everyone hears a lot about Big Data and cloud, but what if I could convince you that the substance behind both waves of technological innovation is nearly identical? Seeing that parallel might help customers understand how these platforms have evolved and how to adopt them.

The phrase “cloud computing” rose to prominence alongside the growing use of server virtualization. Enterprises followed a very structured, progressive journey in adopting virtualization and cloud computing. The first phase was all about cost savings from server consolidation. Because virtualization was a relatively unproven technology, the only way pilot projects could get off the ground was if they essentially funded themselves through cost savings.

Once C-level execs understood the cost savings of virtualization, adoption continued to grow but forked into two parallel phases. In one fork, companies used cloud technologies to cut the operational expense and lead time of delivering infrastructure services to end users. In the other, companies sought to fortify cloud technologies with security, compliance, availability, data protection and performance solutions so that they could run mission-critical applications on virtualized cloud environments.

This path from cost savings to the dual fork of more agile on-demand access and more fortified mission-critical environments is the undeniable journey of every significant enterprise technology platform. And now we are seeing it happen all over again with Big Data technologies. 

Let’s be real: one of the biggest drivers of the popularity of NoSQL and Hadoop platforms has been their low cost of operations. Companies, particularly those in consumer Internet businesses that deal with very large quantities of simple data, could not cost-effectively meet their data needs with traditional RDBMS platforms. So these new data management platforms emerged to provide more efficient mechanisms not only for solving new classes of problems but also for optimizing classic ones like ETL processing.

But while companies continue to optimize and right-size their data management portfolios with these new data platform technologies, we are also starting to see enterprises progress beyond cost optimization.

In the same way that virtualization technologies set application developers free from the boundaries of physical infrastructure provisioning, Big Data technologies are setting data developers free from the boundaries of physical data modeling. Rather than spend weeks designing fixed schemas for datasets, the new trend is to leverage SQL-on-Hadoop, SQL-on-JSON, data discovery and data wrangling technologies to infer the structure of schemaless data on the fly. 
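To make the “schema on read” idea concrete, here is a minimal, illustrative sketch using PySpark (one of several SQL-on-Hadoop/SQL-on-JSON options; the file name "events.json" and the field "user_id" are hypothetical). Spark SQL samples the raw JSON and infers a schema at read time, so no fixed schema has to be designed before exploration begins:

```python
# Minimal sketch, assuming PySpark is installed and a local newline-delimited
# JSON file named "events.json" exists. Field names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# The schema is inferred by sampling the JSON records -- no upfront modeling.
events = spark.read.json("events.json")
events.printSchema()

# Query the inferred structure directly with SQL.
events.createOrReplaceTempView("events")
spark.sql(
    "SELECT user_id, COUNT(*) AS event_count FROM events GROUP BY user_id"
).show()

spark.stop()
```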

These technologies are perfect for data developers and data scientists who are just experimenting with analytical models in sandbox environments. The value of these exploration projects would go completely unrealized without the use of these emerging agile technologies. The rise of the data scientist role as a courageous explorer in new frontiers of informational opportunity has entirely been driven by the advance of these simple, flexible tools for exploring raw and often barely structured data. 

Enterprise IT teams should realize there is no going back — the same way that cloud computing permanently empowered the application developer to shorten time to market, Big Data will empower the data developer to shorten time to insight. 

In a parallel track, just as with virtualization, we are also seeing the maturing of technologies that help NoSQL and Hadoop technologies address more mission-critical data workloads. These technologies include capabilities like data masking, data lineage, record linkage, authorization, in-memory processing, etc. 
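As one illustration of what a capability like data masking can look like in practice, here is a minimal sketch, again in PySpark (the columns "email" and "ssn" and the sample values are hypothetical): sensitive identifiers are replaced with one-way hashes before the data is exposed to analysts, while non-sensitive measures pass through untouched.

```python
# Minimal data-masking sketch, assuming PySpark is available. Column names
# and sample records are illustrative, not from any real dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("masking-demo").getOrCreate()

customers = spark.createDataFrame(
    [("alice@example.com", "123-45-6789", 42.0)],
    ["email", "ssn", "order_total"],
)

# Replace identifying columns with SHA-256 hashes; keep measures as-is.
masked = (
    customers
    .withColumn("email", F.sha2(F.col("email"), 256))
    .withColumn("ssn", F.sha2(F.col("ssn"), 256))
)

masked.show(truncate=False)
spark.stop()
```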

It is simply not tenable that a chief security officer or chief compliance officer will permanently block the adoption of technologies that are irrefutably more efficient for the company. Data platforms will evolve to address enterprise needs, and leading customers in regulated industries like financial services and healthcare will multiply the size of markets for these emerging technologies by orders of magnitude. 

The key for any enterprise looking at these emerging technologies is to understand what the journey to success looks like. Enterprises start using Big Data technologies to deliver cost savings, both from developing new data pipelines and from optimizing traditional ones. But once technological viability is proven, enterprises shouldn’t be caught in a false tradeoff between agility and service-level assurance.

The preproduction data scientists and data developers will go down the path of Big Data technologies that enable agility. The production data service providers will go down the path of Big Data technologies that assure service levels. Whatever phase your organization is in, Big Data technologies offer you a whole new level of internal benefit as well as the opportunity to become more competitive, responsive and personalized than ever before. 

Murthy Mathiprakasam is a principal product marketing manager for Informatica’s Big Data products. Murthy has a decade and a half of experience with high-growth software technologies including roles at Mercury Interactive, Google, eBay, VMware and Oracle. Follow him on Twitter or LinkedIn.