Big Data

Data Virtualization Trends Laying New Foundation for Information Infrastructure

Editor’s note: Suresh Chandrasekaran, senior vice president North America at Denodo, a data virtualization firm, says that opportunities to leverage big data and BI coexist with “belt-tightening” trends that cause a slow-down in expensive data warehouse projects in favor of laying a foundation for an organic, flexible and virtualized information infrastructure in the future. In this interview, he discusses that trend as well as a few others such as what is “cloud critical” in 2013. 

SandHill.com: You mentioned that a lot of companies are tightening their belts on data warehouse and MDM projects, but my observation from talking with people in several industries is that many companies are just now undertaking those types of projects because of the need for business intelligence insights.   

Suresh Chandrasekaran: What we’re seeing is that the data warehouse footprint is shrinking, whereas the overall business intelligence footprint, or need, is growing larger. It’s not that we won’t see data warehousing projects anymore. Think of belt-tightening in these projects in the same way as the idea of re-use and recycle. It’s not that you’re going to eliminate one way of doing things and completely shift over to the other. Another analogy could be cars and commuting when gas prices go up: no one is going to stop going to work, but pragmatism comes in with the question of what is the more efficient way to get there.

With data warehousing projects in the past, there was a long definition phase and a lot of hiring of big consulting companies to go around and gather all the different reporting and analytical needs. It was a very top-down, waterfall sort of approach that took time to solidify data requirements, develop large transformation and data-integration projects and feed the data warehouse. Just the business consulting project is very expensive.

SandHill.com: How does new virtualized data integration differ from current trends?

Suresh Chandrasekaran: Today in the definition phase, companies are opting for more rapid prototyping, where the actual users get to use data services and reporting before the warehouse is built. People are now using data virtualization — or, as some call it, self-service, on-demand data services — much more in the front end of the process to help define requirements. It’s very interactive: not a top-down flow but a “define as we go along” approach. Later on, they may find that on-demand, virtualized access to data (also called a logical data warehouse) is meeting their needs, including performing better and with more flexibility, so they substantially reduce the scope of expensive, traditional replication-based integration and data warehouses. The same pattern also happens with registry-style virtual MDM replacing traditional MDM.

SandHill.com: When should companies store information in a data warehouse?

Suresh Chandrasekaran: In the past they needed to store it there, and there are still some reasons to store it for historical analysis, such as when a company wants to do time series or regression analysis. But for other kinds of ad hoc reporting, real-time or even predictive analytics involving many disparate sources, data virtualization is the best approach.

So companies are starting to make design decisions in the context of need. When should I use real-time integration via virtualization technologies or memory-based technologies, versus when should I store and manage information in a data warehouse? The latter is expensive but also less flexible. So the “belt-tightening” trend is one side of the coin; the need for agility is the other.

SandHill.com: How does the concept of re-use come into play? 

Suresh Chandrasekaran: In the past, data infrastructures, application infrastructures and even the integration middleware that goes with them tended to be much more special purpose. For example, the integration, the tool sets, etc. used for analytics applications (business intelligence) were somewhat separate from those used for operational purposes (SOA, BPM, application integration suites). So two different sets of applications eventually accessed the same information. And often a third stack was used for unstructured information, knowledge management and collaboration, and a fourth for B2B integration. Today the layer of unified data services enabled by data virtualization can be built in a flexible enough manner to enable re-usability across these stacks.

SandHill.com: You mentioned earlier that the approach today is not to build the infrastructure before the requirements definition phase.

Suresh Chandrasekaran: You build it for one project, but while building it, you keep in mind that the same data services could be re-used for something else. For example, at a health insurance company, data services built for analytics and a logical data warehouse could be re-used for a customer portal, a claims processing application, HIPAA compliance and improving internal operational processes.

SandHill.com: So you’re really saying that data virtualization is data services. 

Suresh Chandrasekaran: I think the understanding of virtualized data services still has a way to go because a lot of the focus has been on business intelligence and logical data warehousing. But once they get it, customers realize that the killer app is data services for the entire enterprise.

If I break down virtualization in non-vendor terms, there are two key concepts. One is layers of abstraction. This means data services that make information friendlier to business users and applications regardless of where the data comes from, what format it starts out in, whether it’s internal or external, whether it’s in the cloud or in a data center, and whether it’s structured or unstructured. It’s a virtualized representation of the data and the relationships between data sets.

The second key concept is that the way a company pushes or delivers the data is “as a service.” The service can take different forms: accessed in real time or in batch mode, push or pull, and in many formats — virtual database, Web services, data feeds and so on. Data virtualization is really a more flexible way of delivering information to applications and users.
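In vendor-neutral code, the first concept — an abstraction layer over disparate sources — might be sketched like this. It is a toy illustration, not any vendor’s actual API: two adapters hide that one data set lives as CSV and another as JSON, and a virtual view federates them at query time so consumers never see the physical formats.

```python
import csv
import io
import json

class CsvSource:
    """Adapter: hides that this data lives as CSV text."""
    def __init__(self, text):
        self.text = text
    def rows(self):
        return list(csv.DictReader(io.StringIO(self.text)))

class JsonSource:
    """Adapter: hides that this data lives as JSON."""
    def __init__(self, text):
        self.text = text
    def rows(self):
        return json.loads(self.text)

class VirtualView:
    """The abstraction layer: one logical view over many sources."""
    def __init__(self, *sources):
        self.sources = sources
    def rows(self):
        # Federate at query time -- nothing is copied into a warehouse.
        return [row for s in self.sources for row in s.rows()]

# One logical "customers" view backed by two physical formats.
crm = CsvSource("id,name\n1,Acme\n2,Globex")
erp = JsonSource('[{"id": "3", "name": "Initech"}]')
customers = VirtualView(crm, erp)
print([r["name"] for r in customers.rows()])  # ['Acme', 'Globex', 'Initech']
```

A real platform would add query pushdown, caching and security on top, but the consumer-facing idea is the same: one logical view, many physical sources.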

From a business perspective, it’s about broader access to data in business terms (unlocking data, the return on data assets) and more flexibility. The time to market on a new solution with different requirements cannot follow the old waterfall approach of six months of definition; companies need very rapid ways of provisioning the information to applications and users. And they need to start merging what they previously dealt with separately.

SandHill.com: Do you think that most enterprises and SMBs are aware that they need to take this approach? 

Suresh Chandrasekaran: It’s not broadly happening yet at all levels, but business and IT, as well as top system integrators and analysts, are beginning to agree on the benefits of data virtualization and accelerate its adoption. It’s beyond the early-adopter phase — in the 20 percent range — and moving toward 50 or 60 percent in the next couple of years, which would be considered mainstream.

One of the indications that this is beginning to gain momentum is that even the Data Warehousing Institute (TDWI) now has a course on data virtualization. Forrester, Gartner and other analysts are recommending DV more often. Leading SI firms such as Capgemini have data virtualization practices.  

SandHill.com: What is driving the change? 

Suresh Chandrasekaran: A typical large enterprise has four different data-integration stacks:  1) a business intelligence stack, 2) an application integration or process application stack, 3) an unstructured stack (knowledge management, search, document, SharePoint portals, etc. for collaboration), and 4) the information that exists outside the enterprise (e.g., social media, government data, hospital regulations, etc.).

The business side needs the data in all those stacks merged, and the current technical architecture can’t service that need on demand. Second, business people are tired of waiting for their project to get IT attention. They want to be enabled and empowered with data access when and how they need to use it. And IT wants to be able to provide this in an efficient, controlled manner.

With a platform like Denodo’s for integrating disparate data, a company can allow its business users and applications to feed off of it and decide at the point of consumption whether the user or application needs the information in a real-time, cache or batch mode, and in what format.
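The second concept — deciding at the point of consumption — can be sketched the same way. This is a hedged illustration with made-up names, not Denodo’s actual interface: one logical data set is served in whatever mode and format each consumer asks for, so a real-time dashboard and a batch data feed can share the same underlying view.

```python
import csv
import io
import json

# One logical data set (in practice this would come from a virtual view).
DATA = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]

def serve(rows, mode="realtime", fmt="json"):
    """Deliver one logical data set in the caller's chosen mode and format."""
    if mode == "batch":
        # A real platform would return a scheduled snapshot here;
        # in this sketch the same rows stand in for it.
        pass
    if fmt == "json":                        # web-service style delivery
        return json.dumps(rows)
    if fmt == "csv":                         # data-feed style delivery
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unknown format: {fmt}")

# The same data, two consumers, two shapes -- chosen at consumption time.
print(serve(DATA, mode="realtime", fmt="json"))
print(serve(DATA, mode="batch", fmt="csv"))
```

The point of the sketch is that the mode and format are parameters of the request, not properties baked into the integration pipeline.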

SandHill.com: What do you believe is “cloud critical” in 2013?

Suresh Chandrasekaran: One aftermath of cloud growth has actually been more silos. So it will be critical to mature cloud interfaces and architectures — making them more modular as well as improving APIs and service levels.

Some companies are still moving applications and data back and forth between in-house systems and the cloud, but they now recognize that some applications and data will remain in their data centers or private cloud because of sensitivity or criticality. So the focus is shifting much more to the interfaces: regardless of where these applications and data live, making them talk to each other and providing an abstraction layer on top against which companies can build new applications, develop user reports or provide user access. The silos need interfaces so they can be treated as though they are unified.

SandHill.com: At the enterprise level, can companies handle this unification of silos on their own? 

Suresh Chandrasekaran: They can, and they cannot. For example, a lot of companies rushed to the cloud. After making the move quickly, they realized they had to work out how to get the data they had moved to the cloud (to Salesforce.com, for example) back in order to integrate it with a business process. They may have the tools to copy the data back in house, but it makes no sense to do so, because it defeats the purpose of moving to the cloud in the first place. To do it virtually and with flexibility, they will need some help — both from their cloud/SaaS vendors providing better APIs and from data virtualization technologies.

SandHill.com: So what can they do to remedy this situation? 

Suresh Chandrasekaran: Changes are happening in two ways. Customers are forcing their cloud vendors to provide better application interfaces. Most SaaS applications provide a lot more functionality through the Web browser but expose only about 50 percent or less of it through APIs. If you want an application interface to the same functionality, you have to write custom code.

So SaaS vendors need to provide better APIs, but some are not responsive or are unable to do that. So middleware technologies like ours help bridge the gap. We provide the virtualization layer by pre-building connectivity to cloud and SaaS applications and allowing our clients to build on top.

Whereas in the last several years there was a rush to adopt cloud computing, now it’s more of a push to make the cloud interface interactions more robust along the lines of how application interfaces used to be in house. The solution is coming from multiple areas: from middleware vendors like us and from the cloud vendors adopting more interoperability standards, better APIs, etc.

SandHill.com: What is your advice to readers of this article who are on the business side and facing these problems? How should they approach talking to the CIO or IT group about it?

Suresh Chandrasekaran: As an example, they can say very specifically, “I’m trying to extend our customer service platform with a new customer self-service portal and a new enterprise application that brings together actionable data so our service reps can improve customer experience in real time. I need this information here and that information there, and share the data across applications. But we can’t wait six months.” The basics of what they need to convey are: “Let me discover the information that is available to me, let me access it in a very short time frame and with a lot of flexibility as I keep refining my requirements, and then I need to be able to build my applications or usage on top of that in an agile fashion.” 

Then they need to ask the CIO: “How can you do that for me? I understand that this can be done through data virtualization.” The business person has to approach it with their needs but also prompt the CIO, data architect, or IT people with the knowledge that such technology exists and this is what it’s capable of doing. That might help both sides come together and define exactly how they are going to proceed around that.

Suresh Chandrasekaran is senior vice president at Denodo. Throughout his career in product management and marketing roles at Vitria, Alta Vista, Compaq and as a management consultant at Booz Allen, Suresh has helped businesses leverage information through the medium of technology to deliver a competitive advantage. He speaks frequently at conferences on technology and its business impact, drawing from 15+ years of experience in leading integration middleware and Web companies. Contact him at suresh@denodo.com.

Kathleen Goolsby is managing editor at SandHill.com.



Comments

By Kenny Bastani

Spot on, Suresh. Spoken from experience. You really eloquently described some of the most problematic scenarios and challenges that give rise to the need for your services. Another standout issue I’ve seen is the number of third-party vendors, such as marketing agencies and eCommerce platforms, that create such a hassle when it comes to building meaningful data extracts that bridge Tibco, Oracle and Microsoft technologies. An industry standard needs to come into focus that first and foremost maintains the security of the data integrations, and second provides a standard platform that enables business teams to build time-sequenced streams of meaningful data objects with a standard protocol for visualizing and building reports (especially something as simple as business users generating a recurring e-mail by writing a natural language query after dragging and dropping two or more streams into a “data pipeline”). So much to be done. Inspiring article — thanks for sharing.

