Data virtualization has long been synonymous with information agility as it delivers a simplified, unified and integrated view of trusted business data in real time or near real time as needed by the consuming applications, processes, analytics or business users. The technology integrates data from disparate sources, locations and formats, without replicating the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users. Data virtualization capabilities are expanding to provide governed self-service data discovery and integration oriented toward business users.
Data virtualization will be more tightly integrated with streaming technologies in 2017
Today’s data landscape is becoming dominated by the Internet of Things (IoT). Devices equipped with sensors along with the applications that monitor them are becoming widespread, and the event data they provide is readily accessible. Enterprises have gradually come to realize the potential value of this data in reducing costs, increasing revenue, making timely decisions and increasing their competitiveness in the digital marketplace.
In order to leverage the growing volume of IoT data, integration strategies must evolve to incorporate collection, ingestion, processing and integration of event data. Enterprises first need to identify potential IoT use cases and then determine how the corresponding event data aligns with other available internal and external data to potentially deliver additional insight and value. For example, sensor data might be combined with social data to better model consumer perceptions, or historian data might be combined with costing data to analyze asset performance.
IoT use cases should be effectively incorporated in an overall information integration strategy. Event data use cases might include:
- Real-time analysis (anomaly and fraud detection, behavioral analysis)
- Real-time monitoring and response (safety, vehicle telemetry, fraud)
- Real-time integration of data to support downstream consumption
- Improved operational efficiency and reduced cost through predictive and proactive maintenance
- Direct monetization
Integrating large volumes of IoT data with supplemental internal and external data sources to enable useful analyses can be challenging. In 2017, data virtualization will become the technology of choice for this integration, for several reasons:
- Most notably, data virtualization works with data in place, thus eliminating unnecessary data movement.
- It applies sophisticated optimization techniques to efficiently retrieve the data, eliminates the cost of storing information multiple times and eliminates the need to update information in multiple places.
- With data virtualization, additional sources of information can be added quickly, without spending time installing and configuring new databases or clusters to store the consolidated information.
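To make the "data in place" idea concrete, here is a minimal sketch of a virtual view that joins two sources at query time without materializing a consolidated copy. The `VirtualView` class, the source names and the fields are hypothetical and chosen only for illustration; actual data virtualization products add query optimization, pushdown and caching that this sketch does not attempt.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]  # each call reads the source afresh


class VirtualView:
    """Joins two sources on a key at query time; stores no data itself."""

    def __init__(self, left: Source, right: Source, key: str) -> None:
        self.left = left
        self.right = right
        self.key = key

    def query(self) -> Iterator[Row]:
        # Build a lookup from the right-hand source for this query only.
        lookup = {row[self.key]: row for row in self.right()}
        for row in self.left():
            match = lookup.get(row[self.key])
            if match is not None:
                yield {**row, **match}


# Hypothetical stand-ins for a sensor historian and a costing system.
historian = [{"asset_id": "pump-1", "hours_run": 120}]
costing = [{"asset_id": "pump-1", "cost_per_hour": 4.5}]

view = VirtualView(lambda: historian, lambda: costing, key="asset_id")
rows = list(view.query())

# A change in a source is visible on the next query, with no copy to update.
costing[0]["cost_per_hour"] = 5.0
rows_after = list(view.query())
```

Because the view holds only references to its sources, adding another source means registering one more callable rather than provisioning new storage.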
While data virtualization already supports interoperability with streaming technologies, this support will be enhanced in 2017 to enable tighter integration with technologies such as Apache Kafka, Apache Storm and Spark Streaming.
Anticipated developments include combining real-time IoT data for a specific time window with supplemental data such as cloud, enterprise and external data. This will make it possible to output enriched IoT data to stream processing pipelines so that any output or derived data stream can be passed concurrently to multiple downstream consumers. In addition, it will be possible to aggregate data into more meaningful forms and formats.
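The windowed enrichment and aggregation described above might look roughly like the following sketch. It is not tied to any streaming product: the event tuples, the `sites` reference table and both function names are assumptions made for illustration, and a plain Python list stands in for a live stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, device_id, reading)


def enrich_window(
    events: Iterable[Event],
    reference: Dict[str, Dict[str, str]],
    window_start: float,
    window_seconds: float,
) -> List[Dict[str, object]]:
    """Keep events inside one time window and attach reference attributes."""
    window_end = window_start + window_seconds
    enriched = []
    for ts, device_id, reading in events:
        if window_start <= ts < window_end:
            extra = reference.get(device_id, {})
            enriched.append(
                {"ts": ts, "device_id": device_id, "reading": reading, **extra}
            )
    return enriched


def average_by(rows: Iterable[Dict[str, object]], field: str) -> Dict[str, float]:
    """Aggregate enriched readings into an average per value of `field`."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[row.get(field, "unknown")]
        bucket[0] += row["reading"]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


# Hypothetical sensor events and enterprise reference data.
events = [(0.0, "s1", 20.0), (30.0, "s2", 30.0), (45.0, "s1", 22.0), (75.0, "s1", 99.0)]
sites = {"s1": {"site": "plant-a"}, "s2": {"site": "plant-b"}}

window = enrich_window(events, sites, window_start=0.0, window_seconds=60.0)
averages = average_by(window, "site")
```

Here the 60-second window excludes the event at 75 seconds, and the enriched rows can be handed to any number of consumers, one of which computes per-site averages.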
In seeking to better control and utilize the masses of data they encounter, organizations should explore how data virtualization can enhance IoT integration and analytic processes.
The use of self-service data integration will not grow significantly through 2017
Self-service data integration is practiced by citizen integrators, one of the critical citizen user roles alongside citizen developers and citizen data scientists. Citizen users are not IT department personnel but, rather, line-of-business (LOB) users who are driven by a keen sense of curiosity and exploration regarding data. They seek novel approaches to combining and analyzing data in order to surface unique perspectives and innovative solutions to business problems.
Citizen integrators work largely outside of IT, employing self-service integration tools to perform lightweight data integration primarily for exploration and virtual sandboxing while IT continues to perform the bulk of heavyweight data integration. Citizen integrators benefit their organizations by taking up the slack from understaffed, underbudgeted IT departments struggling to accommodate increasing demands from business units for readily available data to enable timely analysis and decision making.
Despite the contributions of citizen integrators in alleviating the workload of IT, the increasing complexity of data integration, coupled with potential limitations in the capabilities of self-service integration tools, will limit the expansion of self-service data integration, confining citizen integrators to their current levels of support through 2017.
A combination of factors is adding to the complexity of data integration, including a growing volume of data and variety of data formats, the distributed nature of the data landscape, a variety of options for database management (such as NoSQL, Hadoop and relational) and the need to integrate legacy data sources with emerging ones such as IoT and social media. Because IT personnel are equipped with the skill set needed to perform complex data orchestration, data integration will continue to be IT driven for the foreseeable future.