You’ve heard the term a thousand times by now … Big Data. It’s common knowledge that many companies are drowning in their own pool of data. Access to key business information is too often limited due to constraints imposed by outdated ways of storing, managing and analyzing data.
Traditionally, data systems have been built for the convenience of the data system itself, using primitives such as tables, columns, strings of text and rules. This is not the way anyone thinks about data in the real world, and it forces IT managers to build and support data silos that are hard to access. Yet another barrier for the business user is that the concepts needed to solve real-world problems are often spread across multiple sources of data.
As the number and volume of sources continue to grow, it becomes more difficult for people to know what information may be available and to figure out what questions they need to ask.
Traditional approaches that require physical consolidation of data in a warehouse face multiple problems. They are not conducive to delivering a broader view of data or to addressing the increasingly dynamic nature of data and the ways that users want to explore or query it. They also limit end users to specific, predefined questions that were known in advance, when the warehouse was being designed.
Enter the data scientist (with life preserver in tow). These folks are tasked with resolving the many data-provisioning issues that organizations face, such as:
- Collecting and manually integrating fragmented information
- Aggregating related and relevant information
- Screening outdated, irrelevant or conflicting information
- Providing access to meaningful information to business users
These issues are just the tip of the iceberg, and they can take hundreds of man-hours (or data scientist hours) to complete. An alternative solution is to leverage a semantic virtual data warehouse (that supports the Logical Data Warehouse approach) with true federated search capabilities. This type of solution allows business users to address situations where:
- The data needed to solve problems comes from many disparate sources
- The data is incomplete and may even be inconsistent
- You need to rapidly iterate through different questions and explore information to develop and test out hypotheses about problems you are solving
A semantic data virtualization solution leverages the power of RDF and SPARQL to unify data for querying and aggregation of results, allowing it to present data from multiple datasets as if it were within a single database.
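To make the idea concrete, here is a minimal sketch in plain Python. It is not RDF or SPARQL, and it is not how any particular product works; it only imitates the underlying idea. The two sources (`crm`, `billing`), their contents and the function names are all hypothetical. Each source exposes its data as subject-predicate-object triples, and a single pattern match runs over both as if they were one store, without copying either.

```python
from itertools import chain

# Two hypothetical sources, each exposing (subject, predicate, object) triples.
# In a real deployment these would be remote systems, not in-memory lists.
crm = [
    ("cust:42", "name", "Acme Corp"),
    ("cust:42", "region", "EMEA"),
    ("cust:7", "name", "Globex"),
]
billing = [
    ("cust:42", "open_invoices", 3),
    ("cust:7", "open_invoices", 0),
]

def match(sources, s=None, p=None, o=None):
    """Yield triples from every source that fit the pattern; None is a wildcard."""
    for triple in chain.from_iterable(sources):
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple

def customers_with_open_invoices(sources):
    """Join names (held in one source) with invoice counts (held in another)."""
    result = {}
    for subject, _, count in match(sources, p="open_invoices"):
        if count > 0:
            for _, _, name in match(sources, s=subject, p="name"):
                result[name] = count
    return result

print(customers_with_open_invoices([crm, billing]))  # {'Acme Corp': 3}
```

The caller never needs to know which source holds which fact. The shared subject identifier (`cust:42`) is what ties the two datasets together, and that is the role shared identifiers play in an RDF-based approach.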
Other benefits of a semantic data virtualization approach include:
- Avoiding the overhead of replicating data because the data is federated
- A simple way to incrementally add new datasets
- Ease of changing or updating the consolidated schema to reflect new concepts and types of questions without changing the underlying data set, unlike what is required with a traditional data mart
- The ability to tailor different virtual data warehouse models (consolidated schemas) to serve the needs of different users without duplication of data or new ETL scripts
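The last two benefits can be illustrated with a small sketch, again in plain Python and under stated assumptions. The datasets, field names and view definitions below are invented for illustration. A "virtual model" here is nothing more than a mapping from the names a user wants to see to the place each value actually lives, so two audiences can have different models over the same untouched data.

```python
# Hypothetical source data, left untouched; each dataset keeps its own field names.
SOURCES = {
    "crm": [{"cust_id": 42, "cust_nm": "Acme Corp", "rgn": "EMEA"}],
    "billing": [{"customer": 42, "inv_open": 3}],
}

# Which field identifies a customer in each dataset (an assumption of this sketch).
KEYS = {"crm": "cust_id", "billing": "customer"}

# Two virtual models over the same data: virtual field -> (dataset, source field).
SALES_VIEW = {"customer": ("crm", "cust_nm"), "region": ("crm", "rgn")}
FINANCE_VIEW = {"customer": ("crm", "cust_nm"), "open_invoices": ("billing", "inv_open")}

def resolve(view, key):
    """Build one virtual record for `key` by reading each mapped field from its source."""
    record = {}
    for field, (dataset, source_field) in view.items():
        for row in SOURCES[dataset]:
            if row[KEYS[dataset]] == key:
                record[field] = row[source_field]
    return record

print(resolve(SALES_VIEW, 42))    # {'customer': 'Acme Corp', 'region': 'EMEA'}
print(resolve(FINANCE_VIEW, 42))  # {'customer': 'Acme Corp', 'open_invoices': 3}
```

In this sketch, adding a dataset means adding one entry to `SOURCES` and `KEYS` and referring to it from a view. Changing what a user sees means editing a view. Neither step copies or rewrites the source rows.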
Big Data is here to stay, but it doesn’t have to be as scary as its name suggests. Organizations need to simply and readily bring data together to view, explore and analyze it. With the use of a semantic data virtualization solution, accessing and utilizing data from multiple sources and through multiple means has never been easier.
Don’t continue to let your data scientist (or business user) drown in your business data. Enable them to do their job more efficiently than ever by using semantic Web technology to give them the life preserver they really need.
Robert Coyne is co-founder/CMO and VP of professional services at TopQuadrant. Dr Coyne has worked in system design for 25+ years with experience in business, consulting, academia and research. He leads TopQuadrant’s marketing and PR initiatives centered on its TopBraid business solutions and semantic solutions platform. Earlier Robert was CTO of Solution Technology International and a senior consultant at the Object Technology Practice in IBM Global Services. He is co-author of “Capability Cases: A Solution Envisioning Approach.”