Editor’s note: Paxata was founded in 2012 to be the first self-service data preparation solution designed for everyone who deals with data (chief data officers, business analysts, data scientists and technical/IT teams). As business intelligence tools add more self-service functionality and end users take on more of their own data preparation, the ability to quickly combine, compare, filter and clean large volumes of data from multiple sources becomes a critical success factor. I talked with CEO Prakash Nanduri about data governance issues and about trusting data quality when integrating data from multiple sources for decision making.
What drove you to launch Paxata?
Prakash Nanduri: In typical Silicon Valley style, Dave Brewster and I met at his daughter’s birthday party in 2011 and started talking. That conversation turned into a fantastic partnership, which led us to pioneer a whole new category of enterprise software in 2012: self-service data preparation.
We determined that 80 percent of the time spent on data exercises went into manual data preparation, leaving analysts only 20 percent of their time for high-value analysis and decision making. With the objective of flipping that ratio, we brought in Nenshad Bardoliwalla and Chris Maddox and started the journey.
Our mission, then and now, is empowering everyone within information-driven organizations to extract the most value from their data as they make decisions, regardless of the analytic tools they choose, their data sources or use cases.
So you built the Paxata platform. How does it uniquely address the complex issues of data governance and result in more time for decision making?
Prakash Nanduri: We defined a comprehensive platform that brings together rich data integration and data quality capabilities, along with enrichment, collaboration and transparent governance. It’s delivered in a self-service workspace that every business analyst can use without needing to code, script or predefine data models or schemas.
From a governance perspective, the Paxata system balances analysts’ need for freedom and flexibility with the need to track, replay, reorder and reuse data, users and the actions they perform. While analysts work, Paxata transparently records every step they take, which supports IT’s requirements for data lineage and auditability. We also give IT control over the types of data and functions an analyst can access within the system.
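Paxata’s step-recording mechanism is proprietary and is not described in this interview. As a rough illustration of the general idea only, the sketch below records each preparation step instead of applying it once, so the steps can be listed for an audit trail (lineage), replayed on new data and reordered. `Recipe`, `Step`, `record`, `replay`, `reorder` and `lineage` are hypothetical names chosen for this example, not Paxata’s API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    description: str               # human-readable entry for the audit trail
    fn: Callable[[list], list]     # transformation applied to the full list of rows


@dataclass
class Recipe:
    """Hypothetical step log: records transformations so they can be audited and replayed."""
    steps: list = field(default_factory=list)

    def record(self, description, fn):
        self.steps.append(Step(description, fn))
        return self

    def reorder(self, old_index, new_index):
        # Move one recorded step to a new position (0-based indices).
        self.steps.insert(new_index, self.steps.pop(old_index))
        return self

    def replay(self, rows):
        # Re-run every recorded step, in order, against a copy of the input rows.
        rows = [dict(r) for r in rows]
        for step in self.steps:
            rows = step.fn(rows)
        return rows

    def lineage(self):
        # Numbered list of the recorded steps, in execution order.
        return [f"{i}. {s.description}" for i, s in enumerate(self.steps, start=1)]


recipe = (
    Recipe()
    .record("trim whitespace in 'name'",
            lambda rows: [{**r, "name": r["name"].strip()} for r in rows])
    .record("drop rows with an empty 'name'",
            lambda rows: [r for r in rows if r["name"]])
)

raw = [{"name": "  Ada "}, {"name": "   "}]
print(recipe.lineage())
print(recipe.replay(raw))
```

Replaying the recipe yields `[{'name': 'Ada'}]`. Swapping the two steps with `recipe.reorder(1, 0)` changes the result: the whitespace-only row passes the filter and is then trimmed to an empty string. That sensitivity to order is one reason an ordered, replayable record of steps is useful for auditing.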
I don’t suppose you’d share Paxata’s “secret sauce” for achieving these outcomes. Most organizations still struggle with data governance, especially with the increasing use of self-service functionality in business intelligence tools and the associated activity of end-user data preparation and integration of data from multiple internal and external sources.
Prakash Nanduri: Sure. Paxata’s secret sauce is that it is “adaptive.” With Paxata, business analysts are able to adjust – on the fly – to the iterative business requests that come in on a daily basis.
It is very typical that, as data is being understood and analyzed, heads of business need more data to answer their questions. So they go back to their analysts to repeat the data prep cycle. At that point, analysts typically spend weeks or months in spreadsheets and homegrown data marts trying to combine their clean data with additional data from raw or outside sources, hoping that their next answer set will be what the business needs. Until now, that back-and-forth was the most painful part of every analytic exercise.
So how did you manage to eliminate that pain?
Prakash Nanduri: Hands down, the most powerful aspect of the Paxata solution is our implementation of machine learning, which draws on proven technologies from consumer search and social media: intelligent indexing, textual pattern recognition and statistical graph analysis.
We apply proprietary, patent-pending algorithms to the linguistic content of both structured and unstructured data, which enables Paxata to automatically build a comprehensive and flexible data model in the form of a graph, reflecting similarities and associations among data items. The system uses associations between the data to detect and resolve both syntactic and semantic data-quality issues, rapidly improving the quality of large data sets. As analysts add more data sources, the Paxata system leverages the expanded associations among the data to further improve the quality of the data.
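Paxata’s algorithms are proprietary and patent-pending, so the sketch below is not them. It only illustrates, under simple assumptions, the general idea of using a similarity graph for data quality: each distinct value becomes a node, an edge joins two values whose normalized spellings are similar (here, a `difflib.SequenceMatcher` ratio at or above an assumed threshold of 0.85), and each connected component becomes a candidate group of variant spellings to standardize. The function names and the threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    # Lowercase and strip everything except letters and digits ("I.B.M." -> "ibm").
    return re.sub(r"[^a-z0-9]", "", value.lower())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_values(values, threshold=0.85):
    """Group distinct values into connected components of a similarity graph."""
    distinct = sorted(set(values))
    parent = {v: v for v in distinct}   # union-find forest over the graph's nodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Add an edge (a union) for every sufficiently similar pair of values.
    for a, b in combinations(distinct, 2):
        if similarity(a, b) >= threshold:
            union(a, b)

    groups = {}
    for v in distinct:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


values = ["IBM", "I.B.M.", "Jon Smith", "John Smith", "Acme Corp", "ACME Corp."]
print(cluster_values(values))
```

This compares every pair of values, which is quadratic in the number of distinct values; a system handling large data sets would need some form of indexing or blocking to avoid that, and this sketch does not attempt it.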
[Editor’s note: For more information on end-user data prep, read “Data Preparation Pitfalls and How to Extract the Most Value from Data for Decision Making.”]
Beyond the users’ self-service demands, how does Paxata enable data governance from an IT perspective?
Prakash Nanduri: IT organizations, data developers and architects must constantly innovate to keep up with the demands of the organization while protecting business assets. Because Paxata’s platform is delivered as a shared environment, IT can secure and monitor how data is prepared and used, and build transparent, emergent governance that is aligned with the semantics of the business rather than those of the underlying data systems.
You mentioned sharing and reusing data projects across teams, and collaboration is becoming an important aspect of business intelligence. Please share a real-world example of how the Paxata solution enabled a company to be more collaborative in its data prep and subsequent decision making.
Prakash Nanduri: When the multinational food company Del Monte split into two separate companies, a task force of IT professionals and business analysts faced the daunting job of migrating years of critical data from the original company’s data warehouse into the IT systems of the new company. Making the task even more difficult, legal requirements related to the spin-off imposed a deadline just a few months away.
The complexity of this data migration was compounded by the unanticipated spin-off of numerous other lines of business, which created significant challenges in allocating resources and meeting deadlines.
Business continuity needs were considerable. For instance, the new company needed to migrate more than 300 million records from current and historical data files holding key operational information such as production, pricing, invoices, suppliers, customers and promotions. Ideally, they wanted an archive of five years of trailing data, but the deadline for separating the companies was tight, and they weren’t sure they could finish the work in time.
Did they also have to cleanse the data during that short time frame?
Prakash Nanduri: Yes. The data also had to be properly formatted and purged of information that the newly spun-off company would not need. That often meant combining and comparing databases from different areas of the business, such as sales, materials, suppliers and distributors.
The volume of data was too great to be transferred using the tools normally available to the business analysts. So they called us.
Using Paxata, the team carried out over 100 data migration projects to extract and transform data for use in the new company’s IT systems. Paxata’s ability to quickly combine, compare, filter and clean large volumes of data from multiple sources proved to be a critical success factor. The team was able to get their work done without requiring technical support to extract the data or generate code and scripts to explore, clean or shape the data.
Paxata’s elastic cloud architecture allowed the team to start new projects, add more data, and increase or decrease concurrent sessions without having to think about capacity. Behind the scenes, Paxata spun up new instances of the pipeline server to handle massive workloads or spikes in demand, and routed work to specific pipeline servers to support a shared-services model with variable workloads. This gave the team nearly unlimited capacity for whatever workloads the data migration required.
Wow. That’s a stunning success story. With the growing trend of self-service functionalities in business intelligence technologies, is it easy for end users who lack analyst skills to learn how to use Paxata?
Prakash Nanduri: We designed the Paxata workspace from the ground up to eliminate the need for custom scripts and coding expertise. With an Excel-like interface and built-in guidance aided by machine learning and sophisticated algorithms, anyone can perform advanced functions like data clustering, de-pivoting or anomaly detection in one click.
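De-pivoting (also called unpivoting or melting) turns a wide table, with one column per period or category, into a long table with one row per value. The minimal sketch below shows only the transformation itself, not how Paxata implements it; the function name `depivot` and its parameter names are chosen for illustration. In pandas, the comparable built-in is `pandas.melt`, with its `id_vars`, `var_name` and `value_name` parameters.

```python
def depivot(rows, id_columns, var_name="attribute", value_name="value"):
    """Turn wide rows into long rows: one output row per non-identifier column."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}   # identifier columns kept on every output row
        for col, value in row.items():
            if col not in id_columns:
                long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows


wide = [
    {"region": "West", "Q1": 10, "Q2": 12},
    {"region": "East", "Q1": 7, "Q2": 9},
]
for r in depivot(wide, id_columns=["region"], var_name="quarter", value_name="sales"):
    print(r)
```

Each input row with two quarter columns becomes two output rows, so the two wide rows above produce four long rows keyed by region and quarter.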
The application provides visual guidance that makes it easy to bring together data, find and fix dirty or missing data, and share and reuse data projects across teams – regardless of users’ technical skills or the data volumes.
Prakash Nanduri, co-founder and CEO of Paxata, has more than 20 years of experience in startups and large companies. He was co-founder and VP of Velosel Corporation (acquired by TIBCO). He led the post-merger integration effort at TIBCO, then spent three years at SAP as head of product and technology strategy within the office of the CEO, where he was responsible for strategic initiatives including the SAP Big Data (HANA) business strategy.
Kathleen Goolsby is managing editor of SandHill.com.