Editor’s note: Companies of all sizes risk incurring extravagant costs when producing documents during e-discovery in lawsuits, as well as the risks of failing to comply with thousands of legal requirements governing hundreds of different kinds of data sets. I talked about these issues with John Montaña, an attorney and a principal with Montaña & Associates, a full-service consulting firm and one-stop shop for information management and governance needs. For three years, the firm has consulted with Fortune 500 companies and other clients, helping them with the extremely complex process of mapping laws and requirements to their data sets and also providing risk models. Khoste, the firm’s newly released SaaS product, now simplifies that process. John shares insights about managing risks in data retention and retrieval and explains how the firm used the Quantrix Modeler to build Khoste.
Please describe the risks companies face with their data sets.
John Montaña: There are literally many thousands of different legal requirements that apply to the kinds of data created or captured by the average organization. Companies keep records of potentially hundreds of different data types, organized under some sort of classification scheme and spread across dozens or hundreds of different systems or repositories. There are also legal considerations such as statutes of limitations, mandatory inspection and audit cycles, privacy rules, and restrictions on where data can be stored (and for how long) and where it’s allowed to go.
Mapping which laws and requirements apply to which sets of data, and then deriving an aggregated rule for each, is a very challenging task. But it’s very important because companies are often involved in regulatory actions and lawsuits, so they need to make defensible decisions within this very complex regulatory environment.
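To make the mapping problem concrete, here is a minimal Python sketch of one way an "aggregated rule" could be derived. It assumes each requirement reduces to a minimum retention period and an optional maximum (for example, a privacy-driven disposal deadline), which is a large simplification; real rules also cover storage location, audit cycles and more. The requirement names, periods and data classes are hypothetical placeholders, not real legal rules, and this is not a description of how Khoste works.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Requirement:
    """One legal requirement (hypothetical). Retention bounds are in years."""
    name: str
    min_years: int                   # must keep records at least this long
    max_years: Optional[int] = None  # must dispose after this long, if set


# Hypothetical requirement catalog: names and numbers are placeholders, not real law.
REQUIREMENTS: Dict[str, Requirement] = {
    "REQ-A": Requirement("REQ-A", min_years=3),
    "REQ-B": Requirement("REQ-B", min_years=7),
    "REQ-C": Requirement("REQ-C", min_years=1, max_years=5),
    "REQ-D": Requirement("REQ-D", min_years=10),
}

# Many-to-many mapping: one data class can fall under several requirements,
# and one requirement can govern several data classes.
MAPPING: Dict[str, List[str]] = {
    "invoices": ["REQ-A", "REQ-B"],
    "hr_files": ["REQ-A", "REQ-C"],
    "customer_records": ["REQ-B", "REQ-C", "REQ-D"],
    "marketing_drafts": [],
}


def aggregate_rule(data_class: str) -> Tuple[int, Optional[int], bool]:
    """Collapse all requirements that apply to a data class into one rule.

    Returns (keep_at_least, keep_at_most, conflict): the longest minimum wins,
    the shortest maximum wins, and a conflict is flagged when the two cannot
    both be satisfied, i.e. a case needing a risk-based judgment call.
    """
    reqs = [REQUIREMENTS[r] for r in MAPPING.get(data_class, [])]
    keep_at_least = max((r.min_years for r in reqs), default=0)
    caps = [r.max_years for r in reqs if r.max_years is not None]
    keep_at_most = min(caps) if caps else None
    conflict = keep_at_most is not None and keep_at_most < keep_at_least
    return keep_at_least, keep_at_most, conflict


if __name__ == "__main__":
    for data_class in MAPPING:
        print(data_class, aggregate_rule(data_class))
```

In this toy setup, `customer_records` yields `(10, 5, True)`: one placeholder rule demands ten years of retention while another caps it at five, so there is no single compliant answer, only a trade-off to weigh.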
For example, one of our clients had 60 terabytes of unstructured data (SharePoint documents, spreadsheets, word processing documents, etc.), which translated into 19 billion documents. That’s a lot of individual documents. No matter what kind of sophisticated software you use, if you have to go through 19 billion documents in a lawsuit e-discovery process and then have legal review of a significant chunk of those documents, you’re talking very serious money. And 60 terabytes is not a particularly large volume these days.
So from an e-discovery standpoint, simply cutting down the sheer volume by imposing some rules on these gigantic masses of data is very important because, all else being equal, if the pile is half that size, e-discovery will cost half as much.
It seems to me that this issue of managing risks around data retention and retrieval will become as high on the C-suite agenda as security issues.
John Montaña: It is increasingly becoming prominent, and the costs have gotten to the point where it attracts C-level attention. Not only are companies facing extravagant costs in producing documents in e-discovery, but they also risk adverse outcomes. Companies get penalized pretty regularly if they cannot conduct discovery effectively. They then run into all sorts of problems that may well ultimately end up as claims by the other side for spoliation of evidence. When that sort of thing happens and hits the front page, there is a lot of C-level incentive to get their arms around information management.
So companies are being more proactive now in seeking solutions for this problem?
John Montaña: Yes. It used to be that no one approached consultants like us until there had been a big train wreck in the form of a lawsuit with a bad outcome and they had a lot of problems as a result. But these days people tend to be much more proactive because standards are emerging and the courts have articulated a lot of duties with respect to information management, and it’s possible to get ahead of the curve and avoid the train wreck.
And, train wreck aside, organizations have become much more conscious of the costs of Big Data that is not well organized or maintained, and they are trying to control those costs.
Do you find that most companies really have a problem and don’t have effective information-management processes for organizing their data?
John Montaña: Often, data silos are not well organized at all. Many organizations have “dark data,” meaning they know they have a big data silo somewhere in a server farm, but they have little or no understanding of what’s in it. That invites all sorts of problems in e-discovery, because if you tell your opponent in a lawsuit that you don’t know what’s in your records, they’ll want to find out what’s in there, and you’ll have to pay for that. And even if the data just sits there unused, it costs money.
Our tool is not the complete solution for that, but it’s an important part of the solution. There are other tools that can look in those dark-data sets and identify document types through various techniques such as keyword searches or predictive coding. Once you have some understanding of what’s actually in a dark-data set, then you can put rules around it for retention, disposition and other management purposes that permit you to get at least a high-level handle on e-discovery and regulatory compliance.
If you have an understanding of what the data set is and thereby have an understanding of what kinds of rules govern it, you can start to prune off the excess from that data set and employ other management techniques so that you can reduce the volume of material you look through for e-discovery and thereby reduce the cost of discovery.
Of course, Big Data and analytics are just adding to this problem, right?
John Montaña: Absolutely. Big Data has caused massive problems for companies because the people who designed many Big Data systems never really thought through the consequences of infinitely growing data sets.
So when your terabytes turn into petabytes and you’re sitting on multiple petabytes of information, from a sheer cost perspective that’s not cheap to manage. Everybody says data storage is cheap, but it is not cheap on those scales. And now we have cloud computing — another bucket for data sets, allowing data to keep ballooning in size.
Tell me about Khoste and how you used the Quantrix Modeler product to help develop your tool’s risk models.
John Montaña: We used Quantrix for three or four years at a prior organization, and we use it for many aspects of our consulting business. It’s a tool for developing business models for finance and other purposes, as well as risk models around unstructured data.
We originally developed our software solution as an in-house tool. As consultants reaching advisory conclusions for clients, we face the same complex many-to-many mapping problem that companies face in their data management. Years ago we started looking for a flexible and powerful tool we could leverage, but initially we decided to invest a very significant amount of money in building a custom tool.
Then we found Quantrix. Its Modeler product works better for risk models and is more flexible than a purpose-built tool would have been. And it cut our development time by a couple of years.
So you use it to develop risk models. Is that basically what your new software product delivers?
John Montaña: The commercial solution we’re offering now is a legal compliance tool. It tells people what they need to do and what they need to know with respect to legal compliance for their data sets. It also has a risk-management component. A lot of times there isn’t a hard-and-fast answer about what companies should do with a given set of data. In those instances, our clients can look at an aggregated view of factors based on risk parameters to make a reasoned decision.
We also provide services that involve risk quantification, which are not part of this software tool.
How does your product differ from others in the market?
John Montaña: There are a handful of products that do something along the same lines, but most of them use outdated methods and technology. We also track many more parameters. For example, some products deliver a retention number for how long to keep a particular kind of record. That’s fine, but that’s not the only thing people care about. We track additional parameters such as privacy issues and audit considerations.
In addition, our underlying database is the best in the industry, so our conclusions are stronger because they are developed from a stronger underlying data set.
Is your product a SaaS solution?
John Montaña: Yes. A significant issue with some competing products is that they must be installed on the user’s system, which creates all sorts of problems. Because ours is a SaaS solution, our clients don’t have to update software or databases on premises, and their IT departments don’t have to deal with compatibility and security issues. It also gives clients much more immediate access to the underlying legal database, because it lives in the cloud and we update it continuously. So our clients have a much more current view of the legal situation than they could get with an on-premises installation.
John Montaña is a principal with Montaña & Associates. He advises a variety of organizations on records and information management. His work has included analysis and advice on a wide variety of information governance issues. He is widely recognized as one of the foremost records management experts in the United States. He has published four books as well as dozens of articles and is an active seminar speaker. John can be reached at email@example.com.
Kathleen Goolsby is managing editor of SandHill.com.