Editor’s note: Why are Big Data solutions so costly? With the quickly growing number of products for managing and analyzing Big Data, how can companies make wise and cost-effective decisions when choosing a solution? I talked with Actian’s Fred Gallagher, general manager of Vectorwise, the world’s fastest solution for data analysis and business intelligence, about the cost factors in fast data analysis.
SandHill.com: What are the cost components involved in Big Data solutions?
Fred Gallagher: Some of the main costs of a comprehensive Big Data solution include storage infrastructure, servers for data processing, and the software associated with those two, plus the cost of the people who deploy and operate the solution. But a key cost factor is scalability: the ability to start at an affordable entry point and grow from there. To some degree, scalability gives companies the flexibility to spread their budget across projects over time instead of spending large amounts up front.
SandHill.com: Which of those cost components contributes most to the cost problems companies face with Big Data?
Fred Gallagher: The biggest budgetary challenge is scalability. IT teams typically begin by analyzing data about their business or their customers in order to take informed action, and they usually start with traditional relational databases from vendors such as Oracle or Microsoft. These databases tend to require repeated follow-on investments of time and money that don’t deliver results, because they aren’t built for analytics.
Teradata and Netezza (acquired by IBM) offer larger solutions that scale to much bigger data volumes than traditional databases. But customers find that those solutions require large up-front purchases and certainly don’t scale down to an affordable entry point.
So companies looking to invest in software and server technology for Big Data face a major dilemma. Either they start small with a traditional relational database, knowing it will eventually require successive upgrades, or they make an incredibly expensive purchase (somewhere around $500,000 for 20 to 40 TB) just to get started on the first project.
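As a rough illustration, dividing the entry price cited above by the capacity range gives the implied cost per terabyte. This is simple arithmetic on the interview's round figures, not a quoted price from any vendor:

```latex
\frac{\$500{,}000}{20\ \text{TB}} = \$25{,}000\ \text{per TB},
\qquad
\frac{\$500{,}000}{40\ \text{TB}} = \$12{,}500\ \text{per TB}
```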
SandHill.com: Why is 20 TB the starting point?
Fred Gallagher: Twenty terabytes is a cost-effective starting point for older data warehouse solutions. Disk drive capacities keep growing, so these systems now outstrip the needs of many business projects. I actually think 20 to 40 TB is the range where these traditional data solutions begin to make economic sense.
SandHill.com: What size are the companies with data in the 1 to 40 TB range? Enterprises or SMBs?
Fred Gallagher: One of our customers is a division within a large company with $3 billion in annual revenue. They have a critical business problem — the need to better analyze their enterprise resource planning data. That data amounts to less than one TB in volume.
In fact, about 90 percent of the data warehouses that are being deployed are smaller than 10 TBs in size. And many business opportunities can be addressed with data projects that are less than 20 TBs.
Some companies collect log data, such as log files recording users’ visits to a website, or aggregated data from smart residential electric meters. These kinds of data grow quite large over time and can easily exceed 100 TB. Again, traditional data solutions become quite expensive for these business problems.
SandHill.com: So what can businesses do?
Fred Gallagher: Businesses are often forced to overspend time and money on large-scale data solutions while also giving up flexibility. When the default entry point is a 20-TB system, some companies combine multiple projects, groups and divisions so they can share the cost of the system. This wastes time. In other cases, businesses overspend on their first solution, purchasing a product that vastly exceeds their needs.
It’s a really interesting phenomenon. Vendors are selling size, not flexibility. They aren’t addressing their customers’ business challenges and opportunities.
SandHill.com: You started this discussion by saying that Big Data solutions don’t have to cost big money, but it seems that companies in the 1 to 40 TB range in particular are forced to spend more than necessary. Please explain how Actian helps companies facing this dilemma.
Fred Gallagher: The benefit of our Vectorwise solution is that it typically requires one-fifth of the hardware of other solutions on the market. So people can get started immediately with a cost-effective data project.
We’ve rewritten our software so that we take advantage of the processor — the CPU — which allows us to be much more efficient than other solutions on the market. All the other solutions on the market have built their scalability around memory and storage. When they hit the limit, they add another server.
Our solution scales with processor power: Vectorwise runs faster as people add more cores. Two years ago, it was common to have two cores per CPU. Now it’s common to have at least six cores on a single CPU, and sometimes as many as 12. That’s three to six times growth!
SandHill.com: Does running faster equate to lower cost?
Fred Gallagher: When performance slows down, companies typically buy more hardware to speed things up, and that gets expensive. Vectorwise is much faster, and the cost of achieving that faster performance is much lower.
SandHill.com: Is the cost problem with Big Data due to the fact that people don’t know about the Vectorwise solution? Or are they unaware of where the costs come from when they try to improve performance?
Fred Gallagher: I’d answer yes to both of your questions. We are educating people about Vectorwise. There is also a need to educate or counsel customers on the costs and benefits of Big Data solutions.
Part of what is happening here is the advancement of standard technology. Years ago, Teradata was one of the first analytics solutions on the market. They built a good solution at that point in time, but chips and memory weren’t where they are today. Over time, technology improved; Intel and AMD invested billions of dollars in processors that provided exponentially more processing capability. So today much more technology is available at a much lower price and in a much smaller footprint.
The large systems built in the 1980s kept evolving, but the economics of standard CPUs changed far more dramatically. Today’s relatively standard server accomplishes as much as a rack filled with the big systems designed in the 1980s and 1990s. When a customer ends up with a lot more hardware, as it does with the bigger systems, it also ends up with a lot more cost.
SandHill.com: Aren’t there other Big Data solutions in the market that are becoming more cost-effective?
Fred Gallagher: Right now there are a number of Big Data competitors in the market, but they don’t have a solution that’s comparable to ours. Some have modified their solutions and created lower-priced versions of their product. But they didn’t take the step we did, which was to rewrite the software to take advantage of modern chips, so they still have to scale with servers instead of CPU cores.
SandHill.com: You advise companies to start with an affordable, smaller footprint. Is that just because of the starting-point cost?
Fred Gallagher: There are two other aspects I would highlight. Getting started with an affordable, smaller footprint also brings speed, meaning speed to deploy a project. This matters not only from a cost perspective but also for its impact on the business.
It typically takes six months to well over a year to implement a data warehouse project, including purchasing the equipment, building the warehouse and deploying it. In contrast, we can deploy Vectorwise in less time than other solutions, which has a big impact on business productivity and top-line improvement. Also, if deployment takes three months instead of six, you spend half as much people-time deploying and maintaining the technology, and people are one of the most expensive cost components.
SandHill.com: What if a company has already deployed another solution and then learns about Vectorwise? What can it do?
Fred Gallagher: That’s fine. We look at the business problem the customer would like to solve. If they purchased another solution a few years ago and a new business problem has arisen, then it’s a good time to bring Actian into the discussion. It could be a good time to work with some of the business users, understand their objective or problem, and get started with an affordable pilot project using our solution.
I believe the decision around Big Data solutions is all about moving a company forward from an initial problem. For instance, if you’re a retailer, how do you increase your revenue, your sales? If you’re an Internet company, how do you keep users on the site?
You need to do the analytics so you can capture the customer’s attention when he or she is in a buying mode. You need to analyze the data to know the right offers to make to customers in order to satisfy their needs and increase your revenue. We help companies move forward and do it in a way that allows them to move faster while spending money efficiently.
Fred Gallagher is general manager of Vectorwise at Actian and is responsible for managing the business activities for this breakthrough product. Prior to joining Actian, Fred worked for Qlusters, where he was responsible for worldwide sales, marketing, and business development. Previously, he was at Seagate Technology, where he was vice president of worldwide channels and business development for Seagate’s XIOtech subsidiary.
Kathleen Goolsby is managing editor at SandHill.com.