Big data is becoming much more than just widespread distribution of cheap storage and cheap computation on commodity hardware. Big data analytics may soon become the new “killer app” for high performance computing (HPC).
There is more to big data than large amounts of information. It also pertains to massive distributed activities such as complex queries and computations (a.k.a. analytics). In other words, deriving value through computation is just as “big” as the size of the data sets themselves. In fact, the analyst firm IDC has already coined a term for big data on HPC: “High Performance Data Analysis.”
HPC is well positioned to enable big data use cases across all three phases of a typical workflow: data capture and filtering, analytics, and results visualization. In every phase, the speed of computation matters just as much as the scale. To unlock the full potential of big data, we have to pair it with “big compute,” or HPC.
Here are three ways big data and HPC are converging, and how businesses can take full advantage of this convergence right now to improve large-scale processing.
1. Hadoop meets InfiniBand
Many consider InfiniBand, the most commonly used interconnect technology in supercomputers, just as basic a requirement for HPC as bare-metal processing. If you can’t move information back and forth between nodes quickly, you limit the horizontal scalability you can achieve. RDMA for Apache Hadoop lets big data platforms take advantage of this high-speed, low-latency interconnect, and you can provision a Hadoop cluster in the cloud that leverages RDMA in very little time. Consider that 56Gbps FDR InfiniBand offers more than five times the nominal bandwidth of 10Gbps Ethernet, and thanks to its much lower latency it can be, for some workloads, over 100 times faster. Short of using very expensive custom bus fabrics, this is the fastest way to distribute data and processing across computational nodes.
With that bottleneck largely removed, you can finally scale your big data platform to the size it deserves. Not only do you obtain results faster, but setup time is also far lower than with commodity networking technology.
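To see why the bandwidth and latency advantages combine the way they do, a simple first-order model of a single transfer helps: time equals a fixed latency plus message size divided by bandwidth. The sketch below uses the nominal 56Gbps and 10Gbps link rates from the text; the latency values (about 1 µs for RDMA over InfiniBand and 20 µs for TCP over 10Gbps Ethernet) are illustrative assumptions, not measurements, and the helper names are hypothetical.

```python
# First-order model of point-to-point transfer time:
#   t = latency + message_size / bandwidth
# Latency values below are ASSUMED for illustration only; real figures
# depend heavily on the NICs, switches, drivers and software stack.

def transfer_time_s(size_bytes: float, latency_s: float, bandwidth_gbps: float) -> float:
    """Modeled time in seconds to move size_bytes across one link."""
    bits = size_bytes * 8
    return latency_s + bits / (bandwidth_gbps * 1e9)

# Hypothetical link profiles (nominal bandwidth, assumed latency).
FDR_IB = {"latency_s": 1e-6, "bandwidth_gbps": 56.0}    # RDMA over FDR InfiniBand
TEN_GBE = {"latency_s": 20e-6, "bandwidth_gbps": 10.0}  # TCP over 10Gbps Ethernet

def speedup(size_bytes: float) -> float:
    """How many times faster one message moves over InfiniBand than Ethernet."""
    return transfer_time_s(size_bytes, **TEN_GBE) / transfer_time_s(size_bytes, **FDR_IB)

if __name__ == "__main__":
    # From small, latency-bound messages up to large, bandwidth-bound transfers.
    for size in (64, 4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
        print(f"{size:>12} bytes: {speedup(size):6.2f}x faster")
```

Under this model the speedup always falls between the bandwidth ratio (5.6x) and the latency ratio, so small, frequent messages benefit most. How large the headline figure gets depends almost entirely on the real latencies of the hardware and software in use, which is why measured speedups vary so widely between workloads.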
2. Hadoop meets accelerators
Another key feature of HPC is the use of co-processors and accelerators such as passively cooled NVIDIA Tesla GPUs based on the Kepler architecture. Just as these technologies greatly assist technical computing, they can also speed up big data analytics, much as they already do for sequencing and alignment.
Hadoop workloads that leverage GPU programming frameworks such as CUDA and OpenCL can boost big data performance significantly. All other things being equal, higher-performance big data platforms and technologies such as Hadoop, Spark and MapReduce deliver faster results for complex analytics.
In fact, the only way to keep up with the growing amount of data we collect is to increase computation speed at the same time. Putting co-processors and accelerators to work on big data is an important way for HPC to make a big impact in this space.
3. Big data and HPC converge in the cloud
As big data fuels public cloud growth faster than any other application, it also drives demand for computation: the more data we collect, the more computational capacity we need to analyze it. HPC on demand is an emerging force ready to meet this challenge.
Simply stated, big data and HPC growth in the cloud go hand in hand. The only way to provide enough scale to keep up with demand is to deploy HPC-class assets that increase processing performance and density.
Thanks to the marriage of big data platforms with supercomputing technologies such as high-speed interconnects and co-processors, organizations can utilize and deploy HPC on-demand services designed to enable the next major wave of analytics innovation. The same computational power that accelerates sequencing and alignment today can vastly improve queries and comparisons in the future.
Using distributed file systems such as the Hadoop Distributed File System (HDFS) instead of expensive, traditional HPC parallel storage makes the economics even more attractive.
Finally, with the fast time to value and elastic scale that only the public cloud makes possible, companies can now focus exclusively on their work rather than wrestling with IT platforms.
Thanks to the convergence of big data and HPC on demand, companies can leverage the scale and availability of computation in the public cloud.
Leo Reiter is the chief technology officer at Nimbix, a leader in cloud-based high-performance computing (HPC) infrastructure and applications. He is a virtualization and cloud computing pioneer with over 20 years of experience in software development and technology strategy. Prior to Nimbix, Leo was co-founder and CTO of Virtual Bridges. He is an entrepreneur with a strong background in Lean Startup and Agile methodologies.