
HBase: The NoSQL Data Store for Big Data

April 8, 2014

HBase began as a homegrown effort by a few developers to implement, as open source software, the technology Google described in a 2006 paper. That technology, called Bigtable, is still widely used at Google; it is a distributed storage system that scales to petabytes of data across thousands of servers. Like Bigtable, HBase is built for very large tables: a single table can have billions of rows and millions of columns.

HBase’s original developers saw many potential use cases for this capability in the context of the Apache Hadoop ecosystem. You could say that HBase is a “persistence layer” for Hadoop: it enables the low-latency random read and write access to Hadoop data that Web applications need. After a short period of initial development, HBase entered the Apache incubator, and the rest is history.
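To make “random read and write access” concrete, here is a minimal sketch of the idea in Python. It is a toy, in-memory model, not HBase’s client API: the class `ToyTable`, its methods and the sample row keys are all hypothetical names chosen for illustration. It leaves out distribution, persistence and cell versioning entirely. What it does show is that each cell is addressed directly by row key and column, that rows store only the columns they actually use, and that row keys are kept in sorted order.

```python
import bisect


class ToyTable:
    """Toy in-memory stand-in for a Bigtable-style table (hypothetical, not an HBase API)."""

    def __init__(self):
        # row key -> {column name -> value}; a row stores only the columns it uses
        self._rows = {}
        # row keys kept in sorted order so a key range can be scanned in order
        self._sorted_keys = []

    def put(self, row_key, column, value):
        """Random write: set one cell, addressed by row key and column."""
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Random read: return one cell, or a copy of the whole row if no column is given."""
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, stop_key)
        for key in self._sorted_keys[lo:hi]:
            yield key, dict(self._rows[key])


# Sample usage with made-up data; column names follow a "family:qualifier" style.
table = ToyTable()
table.put("user#1002", "info:name", "Grace")
table.put("user#1001", "info:name", "Ada")
table.put("user#1001", "info:email", "ada@example.com")

single_cell = table.get("user#1001", "info:name")
rows_in_range = list(table.scan("user#1001", "user#1003"))
```

A real table spreads its rows across many servers, but the access pattern is the same: look up or update a cell by its key instead of reading a whole file from start to finish.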

Today, Apache HBase is used in the production systems of many very large, very data-rich companies. HBase is often mentioned in the same breath as MongoDB and Cassandra. Although it is not as versatile as those alternatives, HBase does have the advantage, in Big Data environments, of full integration with Hadoop. For that reason, the popularity arc of HBase closely mirrors that of Hadoop itself. And in terms of open positions in IT, demand for HBase experience ranks right at the top, alongside other NoSQL technologies.

In many ways, HBase is a classic open source success story. The developer community behind it is among the most diverse and active in existence, with the quality and quantity of contributions spiking sharply in the last couple of years. Cloudera employees, eight of whom are HBase committers, are deeply involved; but so are the employees of user companies like Facebook, Intel, Hortonworks and Salesforce.com. Cloudera is a principal commercial force behind HBase and was the first vendor to offer enterprise support for it, in 2010.

In 2011, Cloudera’s leadership realized that what was good for the HBase community would also be good for the Hadoop ecosystem. So, why not give that community a boost by hosting and organizing a conference, where contributors and users could meet, share war stories and exchange ideas? Hence, HBaseCon was born. The conference launched in 2012 and, in 2013, attendance almost doubled.

HBaseCon is back in 2014, taking place on May 5 at The Hilton San Francisco Union Square. The conference continues its tradition of excellence, with 30+ technical sessions led by practitioners on operations, features, ecosystem projects and case studies. It also includes a series of keynotes from remarkable engineers, including the first public update in several years about Bigtable use cases at Google. Speakers from Facebook and Salesforce.com will also discuss their respective use cases in the general session.

HBaseCon 2014 is approaching quickly. Register here. 

Eli Collins is chief technologist at Cloudera. He spent the previous four years leading the team responsible for Cloudera’s Hadoop distribution (CDH) and is an Apache Hadoop committer and PMC member. Prior to Cloudera, Eli worked on processor virtualization and Linux at VMware. You can find him on Twitter at @elicollins.