Happy Birthday Apache HBase! 10 years of resilience, stability, and performance

Apache HBase became a top-level Apache project 10 years ago, in 2010, and Cloudera began contributing to it at the same time.  Over that time, it has become one of the largest and most popular open-source tools in big data and one of the most popular NoSQL databases.

The Apache Software Foundation Announces the 10th Anniversary of Apache HBase

HBase supports both key-value and wide-column NoSQL data models and is used by enterprises far and wide.  Cloudera has over 500 customers in production using it for use cases ranging from mission-critical transactional applications to data warehousing, machine learning, and data engineering.  Our customers choose HBase because of its resilience (with some customers able to realize 100% application uptime over many years), stability, performance, and low operational cost.  Cloudera customers deploy it stand-alone, along with Phoenix (a SQL-based database built on HBase), and sometimes with Apache Impala and/or Apache Hive, which allow them to run SQL-based OLAP queries on HBase.

I have been the Product Manager for Cloudera’s Operational Database offering since 2018 and have had the opportunity to meet with many of our customers.  I am continually impressed by the wide range of ways customers use HBase.  The breadth of use cases is so large and varied that it defies segmentation.  After much analysis, I ended up with a simple classification: customers that use it to support mission-critical applications and those that don’t.  The mission-critical applications tend to be transactional in nature and help our customers drive top-line revenue and/or operational efficiencies.  For them, if HBase goes down, the top line and/or bottom line is impacted and, in the worst case, people can die.

Examples of mission-critical use cases:

  • A health care software vendor uses HBase to power hundreds of applications.  If these applications fail for any reason, people can die and health care costs go up.  This customer has deployed HBase on 7,000+ nodes with over 70PB of data.
  • A mobile phone manufacturer uses HBase to enable a voice assistant and many other use cases on 6,000+ nodes
  • A financial media house uses HBase on 1,200+ nodes to power parts of its platform, enabling traders and others to understand the relevant context around stock price movements, trends, etc.
  • A market-leading email marketing platform runs HBase on ~1,000 nodes
  • An insurance provider uses HBase on ~1,000 nodes to store all claim information and uses it for managing those claims throughout their life cycle
  • A library services provider uses HBase on 400+ nodes to support inter-library loans around the world 
  • A global power distribution company uses HBase on 400+ nodes to ingest readings from 7+ million smart meters, automate the deployment of repair teams for the electrical distribution network, support power billing applications, and drive continuous training of machine learning models
  • The largest Indonesian telco, Telkomsel, with over 170 million customers, migrated their entire CRM application from a legacy MPP database to HBase and Impala and was able to achieve sub-second response times on all CRM queries for individual users’ call records, profiles, recharges, data usage, etc. Using Impala to query HBase provided an ANSI SQL-compatible interface accessible via JDBC, which minimized changes to the CRM application.

Examples of non-mission-critical use cases:

  • A manufacturer of personal care products uses HBase to manage all of their product brand and marketing materials 
  • A semiconductor manufacturer uses HBase to store log files from their products and extracts them to other systems for analytics 
  • A telecommunication provider uses HBase to store their dimension tables for Hive

What sets HBase apart from other NoSQL offerings is its integration across the open-source big data ecosystem, which enables customers to have an end-to-end experience.  They can use it for applications that need data from the edge, applications that need to deliver AI/ML models at scale, or any combination thereof.

One of the most interesting support tickets I have seen at Cloudera came from an HBase customer who filed a high-priority ticket indicating that their mission-critical deployment was down.  They hadn’t interacted with us for over a year, and I didn’t even know they were an important customer.  It was only through this ticket that I learned they had deployed 1,000 nodes to power an omnichannel marketing platform on HBase.  The root of the problem was that they had made some problematic changes to their configuration settings 9 months prior to the incident.  When they finally rebooted, the problematic config settings took effect, causing them to ask Cloudera for help!

Cloudera cares deeply about HBase and has 15 committers & PMC members on the project.  We are also investing to make it available on public cloud, with both PaaS-like and dbPaaS form-factors.

HBase Experiences Through the Years

Given our long-standing commitment and history with this project, we wanted to share a couple of experiences and stories associated with this project from across the Cloudera team.

“Years ago, I was attending an Apache Hadoop focused technical conference. Late one evening, I was walking back to my room, and I happened to see a group of individuals who I recognized as long-time customers huddled around a table. Now, these are a very competent group of individuals who I had worked with already for many years. I meandered over, intending to briefly say hello and be on my way after a long day. It turned out, they were having a production outage on one of their systems and were in the middle of trying to get it resolved. I sat down, pulled out my laptop, and hung out with them for the next few hours while we analyzed the problem and addressed the issues we found. Supporting mission-critical applications sometimes requires heroics, but sometimes you also find some birds of a feather along the way.”

— Senior Engineer

“For many businesses, it is absolutely vital to be able to scale while still meeting the low-latency requirements of their mission-critical systems. If you look back through the archives, our customers had a tough time living up to such demanding standards. HBase has the elements that make meeting those expectations look easy, especially by minimizing the time to trigger the next best action.”

– Principal Solutions Architect

“Three years ago, I joined Cloudera as a new Engineering Manager. I knew about the company’s open-source activity and I have been a GNU/Linux user since high school, but using open source and being part of it are completely different.

As the new guy at the company, I had to understand what the team does, so I got a few support tickets assigned to myself and started working on them. I only knew two things: I had been a Java developer for many years, so I should be able to do it, and Hortonworks was our most challenging competitor, which meant it might be interesting to work with them.

And then it happened: with my first ever Apache HBase ticket I ran into Josh Elser – lead of Hortonworks’ HBase team – who showed me that implementing a newbie task can be harder than expected (with the quality bars the HBase team has) and that your competitor can be your best partner in the open-source community. In the end, he committed my changes.

Over the last three years, many things have changed. Cloudera and Hortonworks merged, and we now work at the same company, but Apache and HBase remain the same. I have limited time to work on the code, but I see its power, see how it is used for services I didn’t know existed, and see how it gets people all over the world working together. It connects people across companies, continents, and cultures.”

— Engineering Manager

“HBase and Phoenix have been easy to learn.  Data Hub makes it easy to get started, and I am now looking forward to Cloudera Operational Database taking HBase into the next decade.”

— Technical Customer Success Manager

“Over the last 9 years, I have gone from the frontlines to the development of HBase and have seen how our customers’ use of HBase has evolved from POCs to large-scale, mission-critical platforms. The most remarkable moment in this time came before the Cloudera and Hortonworks merger, when teams from both companies worked together to improve the functionality of one critical product feature. Eventually, the work was presented at HBaseCon and received great recognition from two of HBase’s largest users. This feature powers critical functionality used by more than 2B mobile devices around the world.”

– Senior Engineer 

“Another great moment from the last 10 years of HBase was at HBaseCon 2015, when Carter Page from Google gave public recognition to how HBase had evolved into a very solid project.”

– Senior Engineer

“I’ve had the pleasure of attending nearly all of the HBaseCons (and speaking at some of them).  These are three of my favorite HBaseCon memories: (1) HBase’s diversity: the HBase 2.0 announcement highlighted not just the number of JIRAs shipped but also the number of HBase committers and PMC members from outside of the US, as well as the fact that a woman was leading the HBase PMC; (2) HBase’s momentum: Facebook’s announcement that they were leaving their custom fork to go 100% upstream Apache HBase; and (3) HBase’s stepping-stones: at one HBaseCon, a Bloomberg developer gave a read replicas talk, and then two HBaseCons later, an Apple developer gave an HBaseCon keynote on using read replicas in production.”

– Senior Systems Engineer

At Cloudera, we continue to see a bright future for this project and expect it to evolve to power next-gen applications being built in the cloud in PaaS-like and dbPaaS form factors, as well as in the datacenter with private cloud.

For a preview of what’s to come, check out CDP Public Cloud’s Operational DB template.

Krishna Maheshwari

Director of Product Management
