Apache HBase will have a prominent presence at ApacheCon Europe next month. Clouderan and HBase committer Lars George has two sessions on the schedule:
- HBase Sizing and Schema Design
Abstract: This talk will guide the HBase novice through the important details to consider when designing HBase-backed storage systems. Example schemas are presented and discussed, along with rules of thumb that help avoid common traps. With the right knowledge of how HBase works internally, it is much easier to understand the performance implications of different data schemas.
- HBase Status Quo
Abstract: This talk focuses on what has happened to HBase since version 0.90. The idea is to introduce and discuss all the major changes in 0.92, 0.94, and trunk (a.k.a. 0.96). These range from coprocessors, security, and distributed log splitting in 0.92 to prefix compression and lazy seek optimizations in 0.94, and more. More subtle, yet often equally important, features such as WALPlayer and the handling of checksums are also presented, as they improve operations and performance. The goal is to show the audience the large strides HBase and its community have taken toward a 1.0 release.
HBase user Christian Gügi of Sentric in Zurich, who is also an organizer of the Swiss Big Data User Group, has a session as well:
- Operating HBase: Things You Need to Know
Abstract: If you’re running HBase in production, you have to be aware of many things. In this talk we will share our experience running and operating an HBase production cluster for a customer. To help you avoid common pitfalls, we’ll discuss the problems and challenges we’ve faced, as well as practical, real-world techniques for repair. Even though HBase provides internal tools for diagnosing and repairing issues, keeping a cluster healthy can still be challenging for an administrator. We’ll cover some background on these tools as well as on HBase internals such as compaction, region splits, and region distribution.
Finally, Steve Watt of HP offers some lessons learned from his company’s experience with HBase:
- Taking the Guesswork Out of Your Hadoop Infrastructure
Abstract: Apache Hadoop is clearly one of the fastest-growing big data platforms for storing and analyzing arbitrarily structured data in search of business insights. However, applicable commodity infrastructure has advanced greatly in recent years, and there is not a lot of information to help the community optimally design and configure Hadoop infrastructure for specific requirements. For example, how many disks and controllers should you use? Should you buy processors with 4 or 6 cores? Do you need a 1GbE or 10GbE network? Should you use SATA or MDL SAS? Small or large form factor disks? How much memory do you need? How do you characterize your Hadoop workloads to figure out whether you are I/O, CPU, network, or memory bound? In this talk we’ll discuss the lessons learned and outcomes from the work HP has done to optimally design and configure infrastructure for both MapReduce and HBase.
So, if you happen to be in Sinsheim, Germany, during the first week of November, you could do worse with your free time than brush up on your HBase knowledge at ApacheCon!