HBaseCon 2012: A Glimpse into the Operations Track
HBaseCon 2012 is only a month away! The conference takes place May 22 in San Francisco, California, and is poised to sell out.
For those unfamiliar with the Apache HBase project, HBase is open source software that allows for real-time random read/write access to your Big Data in Apache Hadoop with very low latency and high scalability. Presentations in the HBaseCon 2012 Operations track will explain the state of HBase today, how to mitigate HBase failures, and best practices in cluster deployment and cluster monitoring.
Operations Track Presentations
At Facebook we have demanding HBase installations that serve important, real-time user activity, so a failure in an HBase cluster can be a serious issue requiring immediate attention. This session will discuss a variety of real-world scenarios in which our HBase systems have failed, how our Operations and Engineering teams have worked to mitigate many of these issues, and where HBase itself still needs to improve so that operators do not have to rely on workarounds. The database should never go down. This talk is aimed at developers and other users of HBase (both current and potential) who are interested in an operational perspective on the state of HBase today.
Reliable backup and recovery is one of the main requirements for any enterprise-grade application. HBase has been embraced by enterprises that need random, real-time read/write access to huge volumes of data along with easy scalability. These enterprises are looking for backup solutions that are reliable, easy to use, and able to work with existing infrastructure. HBase comes with several backup options, but there is a clear need to improve the native export mechanisms. This talk will cover the options that are available out of the box, their drawbacks, and what various companies are doing to make backup and recovery efficient. In particular, it will cover what Facebook has done to improve the performance of the backup and recovery process with minimal impact on the production cluster.
Trend Micro developed the new security features in HBase 0.92 and has the first known deployment of secure HBase in production. We will share our motivations, use cases, and experiences, and provide a 10-minute tutorial on how to set up a secure HBase test cluster, followed by a walkthrough of a simple usage example. The tutorial will be carried out live on an on-demand EC2 cluster, with a video backup in case the network or EC2 is unavailable.
As small companies adapt to handle Big Data, the cloud and HBase enable developers to leverage that data to provide revenue-generating real-time applications. When developing a real-time application for an existing system, one must balance incrementing counters in real time with running MapReduce jobs over the same data set. When maintaining an analytics platform, ensuring data accuracy is essential. At Sproxil, SMS logs are ingested into HBase at a growing rate, and we report metrics such as SMS throughput, unique user growth over time, and return SMS user activity in real time. Sproxil provides a versatile analytics application that lets customers handpick statistics on demand to gain market insights, so they can react quickly to trends. This talk will identify the most profitable metrics and demonstrate how to calculate them using MapReduce while continually updating data as it arrives.
Determining the number of unique users that have interacted with a web page, game, or application is a very common use case. HBase is becoming an increasingly accepted tool for calculating sets or counts of unique individuals who meet some criteria. Computing these statistics can range in difficulty from very simple to very difficult. This session will explore how different approaches have worked or not worked at scale for counting uniques on HBase with Hadoop.
This session will discuss how you can represent your complete cluster with one config file and have it deployed to the cloud or to bare metal. Infochimps’ Ironfan builds on Opscode Chef to allow you to specify and orchestrate all flavors of your cluster’s deployment, monitoring, and growth. This covers not just the core HBase, HDFS, MapReduce, Hive, Flume, and so on, but all the other elements as well, including web and app servers, MySQL, Redis, RabbitMQ, and whatever other servers are needed to implement your service. These same tools can manage variations for development, staging, and R&D, as well as the target “rendering” to various clouds, bare metal, or even Vagrant VMs.