Author Archives: Jonathan Hsieh

Online Apache HBase Backups with CopyTable

Categories: General HBase

CopyTable is a simple Apache HBase utility that, unsurprisingly, can be used for copying individual tables within an HBase cluster or from one HBase cluster to another. In this blog post, we’ll talk about what this tool is, why you would want to use it, how to use it, and some common configuration caveats.

Use cases:

CopyTable is at its core an Apache Hadoop MapReduce job that uses the standard HBase Scan read-path interface to read records from an individual table and writes them to another table (possibly on a separate cluster) using the standard HBase Put write-path interface.
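
The following minimal, single-process Python sketch illustrates the Scan-to-Put data flow described above. It is not the HBase client API. The dictionary "tables" and the `scan`, `put`, and `copy_table` helpers are hypothetical stand-ins. The real tool runs this loop as a map-only MapReduce job, roughly one map task per region, and writes through the HBase client, possibly to a separate cluster. The sketch does mirror one real detail: a Scan's start row is inclusive and its stop row is exclusive.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical toy model, NOT the HBase API:
# a "table" maps row key -> {"family:qualifier": value}.
Row = Dict[str, bytes]
Table = Dict[bytes, Row]


def scan(table: Table, start_row: Optional[bytes] = None,
         stop_row: Optional[bytes] = None) -> Iterator[Tuple[bytes, Row]]:
    """Yield rows in sorted key order; start_row is inclusive and
    stop_row is exclusive, matching HBase Scan range semantics."""
    for key in sorted(table):
        if start_row is not None and key < start_row:
            continue
        if stop_row is not None and key >= stop_row:
            break
        yield key, table[key]


def put(table: Table, key: bytes, cells: Row) -> None:
    """Write (or overwrite) the given cells for one row, like a Put."""
    table.setdefault(key, {}).update(cells)


def copy_table(src: Table, dst: Table, start_row: Optional[bytes] = None,
               stop_row: Optional[bytes] = None) -> int:
    """Read every row in the range via scan() and write it to dst via put().
    Returns the number of rows copied."""
    copied = 0
    for key, cells in scan(src, start_row, stop_row):
        put(dst, key, dict(cells))  # copy cells so dst is independent of src
        copied += 1
    return copied


if __name__ == "__main__":
    source = {b"row1": {"cf:a": b"1"},
              b"row2": {"cf:a": b"2"},
              b"row3": {"cf:b": b"3"}}
    backup: Table = {}
    n = copy_table(source, backup, start_row=b"row2")
    print(n, sorted(backup))  # 2 [b'row2', b'row3']
```

Because each row is read and written independently, the work splits naturally by key range, which is why the real tool can spread it across map tasks without any reduce phase.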

Apache HBase 0.92.0 has been released

Categories: General HBase

Today the Apache HBase community has proudly released Apache HBase 0.92.0, a major new version of the scalable distributed data store inspired by Google's BigTable. Over 670 issues were addressed, so in this post I'll highlight some of the major features and enhancements and describe what they mean for HBase users, admins, and developers.

User Features

While the most visible change to the project is the new project logo, the most important changes for users are the performance and robustness improvements to HBase’s core functionality. On the performance side,

Apache HBase 0.90.5 is now available

Categories: General HBase

Apache HBase 0.90.5 is now available. This release of the scalable distributed data store inspired by Google's BigTable is a bug-fix release that covers 81 issues, including 5 considered blockers and 11 considered critical. The release addresses several robustness and resource-leak issues, fixes rare data-loss scenarios involving splits and replication, and improves the atomicity of bulk loads. This version also includes some new supporting features, including improvements to hbck and an offline meta-rebuild mechanism for disaster recovery.

Apache HBase Pow-wow Summary 11/29/2011

Categories: Community General HBase

San Francisco, HQ – Recently there was an Apache HBase Pow-wow where project contributors gathered in person to discuss the direction of future HBase releases. This group included a quorum of the core committers from Facebook, StumbleUpon, Salesforce, eBay, and Cloudera, as well as many contributors and users from other companies. This was an open discussion, and in compliance with Apache Software Foundation policies, the agenda and detailed minutes were shared with the community at large so that everyone can chime in before any final decisions are made.

Flume Community Office Hours @ Cloudera HQ, 2/28/2011

Categories: CDH Community Flume

On Monday, we held our second Flume Office Hours at Cloudera HQ in Palo Alto. The intent was to meet informally, talk about what's new, answer questions, and get feedback from the community to help prioritize features for future releases.

Below is the slide deck from Flume Office Hours:

This time we had an online presence so that folks could participate from remote locations.
