CopyTable is a simple Apache HBase utility that, unsurprisingly, can be used for copying individual tables within an HBase cluster or from one HBase cluster to another. In this blog post, we’ll talk about what this tool is, why you would want to use it, how to use it, and some common configuration caveats.
CopyTable is at its core an Apache Hadoop MapReduce job that uses the standard HBase Scan read-path interface to read records from an individual table and writes them to another table (possibly on a separate cluster) using the standard HBase Put write-path interface.
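To make the scan-then-put idea concrete, here is a minimal in-memory sketch. It is not the HBase API and not CopyTable's actual implementation: `ToyTable`, `copy_table`, and the `cf:a` column names are hypothetical stand-ins, and the real tool runs as a MapReduce job rather than a single loop.

```python
from typing import Dict, Iterator, Optional, Tuple

Row = Dict[str, bytes]  # column name -> cell value


class ToyTable:
    """In-memory stand-in for a table (hypothetical; not the HBase client API)."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, Row] = {}

    def put(self, row_key: bytes, cells: Row) -> None:
        # Write path: merge the given cells into the row, loosely like a Put.
        self._rows.setdefault(row_key, {}).update(cells)

    def scan(
        self, start: Optional[bytes] = None, stop: Optional[bytes] = None
    ) -> Iterator[Tuple[bytes, Row]]:
        # Read path: yield rows in sorted key order.
        # `start` is inclusive and `stop` is exclusive in this sketch.
        for key in sorted(self._rows):
            if start is not None and key < start:
                continue
            if stop is not None and key >= stop:
                break
            yield key, dict(self._rows[key])


def copy_table(
    source: ToyTable,
    sink: ToyTable,
    start: Optional[bytes] = None,
    stop: Optional[bytes] = None,
) -> int:
    """Read every row in the range from `source` and write it to `sink`."""
    copied = 0
    for key, cells in source.scan(start, stop):
        sink.put(key, cells)
        copied += 1
    return copied


if __name__ == "__main__":
    src = ToyTable()
    src.put(b"row1", {"cf:a": b"1"})
    src.put(b"row2", {"cf:a": b"2"})
    dst = ToyTable()
    print(copy_table(src, dst), "rows copied")
```

Because the copy goes through the ordinary read and write paths, the sink in this sketch could just as well be a table on a different cluster; nothing in the loop depends on where the rows end up.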
Today the Apache HBase community has proudly released Apache HBase 0.92.0, a major new version of the scalable distributed data store inspired by Google’s BigTable. Over 670 issues were addressed, so in this post I’ll highlight some of the major features and enhancements and describe what they mean for HBase users, admins, and developers.
While the most visible change to the project is the new project logo, the most important changes for users are the performance and robustness improvements to HBase’s core functionality. On the performance side,
Apache HBase 0.90.5 is now available. This release of the scalable distributed data store inspired by Google’s BigTable is a fix release that covers 81 issues, including 5 considered blockers and 11 considered critical. The release addresses several robustness and resource leakage issues, fixes rare data-loss scenarios having to do with splits and replication, and improves the atomicity of bulk loads. This version also includes some new supporting features, including improvements to hbck and an offline meta-rebuild disaster recovery mechanism.
San Francisco, Salesforce.com HQ – Recently there was an Apache HBase Pow-wow where project contributors gathered to discuss the directions of future releases of HBase in person. This group included a quorum of the core committers from Facebook, StumbleUpon, Salesforce, eBay, and Cloudera as well as many contributors and users from other companies. This was an open discussion, and in compliance with Apache Software Foundation policies, the agenda and detailed minutes were shared with the community at large so that everyone can chime in before any final decisions are made.
On Monday, we held our second Flume Office Hours at Cloudera HQ in Palo Alto. The intent was to meet informally, to talk about what’s new, to answer questions, and to get feedback from the community to help prioritize features for future releases.
Below is the slide deck from Flume Office Hours:
This time we had an online presence so that folks could participate from remote locations.