Category Archives: ZooKeeper

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,

Read more

Apache HBase AssignmentManager Improvements

Categories: Community HBase ZooKeeper

AssignmentManager is a module in the Apache HBase Master that manages regions to RegionServers assignment. (See HBase architecture for more information.) It ensures that all regions are assigned and each region is assigned to just one RegionServer.

Although the AssignmentManager generally does a good job, the existing implementation does not handle assignments as well as it could. For example, if a region was assigned to two or more RegionServers,

Read more

Hadoop/HBase Capacity Planning

Categories: Hadoop HBase HDFS MapReduce ZooKeeper

Apache Hadoop and Apache HBase are gaining popularity due to their flexibility and tremendous work that has been done to simplify their installation and use.  This blog is to provide guidance in sizing your first Hadoop/HBase cluster.  First, there are significant differences in Hadoop and HBase usage.  Hadoop MapReduce is primarily an analytic tool to run analytic and data extraction queries over all of your data, or at least a significant portion of them (data is a plural of datum).

Read more

Migrating to CDH

Categories: General Hadoop HBase HDFS Hive MapReduce Pig ZooKeeper

With the recent release of CDH3b2, many users are more interested than ever to try out Cloudera’s Distribution for Hadoop (CDH). One of the questions we often hear is, “what does it take to migrate?”.

Why Migrate?

If you’re not familiar with CDH3b2, here’s what you need to know.

All versions of CDH provide:

  • RPM and Debian packages for simple installation and management.
  • Clean integration with the host operating system.

Read more

Building a distributed concurrent queue with Apache ZooKeeper

Categories: ZooKeeper

In my first few weeks here at Cloudera, I’ve been tasked with helping out with the Apache ZooKeeper system, part of the umbrella Hadoop project. ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution –

Read more