Author Archives: Arvind Prabhakar

CDH3 update 5 is now available

Categories: Avro CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop ZooKeeper

We are happy to announce the general availability of CDH3 update 5. This update is a maintenance release of the CDH3 platform and provides a considerable number of bug fixes and stability enhancements. Alongside these fixes, we have also included a few new features, the most notable of which are the following:

  • Apache Flume 1.2.0 – Provides a durable file channel and many more features over the previous release.
  • Hive AvroSerDe – Replaces the Haivvreo SerDe and provides robust support for the Avro data format.

Apache Flume – Architecture of Flume NG

Categories: Avro Community Data Ingestion Flume General Hadoop

This post was originally published on the Apache Blog:

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Flume is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found on the project website. Flume NG is the work on a new major revision of Flume and is the subject of this post.

What’s New in CDH3b2: Oozie

Categories: General Hadoop HDFS Hive MapReduce Pig

Hadoop has emerged as an indispensable component of any data-intensive enterprise infrastructure. In many ways, working with large datasets on a distributed computing platform (powered by commodity hardware or cloud infrastructure) has never been easier. But because customers are running clusters consisting of hundreds or thousands of nodes, and are processing massive quantities of data from production systems every hour, the logistics of efficient platform utilization can quickly become overwhelming.

To deal with this challenge,
