Category Archives: Avro

What’s Next for Cloudera Impala?

Categories: Avro CDH Hadoop Impala

It’s been an exciting month and a half since the launch of the Cloudera Impala (the new open source distributed query engine for Apache Hadoop) beta, and we thought it’d be a great time to provide an update about what’s next for the project – including our product roadmap, release schedule and open-source plan.

First of all, we’d like to thank you for your enthusiasm and valuable beta feedback. We’re actively listening and have already fixed many of the bugs reported,

Read More

CDH3 update 5 is now available

Categories: Avro CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop ZooKeeper

We are happy to announce the general availability of CDH3 update 5. This update is a maintenance release of CDH3 platform and provides a considerable amount of bug-fixes and stability enhancements. Alongside these fixes, we have also included a few new features, most notable of which are the following:

  • Apache Flume 1.2.0 – Provides a durable file channel and many more features over the previous release.
  • Hive AvroSerDe – Replaces the Haivvreo SerDe and provides robust support for Avro data format.

Read More

Apache Flume Development Status Update

Categories: Avro Data Ingestion Flume General Hadoop HBase

Apache Flume is a scalable, reliable, fault-tolerant, distributed system designed to collect, transfer, and store massive amounts of event data into HDFS. Apache Flume recently graduated from the Apache Incubator as a Top Level Project at Apache. Flume is designed to send data over multiple hops from the initial source(s) to the final destination(s). Click here for details of the basic architecture of Flume. In this article, we will discuss in detail some new components in Flume 1.x (also known as Flume NG),

Read More

Apache Avro at RichRelevance

Categories: Avro Community Guest

This is a guest post from RichRelevance Principal Architect and Apache Avro PMC Chair Scott Carey.

In Early 2010 at RichRelevance, we were searching for a new way to store our long lived data that was compact, efficient, and maintainable over time. We had been using Hadoop for about a year, and started with the basics – text formats and SequenceFiles. Neither of these were sufficient. Text formats are not compact enough,

Read More

Apache Flume – Architecture of Flume NG

Categories: Avro Community Data Ingestion Flume General Hadoop

This blog was originally posted on the Apache Blog: https://blogs.apache.org/flume/entry/flume_ng_architecture

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Flume is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/flume. Flume NG is work related to new major revision of Flume and is the subject of this post.

Read More