For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.
In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,
It’s been an exciting month and a half since the launch of the Cloudera Impala (the new open source distributed query engine for Apache Hadoop) beta, and we thought it’d be a great time to provide an update about what’s next for the project – including our product roadmap, release schedule and open-source plan.
First of all, we’d like to thank you for your enthusiasm and valuable beta feedback. We’re actively listening and have already fixed many of the bugs reported,
We are happy to announce the general availability of CDH3 update 5. This update is a maintenance release of CDH3 platform and provides a considerable amount of bug-fixes and stability enhancements. Alongside these fixes, we have also included a few new features, most notable of which are the following:
- Apache Flume 1.2.0 – Provides a durable file channel and many more features over the previous release.
- Hive AvroSerDe – Replaces the Haivvreo SerDe and provides robust support for Avro data format.
Apache Flume is a scalable, reliable, fault-tolerant, distributed system designed to collect, transfer, and store massive amounts of event data into HDFS. Apache Flume recently graduated from the Apache Incubator as a Top Level Project at Apache. Flume is designed to send data over multiple hops from the initial source(s) to the final destination(s). Click here for details of the basic architecture of Flume. In this article, we will discuss in detail some new components in Flume 1.x (also known as Flume NG),
This is a guest post from RichRelevance Principal Architect and Apache Avro PMC Chair Scott Carey.
In Early 2010 at RichRelevance, we were searching for a new way to store our long lived data that was compact, efficient, and maintainable over time. We had been using Hadoop for about a year, and started with the basics text formats and SequenceFiles. Neither of these were sufficient. Text formats are not compact enough,