Category Archives: Avro

Converting Apache Avro Data to Parquet Format in Apache Hadoop

Categories: Avro Guest Hadoop Parquet

Thanks to Big Data Solutions Architect Matthieu Lieber for allowing us to republish the post below.

A customer of mine wants the best of both worlds: to keep working with his existing Apache Avro data, with all of the advantages that it confers, while also benefiting from the predicate push-down features that Parquet provides. How to reconcile the two?

For more information about combining these formats,


Meet the Project Founder: Doug Cutting (First in a Series)

Categories: Avro Community Hadoop Meet the Engineer

At Cloudera, there is a long and proud tradition of employees creating new open source projects intended to help fill gaps in platform functionality (in addition to hiring new employees who have done so in the past). In fact, more than a dozen ecosystem projects, including Apache Hadoop itself, were founded by Clouderans, more than can be attributed to employees of any other single company. Cloudera was also the first vendor to ship most of those projects as enterprise-ready bits inside its platform.


Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,


What’s Next for Cloudera Impala?

Categories: Avro CDH Hadoop Impala

It’s been an exciting month and a half since the launch of the Cloudera Impala beta (Impala is the new open source distributed query engine for Apache Hadoop), and we thought it’d be a great time to provide an update about what’s next for the project – including our product roadmap, release schedule and open-source plan.

First of all, we’d like to thank you for your enthusiasm and valuable beta feedback. We’re actively listening and have already fixed many of the bugs reported,


CDH3 update 5 is now available

Categories: Avro CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop ZooKeeper

We are happy to announce the general availability of CDH3 update 5. This update is a maintenance release of the CDH3 platform and provides a considerable number of bug fixes and stability enhancements. Alongside these fixes, we have also included a few new features, the most notable of which are the following:

  • Apache Flume 1.2.0 – Provides a durable file channel and many more features over the previous release.
  • Hive AvroSerDe – Replaces the Haivvreo SerDe and provides robust support for the Avro data format.
