Tag Archives: Hadoop Summit

How-to: Tune MapReduce Parallelism in Apache Pig Jobs

Categories: Guest How-to Pig

Thanks to Wuheng Luo, a Hadoop and big data architect at Sears Holdings, for the guest post below about Pig job-level performance tuning.

Many factors can affect Apache Pig job performance in Apache Hadoop, including hardware, network I/O, cluster settings, code logic, and algorithm. Although the sysadmin team is responsible for monitoring many of these factors, there are other issues that MapReduce job owners or data application developers can help diagnose,


Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,


The Apache Hadoop Ecosystem, Visualized in Datameer

Categories: Community General Guest Hadoop

This is a guest re-post from Datameer’s Director of Marketing, Rich Taylor. The original post can be found on the Datameer blog.

Datameer uses D3.js to power our Business Infographic™ designer. I thought I would show how we visualized the Apache Hadoop ecosystem connections, first using only D3.js and then using Datameer 2.0.

Many people asked about the image above that was on our booth at the Hadoop Summit.


What’s New in CDH3b2: Flume

Categories: General

As part of our series of announcements at the recent Hadoop Summit, Cloudera released two of its previously internal projects into open source. One of those was the HUE user interface environment, which we’ll be saying a bit more about later this week. The other was our data movement platform Flume. We’ve been working on Flume for many months, and it’s really exciting to be able to share the details of what we’ve been doing.
