Advice on QA Testing Your MapReduce Jobs

Categories: MapReduce Testing

As Hadoop adoption increases among organizations, companies, and individuals, and as it makes its way into production, testing MapReduce (MR) jobs becomes more and more important. By regularly running tests on your MR jobs, whether invoked by developers before they commit a change or by a continuous integration server such as Hudson, an engineering organization can catch bugs early, strive for quality, and make developing and maintaining MR jobs easier and faster.

MR jobs are particularly difficult to test thoroughly because they run in a distributed environment. 
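
One common way to reduce that difficulty is to keep the map and reduce logic in plain functions that can be exercised in a single process, with no cluster involved. The sketch below is only an illustration under that assumption: `map_words`, `reduce_counts`, and `run_local` are hypothetical names, not part of Hadoop or of the post, and the in-process grouping merely stands in for the framework's shuffle step.

```python
from collections import defaultdict


def map_words(line):
    """Hypothetical mapper: emit (word, 1) for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Hypothetical reducer: sum all counts emitted for one word."""
    return word, sum(counts)


def run_local(lines):
    """Stand-in for the shuffle: group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # A developer or a CI server can run checks like this before every commit.
    result = run_local(["Hadoop runs jobs", "jobs run on Hadoop"])
    assert result["hadoop"] == 2
    assert result["jobs"] == 2
```

A test like this covers only the job's logic. Behavior that depends on the distributed environment (partitioning, serialization, cluster configuration) still needs to be checked separately.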

Read more

Running the Cloudera Training VM in VirtualBox

Categories: General Guest Hadoop Training

Update (May 1, 2013): The post below, which is based on an outdated VM, is deprecated. Instead, please see the Cloudera QuickStart VM, which runs on VirtualBox, VMware, and KVM.

Cloudera’s Training VM is one of the most popular resources on our website. It was created with VMware Workstation, and plays nicely with the VMware Player for Windows, Linux, and Mac. But VMware isn’t for everyone. Thomas Lockney has managed to get our VM image running on VirtualBox,

Read more

Apache Hadoop HA Configuration

Categories: General Guest Hadoop HDFS

Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations.

One of the things we get a lot of questions about is how to make Hadoop highly available. There is still a lot of work to be done on this front, but we wanted to take a moment and share the best practices from one of our customers. Check out what Paul George has to say about how they keep their NameNode up at ContextWeb.

Read more

The Project Split

Categories: Community General Hadoop HDFS MapReduce

Last Wednesday, we hosted a Hadoop meetup, and I gave a short talk about the new project split. How does the split change the project’s organization, and what does it mean for end users?

The mailing lists and the source code repositories have been rearranged. For those doing development against Hadoop’s “trunk” branch, compiling Hadoop and using the various components in concert has become more complicated.

My presentation slides cover which mailing lists to subscribe to,

Read more

File Appends in HDFS

Categories: General Hadoop HDFS

There is some confusion about the state of the file append operation in HDFS. It was in, now it’s out. Why was it removed, and when will it be reinstated? This post looks at some of the history behind file append support in HDFS.

Background

Early versions of HDFS had no support for an append operation. Once a file was closed, it was immutable and could only be changed by writing a new copy with a different filename.

Read more