Category Archives: HDFS

The Project Split

Categories: Community General Hadoop HDFS MapReduce

Last Wednesday, we hosted a Hadoop meetup, and I gave a short talk about the new project split. How does the split change the project’s organization, and what does it mean for end users?

The mailing lists and the source code repositories have been rearranged. For those doing development against Hadoop’s “trunk” branch, compiling Hadoop and using the various components in concert has become more complicated.

My presentation slides cover which mailing lists to subscribe to,

File Appends in HDFS

Categories: General Hadoop HDFS

There is some confusion about the state of the file append operation in HDFS. It was in, now it’s out. Why was it removed, and when will it be reinstated? This post looks at some of the history behind HDFS’s support for file appends.

Background

Early versions of HDFS had no support for an append operation. Once a file was closed, it was immutable and could only be changed by writing a new copy with a different filename.

Protecting per-DataNode Metadata

Categories: Hadoop HDFS

Administrators of HDFS clusters understand that HDFS metadata is among the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks,” to be reassembled into coherent, ordered files.

The techniques to preserve HDFS NameNode metadata are well established. You should store several copies across many separate local hard drives,

High Energy Hadoop

Categories: General Guest Hadoop HDFS

We asked Brian Bockelman, a Post Doc Research Associate in the Computer Science & Engineering Department at the University of Nebraska–Lincoln, to tell us how Hadoop is being used to process the results from High-Energy Physics experiments. His response gives insights into the kind and volume of data that High-Energy Physics experiments generate and how Hadoop is being used at the University of Nebraska. -Matt

In the least technical language,

Configuring Eclipse for Apache Hadoop Development (a screencast)

Categories: Data Ingestion General HDFS Training

Update (added 5/15/2013): The information below is dated; see this post for current instructions about configuring Eclipse for Hadoop contributions.

One of the perks of using Java is the availability of functional, cross-platform IDEs. I use vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.

Typically, when you’re developing Map-Reduce applications,
