At the beginning of September, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same soak time as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable.
Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations.
One of the things we get a lot of questions about is how to make Hadoop highly available. There is still a lot of work to be done on this front, but we wanted to take a moment and share the best practices from one of our customers. Check out what Paul George has to say about how they keep thier NameNode up at ContextWeb.
Last Wednesday, we hosted a Hadoop meetup, and I gave a short talk about the new project split. How does the split change the project’s organization, and what does it mean for end users?
The mailing lists and the source code repositories have been rearranged. For those doing development against Hadoop’s “trunk” branch, compiling Hadoop and using the various components in concert has become more complicated.
My presentation slides cover which mailing lists to subscribe to,
There is some confusion about the state of the file append operation in HDFS. It was in, now it’s out. Why was it removed, and when will it be reinstated? This post looks at some of the history behind HDFS capability for supporting file appends.
Early versions of HDFS had no support for an append operation. Once a file was closed, it was immutable and could only be changed by writing a new copy with a different filename.
Administrators of HDFS clusters understand that the HDFS metadata is some of the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks” to be reassembled into coherent, ordered files.
The techniques to preserve HDFS NameNode metadata are well established. You should store several copies across many separate local hard drives,