HDFS Reliability

We’ve been talking to enterprise users of Hadoop about existing and new projects, and many of them are asking about reliability and data integrity. So we wrote up a short paper, HDFS Reliability, to summarize the state of the art and offer advice. We’d like your feedback, too, so please leave a comment.

4 Responses
  • gabriel / January 14, 2009 / 9:28 PM

    Thanks for the paper. I don’t know if I’m missing the obvious, but is there a link to the raw reliability numbers?

  • hammer / January 14, 2009 / 11:59 PM

    Good commentary from Steve Loughran: http://1060.org/blogxter/entry?publicid=56985B46DE7063B5124E09772CE40CEA

  • yikai / January 17, 2009 / 8:50 AM

    Great article.
    Two suggestions on “Protect the name node”:
    1. Run a 64-bit JDK on the name node (with more memory), even if the data/compute nodes run 32-bit. (Yahoo)
    2. Use dual power supplies on the name node box.

  • Redwood Job Scheduling / January 30, 2009 / 9:05 PM

    I’m more interested in the system’s reliability under hardware failures. Thanks for the information.
