29 responses on “Cloudera’s Support Team Shares Some Basic Hardware Recommendations”

  1. anon

    Yeah, if your controller is good quality, RAID 0 should make a really big difference, but I’ve seen references elsewhere that advise against using it with Hadoop. I’m wondering the same thing.

  2. anon

    Alex L. –

    Did you test with an enterprise-class RAID card for the RAID 0 vs JBOD test? Low-end commodity servers often have very poor RAID cards unless one is specifically chosen, and some even use software RAID, which can be tricky to set up and not very fast at all.

    If you did use an enterprise-level RAID card, then perhaps RAID introduces something protocol-wise that aggravates HDFS, and HDFS is fundamentally able to access JBOD at a lower level?

  3. Shai

    Hi Alex,

    Thanks for making it so simple…
    When we look at an installation consisting of 100 data nodes, couldn’t it be more efficient, in terms of space, power consumption and the number of data nodes needed to provide the performance, to use diskless servers (could be 1U or blade) and connect them to a good midrange central storage system? This way the storage resources can be shared across all nodes, fewer disks are used, and there is no need to maintain 3 copies of each block (RAID protection within the storage)?
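To put rough numbers on the "fewer disks" part of this question, here is a minimal sketch of the raw-capacity arithmetic. The function names, the 100 TB figure and the 5-disk RAID 5 group are illustrative assumptions, not values from the post. It counts capacity only and says nothing about throughput or data locality, which is the point the reply below raises.

```python
def raw_tb_replication(usable_tb, replicas=3):
    """Raw disk needed when every block is stored `replicas` times."""
    return usable_tb * replicas


def raw_tb_raid5(usable_tb, disks_per_group):
    """Raw disk needed under RAID 5, where each group of
    `disks_per_group` disks gives up one disk's worth of space to parity."""
    if disks_per_group < 3:
        raise ValueError("RAID 5 needs at least 3 disks per group")
    return usable_tb * disks_per_group / (disks_per_group - 1)


# Hypothetical target: 100 TB of usable data.
print(raw_tb_replication(100))   # 300
print(raw_tb_raid5(100, 5))      # 125.0
```

So on capacity alone the central-storage option needs far less raw disk; whether it can match the aggregate read and write bandwidth of local disks is a separate question.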

  4. thattommyhall

    This is great advice, much more up to date than the Machine Sizing page on the Hadoop wiki.

    @SHAI, Hadoop loves cheap raw disk, as it is optimised for linear reads and writes. A SAN would be suboptimal here.

  5. Jake Solis


    Have you tried configuring data nodes/task trackers using a single 8-core, 12-core or 16-core processor? Single-socket motherboard servers draw less power than a dual-socket system. This solution would offer more cores and draw less power than a typical 2 x quad-core node.

  6. Russ Jensen

    Hey all-

    I’m a network engineer looking at building the underlying support infrastructure for customers that my firm will be deploying Hadoop for. I wanted to point out that although this blog mentions 1GB and 10GB connections, not all 1GB and 10GB connections are the same.

    You need to look at what the oversubscription ratios are on the ports, what the actual switching times are, what the ASIC architecture is (blocking vs. non-blocking, and to what degree), etc.
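As a rough illustration of the oversubscription point, the ratio is commonly computed as total server-facing bandwidth divided by total uplink bandwidth on a switch. The sketch below assumes a hypothetical 48-port 1 Gb/s top-of-rack switch with two 10 Gb/s uplinks; the function name and port counts are made up for the example.

```python
def oversubscription_ratio(num_ports, port_gbps, num_uplinks, uplink_gbps):
    """Server-facing bandwidth divided by uplink bandwidth.

    A result of 1.0 or less means the uplinks can carry every port
    running at line rate; anything above 1.0 is oversubscribed.
    """
    downlink = num_ports * port_gbps
    uplink = num_uplinks * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink / uplink


# Hypothetical switch: 48 x 1 Gb/s ports, 2 x 10 Gb/s uplinks.
ratio = oversubscription_ratio(48, 1, 2, 10)
print(f"{ratio:.1f}:1")  # 2.4:1
```

This only covers the port arithmetic; it does not capture switching latency or whether the switch fabric itself is non-blocking.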

    My point is simply that you can have the best-designed (from a server perspective) cluster that money can buy, *but*, if you’re trying to use Netgear or Dell switching, you’re not going to be too happy with the results. :-)

    Network design and proper equipment spec’ing is just as important as the hadoop design and hardware.

    1. Jon Zuanich

      There have been improvements in scalability between different versions, and the rule of thumb is only meant to give the right order of magnitude. Actual memory requirements may vary (e.g. due to different lengths of file names, etc.).