How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop

Categories: CDH Hadoop HDFS

HDFS now includes (shipping in CDH 5.8.2 and later) a comprehensive storage capacity-management approach for moving data across nodes.

In HDFS, the DataNode spreads the data blocks into local filesystem directories, which can be specified using in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different device (for example, on separate HDD and SSD).

When writing new blocks to HDFS, DataNode uses a volume-choosing policy to choose the disk for the block. Two such policy types are currently supported: round-robin or available space (HDFS-1804).

Briefly, as illustrated in Figure 1, the round-robin policy distributes the new blocks evenly across the available disks, while the available-space policy preferentially writes data to the disk that has the most free space (by percentage).


By default, the DataNode uses the round-robin-based policy to write new blocks. However, in a long-running cluster, it is still possible for the DataNode to have created significantly imbalanced volumes due to events like massive file deletion in HDFS or the addition of new DataNode disks via the disk hot-swap feature. Even if you use the available-space-based volume-choosing policy instead, volume imbalance can still lead to less efficient disk I/O: For example, every new write will go to the newly-added empty disk while the other disks are idle during the period, creating a bottleneck on the new disk.

Recently, the Apache Hadoop community developed server offline scripts (as discussed inHDFS-1312, the dev@ mailing list, and GitHub) to alleviate the data imbalance issue. However, due to being outside the HDFS codebase, these scripts require that the DataNode be offline before moving data between disks. As a result, HDFS-1312 also introduces an online disk balancer that is designed to re-balance the volumes on a running DataNode based on various metrics. Similar to the HDFS Balancer, the HDFS disk balancer runs as a thread in the DataNode to move the block files across volumes with the same storage types.

In the remainder of this post, you’ll learn why and how to use this new feature.

Disk Balancer 101

Let’s explore through this useful feature step-by-step using an example. First, confirm that the dfs.disk.balancer.enabled configuration is set to true on all DataNodes. From CDH 5.8.2 onward, a user can specify this configuration via the HDFS safety valve snippet in Cloudera Manager:


In this example, we will add a new disk to a pre-loaded HDFS DataNode (/mnt/disk1), and mount the new disk to /mnt/disk2. In CDH, each HDFS data directory is on a separate disk, so you can use df to show disk usage:

Obviously, it’s time to make the disks balanced again!

A typical disk-balancer task involves three steps (implemented via the hdfs diskbalancer command): plan, execute, and query. In the first step, the HDFS client reads necessary information from the NameNode regarding the specified DataNode to generate an execution plan:

As you can see from the output, the HDFS disk balancer uses a planner to calculate the steps for the data movement plan on the specified DataNode, by using the disk-usage information that DataNode reports to the NameNode. Each step specifies the source and the target volumes to move data, as well as the amount of data expected to move.

At the time of this writing, the only planner supported in HDFS is GreedyPlanner, which constantly moves data from the most-used device to the least-used device until all data is evenly distributed across all devices. Users can also specify the threshold of space utilization in the plan command; therefore, the planner considers the disks balanced if the difference in space utilization is under the threshold. (The other notable option is to throttle the diskbalancer task I/O by specifying --bandwidth during the planning process, so that the disk balancer I/O will not influence foreground work.)

The disk-balancer execution plan is generated as a JSON file stored in HDFS. By default, the plan files are saved under the /system/diskbalancer directory:

To execute the plan on DataNode, run:

This command submits the JSON plan file to the DataNode, which executes it in a background BlockMover thread.

To check the status of the diskbalancer task on the DataNode, use the query command:

The output (PLAN_DONE) indicates that the disk-balancing task is complete. To verify the effectiveness of the disk balancer, use df -h again to see the data distribution across two local disks:

The output confirms that the disk balancer successfully reduced the difference in disk-space usage across volumes to under 10%. Mission accomplished!

To read more details about the HDFS disk balancer, read the Cloudera docs and the upstream docs.


With the long-awaited intra-DataNode disk balancer feature introduced in HDFS-1312, the version of HDFS shipping in CDH 5.8.2 and later provides a comprehensive storage capacity-management solution that can move data across nodes (Balancer), storage types (Mover), and disks within a single DataNode (Disk Balancer).


HDFS-1312 was collaboratively developed by Anu Engineer, Xiaobin Zhou, and Arpit Agarwal from Hortonworks, and Lei (Eddy) Xu and Manoj Govindasamy from Cloudera.

Lei Xu is a Software Engineer at Cloudera, and a committer to Hadoop and member of the Hadoop PMC.


10 responses on “How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop

  1. Manishn

    How do you get the name of the plan
    hdfs diskbalancer -query lei-dn-3:20001
    (e.g. i am assuming lei-dn-3 is the third datanode in the cluster.)
    How do you derive the 20001?

    1. Manoj Govindassamy

      You could run diskbalancer report command to know top data nodes that can benefit from disk balancing operation and their protocol port details.
      # hdfs diskbalancer -report -top 5

  2. Hadoop Training In Hyderabad

    The article you have shared is really amazing. All the various functionalities and methodologies of Hadoop makes the day to day life easier.

  3. Majid Alfifi

    For status querying to work, I had to drop the port and just issue the command:
    hdfs diskbalancer -query lei-dn-3
    Love the addition of this little tool.

    1. Majid Alfifi

      Ignore this comment.
      20001 is not port but a plan id or something that get generated automatically by the planner.

  4. Alex

    I have a problem. I have 4 disk per node, but only the first one get filled by HDFS. The disks are all correctly recognized and I see the correct total space in hdfs.
    [DISK: volume-null] – 0.91 used: 1747461025392/1924460904448, 0.09 free: 176999879056/1924460904448, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
    [SSD: volume-null] – 0.00 used: 32768/1958136823808, 1.00 free: 1958136791040/1958136823808, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
    [SSD: volume-null] – 0.00 used: 32768/1958136823808, 1.00 free: 1958136791040/1958136823808, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
    [SSD: volume-null] – 0.00 used: 32768/1958136823808, 1.00 free: 1958136791040/1958136823808, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.

    I tried to plan a rebalance but I get
    17/04/26 01:30:04 INFO planner.GreedyPlanner: Starting plan for Node : xxx:50020
    17/04/26 01:30:04 INFO planner.GreedyPlanner: Compute Plan for Node : xxx:50020 took 4 ms
    17/04/26 01:30:05 INFO command.Command: No plan generated. DiskBalancing not needed for node: bigdata4 threshold used: 10.0

    Any idea?

  5. sandhosh

    superb… is much interesting which engaged me more.Spend a worthful time.keep updating more.