Hadoop Default Ports Quick Reference

Editor’s note (Oct. 3, 2013): The information below is now deprecated. We recommend that you consult this documentation for ports info instead.

Is it 50030 or 50300 for that JobTracker UI? I can never remember!

Hadoop’s daemons expose a handful of ports over TCP. Some of these ports are used by Hadoop’s daemons to communicate amongst themselves (to schedule jobs, replicate blocks, etc.). Other ports listen directly for users, either via an interposed Java client, which communicates via internal protocols, or via plain old HTTP.

This post summarizes the ports that Hadoop uses; it’s intended to be a quick reference guide both for users, who struggle with remembering the correct port number, and systems administrators, who need to configure firewalls accordingly.

Web UIs for the Common User

The default Hadoop ports are as follows:

Daemon | Default Port | Configuration Parameter
HDFS Namenode | 50070 | dfs.http.address
HDFS Datanodes | 50075 | dfs.datanode.http.address
HDFS Secondarynamenode | 50090 | dfs.secondary.http.address
HDFS Backup/Checkpoint node† | 50105 | dfs.backup.http.address
MR Jobtracker | 50030 | mapred.job.tracker.http.address
MR Tasktrackers | 50060 | mapred.task.tracker.http.address

† Replaces the secondarynamenode in 0.21.

Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following:

/logs
Exposes, for download, the log files in the directory given by the Java system property hadoop.log.dir.
/logLevel
Allows you to dial up or down log4j logging levels. This is similar to hadoop daemonlog on the command line.
/stacks
Stack traces for all threads. Useful for debugging.
/metrics
Metrics for the server. Use /metrics?format=json to retrieve the data in a structured form. Available in 0.21.
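
As a quick illustration, any HTTP client can hit these endpoints. The sketch below (Python 2, contemporary with this post) polls a namenode; the host name is hypothetical, and the /logLevel parameter names and the JSON metrics form (0.21 and later, per the note above) are worth double-checking against your version.

    import urllib2

    # Hypothetical host; use 50070 for the namenode, 50075 for a datanode,
    # 50030 for the jobtracker, and so on (see the table above).
    BASE = "http://namenode.example.com:50070"

    # Server metrics; the ?format=json form is available in 0.21 and later.
    print urllib2.urlopen(BASE + "/metrics?format=json").read()

    # Stack traces for all threads, handy when a daemon appears wedged.
    print urllib2.urlopen(BASE + "/stacks").read()

    # Dial a log4j logger up to DEBUG, in the spirit of 'hadoop daemonlog';
    # the log/level parameter names follow the /logLevel form.
    print urllib2.urlopen(BASE + "/logLevel?log=org.apache.hadoop.hdfs&level=DEBUG").read()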

Individual daemons expose extra daemon-specific endpoints as well. Note that these are not necessarily part of Hadoop’s public API, so they tend to change over time.

The Namenode exposes:

/
Shows information about the namenode as well as the HDFS. There’s a link from here to browse the filesystem, as well.
/dfsnodelist.jsp?whatNodes=(DEAD|LIVE)
Shows lists of nodes that are disconnected from (DEAD) or connected to (LIVE) the namenode.
/fsck
Runs the “fsck” command. Not recommended on a busy cluster.
/listPaths
Returns an XML-formatted directory listing. This is useful if you wish (for example) to poll HDFS to see if a file exists (see the sketch after this list). The URL can include a path (e.g., /listPaths/user/philip) and can take optional GET arguments: /listPaths?recursive=yes will return all files on the file system; /listPaths/user/philip?filter=s.* will return all files in the home directory that start with s; and /listPaths/user/philip?exclude=.txt will return all files except text files in the home directory. Beware that filter and exclude operate on the directory listed in the URL, and they ignore the recursive flag.
/data and /fileChecksum
These forward your HTTP request to an appropriate datanode, which in turn returns the data or the checksum.
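
To make the /listPaths polling concrete, here is a rough existence check. It is only a sketch: the namenode address and path are made up, and the exact XML the servlet emits (and what it does for missing paths) can differ between versions.

    import urllib2

    # Hypothetical namenode address; substitute your own.
    NAMENODE = "http://namenode.example.com:50070"

    def hdfs_path_exists(path):
        """Rough existence check via /listPaths.

        The servlet returns an XML listing whose entries are <file> and
        <directory> elements for paths that exist; the exact schema (and the
        error shape for missing paths) varies by version, so treat this as a
        sketch rather than a parser.
        """
        try:
            body = urllib2.urlopen(NAMENODE + "/listPaths" + path).read()
        except urllib2.HTTPError:
            return False
        return "<file " in body or "<directory " in body

    print hdfs_path_exists("/user/philip")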

Datanodes expose the following:

/browseBlock.jsp, /browseDirectory.jsp, /tail.jsp, /streamFile, /getFileChecksum
These are the endpoints that the namenode redirects to when you are browsing filesystem content. You probably wouldn’t use these directly, but this is what’s going on underneath.
/blockScannerReport
Every datanode verifies its blocks at configurable intervals. This endpoint provides a listing of that check.
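
A monitoring script might simply scrape that report periodically. A minimal sketch, with a hypothetical datanode host; the listblocks parameter, which expands the report to individual blocks, is worth verifying on your version.

    import urllib2

    # Hypothetical datanode; 50075 is the default HTTP port from the table above.
    DATANODE = "http://datanode01.example.com:50075"

    # Summary of the datanode's periodic block verification.
    print urllib2.urlopen(DATANODE + "/blockScannerReport").read()

    # Per-block detail; the listblocks parameter may differ across versions.
    print urllib2.urlopen(DATANODE + "/blockScannerReport?listblocks").read()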

The secondarynamenode exposes a simple status page with information including which namenode it’s talking to, when the last checkpoint was, how big it was, and which directories it’s using.

The jobtracker’s UI is commonly used to look at running jobs, and, especially, to find the causes of failed jobs. The UI is best browsed starting at /jobtracker.jsp. There are over a dozen related pages providing details on tasks, history, scheduling queues, jobs, etc.

Tasktrackers have a simple page (/tasktracker.jsp), which shows running tasks. They also expose /taskLog?taskid= to query logs for a specific task. They use /mapOutput to serve the output of map tasks to reducers, but this is an internal API.
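
Scripting the task log endpoint can be handy when chasing a flaky task. A sketch only: the tasktracker host and task ID below are invented, and the endpoint’s capitalization and parameter names (taskid vs. attemptid) have shifted between Hadoop versions.

    import urllib2

    # Hypothetical tasktracker and task attempt; pull real values from the
    # jobtracker UI. Parameter names vary by version (taskid vs. attemptid).
    TASKTRACKER = "http://tasktracker01.example.com:50060"
    TASK_ID = "attempt_200905221120_0001_m_000000_0"

    print urllib2.urlopen(TASKTRACKER + "/taskLog?taskid=" + TASK_ID).read()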

Under the Covers for the Developer and the System Administrator

Internally, Hadoop mostly uses Hadoop IPC to communicate amongst servers. (Part of the goal of the Apache Avro project is to replace Hadoop IPC with something that is easier to evolve and more language-agnostic; HADOOP-6170 is the relevant ticket.) Hadoop also uses HTTP (for the secondarynamenode communicating with the namenode and for the tasktrackers serving map outputs to the reducers) and a raw network socket protocol (for datanodes copying around data).

The following table presents the ports and protocols (including the relevant Java class) that Hadoop uses. This table does not include the HTTP ports mentioned above.

Daemon | Default Port | Configuration Parameter | Protocol | Used for
Namenode | 8020 | fs.default.name† | IPC: ClientProtocol | Filesystem metadata operations
Datanode | 50010 | dfs.datanode.address | Custom Hadoop Xceiver: DataNode and DFSClient | DFS data transfer
Datanode | 50020 | dfs.datanode.ipc.address | IPC: InterDatanodeProtocol, ClientDatanodeProtocol, ClientProtocol | Block metadata operations and recovery
Backupnode | 50100 | dfs.backup.address | Same as namenode | HDFS metadata operations
Jobtracker | Ill-defined‡ | mapred.job.tracker | IPC: JobSubmissionProtocol, InterTrackerProtocol | Job submission, tasktracker heartbeats
Tasktracker | 127.0.0.1:0¤ | mapred.task.tracker.report.address | IPC: TaskUmbilicalProtocol | Communicating with child jobs

† This is the port part of hdfs://host:8020/.
‡ The default is not well-defined; common values are 8021, 9001, or 8012. See MAPREDUCE-566.
¤ Binds to an unused local port.
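
For the firewall angle, a blunt but effective check is a TCP connect against each default port from a machine that should (or should not) have access. A sketch, with a hypothetical host and the defaults from the tables above:

    import socket

    # Hypothetical host running the daemons; ports are the defaults above.
    HOST = "namenode.example.com"
    DEFAULT_PORTS = {
        8020:  "namenode IPC",
        50070: "namenode HTTP",
        50010: "datanode data transfer",
        50020: "datanode IPC",
        50075: "datanode HTTP",
        50030: "jobtracker HTTP",
        50060: "tasktracker HTTP",
    }

    for port, role in sorted(DEFAULT_PORTS.items()):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(2)
        try:
            s.connect((HOST, port))
            print "%5d open    %s" % (port, role)
        except socket.error:
            print "%5d closed  %s" % (port, role)
        finally:
            s.close()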

That’s quite a few ports! I hope this quick overview has been helpful.

12 Responses
  • Son Nguyen / November 15, 2009 / 9:41 AM

    Very useful for us to determine the right ports to open in firewall

  • Sudhir V / May 26, 2010 / 11:01 PM

    The “secondarynamenode exposes a simple status page with information including which namenode it’s talking to, when the last checkpoint was, how big it was, and which directories it’s using”

    This is only available from version 0.21 onwards. Check https://issues.apache.org/jira/browse/HADOOP-3741 for more details

  • Dave / June 17, 2010 / 9:47 PM

    Very handy, thanks!

  • Stephan / September 02, 2010 / 2:32 AM

    Thanks, super useful, but we found that our HDFS Datanodes also seem to listen on another random port.
    So far in our setting we found:
    dfs.datanode.http.address – 50075
    dfs.datanode.address – 50010
    dfs.datanode.ipc.address – 50020

    + another random port, and I don’t think it’s
    dfs.secondary.http.address
    because I can’t connect to it with a browser.

    We are using Hadoop 0.20.2… does anyone have an idea?

  • haden / March 25, 2011 / 3:05 PM

    Great reference, thanks... finally was able to solve the 8020 port error I was having.

  • hadoop-user59 / May 31, 2012 / 5:14 PM

    Why does the script /usr/sbin/hadoop-validate-setup.sh use port 9000? I installed the single-node hadoop, started everything using the script /usr/sbin/hadoop-setup-single-node.sh but then the validate script attempts connection to 9000 and not 8020. This then gives the following error:

    su -c '/usr/libexec/../bin/hadoop --config /usr/libexec/../etc/hadoop jar /usr/libexec/../share/hadoop/hadoop-examples-1.0.3.jar teragen 10000 validate_deploy_1338509411/tera_gen_data'
    12/06/01 00:10:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
    12/06/01 00:10:14 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
    12/06/01 00:10:15 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
