Building and Deploying MR2

A number of architectural changes have been made to Hadoop MapReduce. The new MapReduce system is called MR2 (a.k.a. MR.next), and the first release to include these changes will be Apache Hadoop 0.23.

A key change in the new architecture is the disappearance of the centralized JobTracker service. Previously, the JobTracker was responsible for provisioning resources across the whole cluster, in addition to managing the life cycle of all submitted MapReduce applications, which typically included starting, monitoring, and retrying the applications' individual tasks. Over the years, the Hadoop community has acknowledged the problems inherent in this functionally aggregated design (see MAPREDUCE-279).

In MR2, the JobTracker's aggregated functionality is split across two new components:

  1. Central Resource Manager (RM): Management of resources in the cluster.
  2. Application Master (AM): Management of the life cycle of an application and its tasks. Think of the AM as a per-application JobTracker.

The new design enables Hadoop to scale to much larger clusters, and also makes it possible to run non-MapReduce applications on the same Hadoop cluster. For more architectural details, the interested reader may refer to the design document at: https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf.

The objective of this post is to outline the steps for building, configuring, deploying, and running a single-node NextGen MR cluster.

In the following steps, I've chosen ~/mr2 as my working directory. Inside it, we'll create a source directory for the code we'll soon check out, and a deploy directory for our deployment.
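A minimal sketch of that layout:

mkdir -p ~/mr2/source ~/mr2/deploy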

Make sure protobuf is in your library path, or add it explicitly:
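For example, assuming protobuf's native libraries were installed under /usr/local/lib (adjust the path to your installation):

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH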

We'll now check out the source code from the Apache git repository.
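A sketch of the checkout, assuming the Apache git mirror and the 0.23 branch (the repository URL and branch name may differ as the project evolves):

cd ~/mr2/source
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout branch-0.23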

Create the deployment tar files.
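A sketch of the build, run from the top of the checked-out tree; it requires Maven 3 (see Varun's comment below) and protobuf, and the tarball should land under hadoop-dist/target/:

mvn clean package -Pdist -DskipTests -Dtar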

Copy the created Hadoop tar file to our deploy directory and untar it.
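For example (the version string in the tar file name depends on the branch you built):

cp hadoop-dist/target/hadoop-0.23.0-SNAPSHOT.tar.gz ~/mr2/deploy
cd ~/mr2/deploy
tar xzf hadoop-0.23.0-SNAPSHOT.tar.gz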

Export some needed environment variables. See the following listing:
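A sketch of the exports, assuming the untarred directory from the previous step; HADOOP_DEV_HOME is just a convenience variable used here, not something the scripts require:

export HADOOP_DEV_HOME=~/mr2/deploy/hadoop-0.23.0-SNAPSHOT
export HADOOP_COMMON_HOME=$HADOOP_DEV_HOME
export HADOOP_HDFS_HOME=$HADOOP_DEV_HOME
export HADOOP_MAPRED_HOME=$HADOOP_DEV_HOME
export YARN_HOME=$HADOOP_DEV_HOME
export HADOOP_CONF_DIR=$HADOOP_DEV_HOME/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR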

We'll now start the configuration. The configuration directory will contain the following files: core-site.xml, hdfs-site.xml, mapred-site.xml, slaves, yarn-env.sh, and yarn-site.xml.

Make sure the contents of the XML files in the conf directory are as follows:

yarn-site.xml:

<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Now we can format our HDFS namenode as usual:
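From the deployment directory:

bin/hdfs namenode -format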

And start the HDFS services:
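A sketch, assuming the daemon scripts live under sbin/ in the 0.23 tarball layout:

sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode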

And the new MR2 services:
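Again a sketch; the yarn-daemon.sh script's location (bin/ vs. sbin/) may vary between builds:

bin/yarn-daemon.sh start resourcemanager
bin/yarn-daemon.sh start nodemanager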

Make sure all needed services are running:
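A quick check with jps should list the NameNode, DataNode, ResourceManager, and NodeManager processes:

jps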

You can see the resource manager web console (shown below) at: http://localhost:8088

Our usual namenode web console should also be up (by default at http://localhost:50070):

We can now start running some example jobs. Here is how we submit our first job, the randomwriter from our examples jar:
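A sketch of the submission, assuming the examples jar has been copied into the current directory (see Patrick's comment below on where the build places it; the jar's version string depends on your build):

bin/hadoop jar hadoop-mapreduce-examples-0.23.0-SNAPSHOT.jar randomwriter random-out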

If everything goes fine, you'll see the job's console output, ending with a successful completion message.

We'll now run the conventional wordcount example job, to see a full map-and-reduce job (as opposed to the map-only randomwriter job):
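For example, assuming some local text files are first copied into an HDFS input directory (jar name as above):

bin/hadoop fs -mkdir input
bin/hadoop fs -put *.txt input
bin/hadoop jar hadoop-mapreduce-examples-0.23.0-SNAPSHOT.jar wordcount input output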

After a successful run, the job counters are printed to the console.

MR2 comes with new and updated web consoles that can be used to conveniently explore and monitor the system services. The Applications tab shows the submitted applications: each application's id, user name, queue name, status, progress, and other useful information. For example, here is the resource manager's Applications view while randomwriter and wordcount are running.

The Nodes tab shows the cluster's nodes and their addresses, in addition to health and container information. For example, here is the Nodes view for our single-node cluster.

The Scheduler view shows useful scheduling information. In our example, we used the default FifoScheduler. As seen in the following snapshot, the view shows the queue's minimum and maximum capacities, the number of nodes, total and available capacities, and other useful information. This snapshot was captured after running the randomwriter application but before submitting the wordcount application.

10 Responses
  • Phaneesh / November 16, 2011 / 9:09 AM

    Hi,

    Thanks a ton for the guide! I was waiting to try out MR2. I am trying to deploy it on Ubuntu 11.10 64-bit with Oracle Java 7 64-bit. I was able to build without any issues. However, when I try to format the name node I get the following exception:

    11/11/16 21:30:22 ERROR namenode.NameNode: Exception in namenode join
    java.lang.IllegalArgumentException: URI has an authority component
    at java.io.File.<init>(File.java:397)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.getStorageDirectory(NNStorage.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.<init>(FSEditLog.java:170)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.<init>(FSImage.java:125)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:604)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:732)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)

    Should I use Oracle/Sun Java 6 instead of 7?

    Thanks
    NP

  • Ahmed Radwan / November 21, 2011 / 5:41 PM

    Thanks PHANEESH!

    This doesn't seem to be an MR2 issue. I think it's a problem in setting up HDFS, which you'd also face when installing older MR versions on the same system.

    Have you explicitly added any configuration properties other than the ones in the XML files above? For example, if the dfs.namenode.edits.dir property is not set, the default is to use hadoop.tmp.dir; can you check your /tmp/hadoop-${user.name}? Are you accessing this location through a network resource, for example one referenced through a UNC path?

  • Marcos Ortiz / November 28, 2011 / 4:21 PM

    Wow, good tutorial, I will try to reproduce this on my Fedora 15. Regards and best wishes.

  • Patrick Wendell / December 11, 2011 / 11:47 PM

    The examples jar is now built in a slightly different location than is listed here. It’s particularly confusing because the old location still contains a jar with the correct name but the jar is empty. The new location is here:

    hadoop-mapreduce-project/hadoop-mapreduce-examples/target/

    Discussed here:
    https://issues.apache.org/jira/browse/MAPREDUCE-3492

  • MRK / December 19, 2011 / 2:32 AM

    Hi,

    Thanks for the inputs. When I install the Apache hadoop-0.23.0 build, I am able to start the namenode and datanode,
    but the namenode is always in safe mode. The message below is shown in the namenode logs:

    WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'nfs' is 0, which is below the configured reserved amount 104857600
    2011-12-19 08:28:36,982 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
    2011-12-19 08:28:36,988 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.

    I can see there is enough space on the specified volume. Please let me know what's wrong here.

  • srikanth / January 08, 2012 / 1:04 AM

    Nice tutorial; it guided us in the right direction. But I ran into a small problem creating the examples jar in hadoop-examples using ant. I feel the jar files are not all loaded correctly, and I cannot run an example. Did anyone face the same problem? Please let me know.

  • Varun Thacker / February 13, 2012 / 4:03 AM

    Solved my problem.

    I was using Maven 2.2.1. After reading the build notes and installing Maven 3.0+, it started the build process.

  • Pavan / July 09, 2012 / 5:36 PM

    Great tutorial, and I could start almost all the daemons except the NodeManager.
    I get the following error; any idea about this? Thanks

    Unrecognized option : -jvm
    Could not create the Java virtual machine
