Understanding MapReduce via Boggle

Graph theory is a growing part of Big Data. Using graph theory, we can find relationships in networks. 

MapReduce is a great platform for traversing graphs, so we can leverage the power of an Apache Hadoop cluster to run graph algorithms efficiently.

One such graph problem is playing Boggle*. Boggle is played by rolling a group of 16 dice. Each player’s job is to find as many words as possible spelled out by the dice. Each die is six-sided, with a single letter facing up.

 A Boggle roll creates a 4×4 matrix of nodes as the dice settle into their slots. Below is an example of the nodes in a Boggle roll.

Once we get into the dice relationships, things get interesting. Each die is related to the dice above, below, to the left, to the right, and on each diagonal. The diagram below shows the relationships for each die in the roll.

Let’s look at an example of traversing this graph. In this matrix, let’s say that the letter “U” is at position [0,0] and the letter “L” directly beneath it is at position [0,1]. Let’s try to spell out a word that starts with the letter “L” at [0,1]. Looking at the relationships, we have five possible letters that could be the next letter in our word. Let’s go with the letter “A” at [1,0], as shown by the blue line. Our word so far is “LA”.

Continuing on from the letter “A” at [1,0], we only have four possible letters for the next letter in our word. We can’t use the “L” at [0,1] because the rules of Boggle forbid reusing the same letter or node. We choose the “T” at [1,1], again shown by the blue line. Our word so far is “LAT”. Let’s skip ahead by choosing the “T” at [1,3], the “E” at [0,3], and the “R” at [1,4], all shown by the blue line. Our final word is “LATTER”.

Looking at this graph problem with human eyes makes it seem easy. Turning it into an algorithm that finds all of the words and scales, though, is a difficult problem. The way the graph’s relationships expand makes this a complex challenge with factorial growth. However, using Hadoop and MapReduce, we can create an efficient and scalable solution.
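
To get a feel for that growth, here is a back-of-the-envelope bound (a crude overestimate, not an exact count): every die has at most eight neighbors, so there are at most 16 × 8^(k-1) paths of length k.

// Crude upper bound on the number of length-k paths in a 4x4 Boggle roll:
// 16 starting dice, at most 8 neighbors at every later step. The true count is
// lower (edge dice have fewer neighbors and dice can't be reused), but the
// explosive growth is the point.
public class PathBound {
     public static void main(String[] args) {
          long paths = 16;

          for (int length = 1; length <= 8; length++) {
               System.out.println("length " + length + ": at most " + paths + " paths");
               paths *= 8;
          }
     }
}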

Coding

The full and working solution to this problem is on my GitHub account. The code includes an Eclipse project for trying out the code yourself.

Let’s take a look at the driver code for the graph traversal. Note that for brevity, all of these code snippets will have portions omitted. Please consult the code on GitHub for the unedited code.

  // Traverse the graph until it is completely traversed
do {
     Job job = new Job(configuration);

     // Each iteration reads the previous iteration's output and writes to the next path
     FileInputFormat.setInputPaths(job, getPath(input, iteration));
     FileOutputFormat.setOutputPath(job, getPath(input, iteration + 1));

     // Create a Map-only job; no reducer is needed while traversing
     job.setNumReduceTasks(0);

     job.setMapperClass(BoggleMapper.class);

     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(RollGraphWritable.class);

     boolean success = job.waitForCompletion(true);

     iteration++;
} while (true); // Termination check omitted for brevity; see the code on GitHub

This driver code processes a single node’s worth of relationships per iteration. In the example above, going from the “L” to the “A” is one iteration, and going from the “A” to the “T” is another. The iterations continue until the entire graph is traversed or all of the possible words have been found.

Notice the iterations are Map-only jobs; the extra overhead of the reducer isn’t necessary. The output from one iteration is fed into the next iteration to continue to traverse the graph.
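
The hypothetical helper below sketches one way the per-iteration paths could be built (the real getPath in the GitHub project may differ); the point is simply that iteration N’s output directory becomes iteration N+1’s input directory.

// Illustrative sketch only: derive a per-iteration directory under the base input path.
// Assumes 'input' is the job's base directory; the naming scheme is an assumption.
private Path getPath(String input, int iteration) {
     return new Path(input, "iter-" + iteration);
}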

We need to create a custom Writable for the value to handle the nodes in our graph. This is the RollGraphWritable class.

ArrayList<Node> nodes = new ArrayList<Node>();
boolean isFinal;

RollGraphWritable keeps track of the nodes that make up the word in an ArrayList. It also keeps track of whether the node’s child, or adjacent, nodes have been traversed: once a RollGraphWritable’s children have been emitted, isFinal is set to true.

The Node object just represents the place in the Boggle roll from which the individual letters are taken.

public short row;
public short column;

The Node object simply encodes the letter’s row and column position in the matrix. This is just like the notation I used earlier when manually traversing the graph.
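
To work as a MapReduce value, RollGraphWritable also has to implement Hadoop’s Writable interface and serialize these fields. The snippet below is a minimal sketch of one way to do that, assuming only the fields shown above; it is an illustration rather than the exact code from GitHub.

// Minimal sketch of Writable serialization for RollGraphWritable.
// The field layout is an assumption; see GitHub for the real implementation.
public class RollGraphWritable implements Writable {
     ArrayList<Node> nodes = new ArrayList<Node>();
     boolean isFinal;

     public RollGraphWritable() {
     }

     public RollGraphWritable(ArrayList<Node> nodes, boolean isFinal) {
          this.nodes = nodes;
          this.isFinal = isFinal;
     }

     public void write(DataOutput out) throws IOException {
          out.writeBoolean(isFinal);
          out.writeInt(nodes.size());

          for (Node node : nodes) {
               out.writeShort(node.row);
               out.writeShort(node.column);
          }
     }

     public void readFields(DataInput in) throws IOException {
          isFinal = in.readBoolean();
          int size = in.readInt();
          nodes = new ArrayList<Node>(size);

          for (int i = 0; i < size; i++) {
               nodes.add(new Node(in.readShort(), in.readShort()));
          }
     }
}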

Next comes the mapper code.

if (!value.isFinal) {
     processNonFinalNode(context, key.toString(), value);
} else {
     context.write(key, value);
}

If a RollGraphWritable is final, it is simply emitted for the next iteration. This ensures that the RollGraphWritable is still around for the final reducer.

If the RollGraphWritable is not final, we need to emit the nodes adjacent to its last node.

  // Emit the characters around the last node in the Boggle Roll
Node node = rollGraph.nodes.get(rollGraph.nodes.size() - 1);

for (int row = node.row - 1; row < node.row + 2; row++) {
     if (row < 0 || row >= roll.rollSize) {
          // Check if row is outside the bounds and skip if so
          continue;
     }

     for (int col = node.column - 1; col < node.column + 2; col++) {
          if (col < 0 || col >= roll.rollSize) {
               // Check if column is outside the bounds and skip if so
               continue;
          }

          // Found viable row and column. See if node has already been traversed
          Node nextNode = new Node(row, col);

          if (!rollGraph.nodes.contains(nextNode)) {
               // Node not found, see if it passes the membership test
               String newWord = charsSoFar + roll.rollCharacters[row][col];

               ArrayList<Node> nextNodeList = (ArrayList<Node>) rollGraph.nodes.clone();
               nextNodeList.add(nextNode);

               RollGraphWritable nextGraphWritable = new RollGraphWritable(nextNodeList, false);

               context.write(new Text(newWord), nextGraphWritable);
          }
     }
}


This mapper code chunk handles iterating over the node’s relationships with a nested for loop: the outer loop iterates over the neighboring rows and the inner loop iterates over the neighboring columns. When a viable position (one not outside the bounds of the matrix) is found, we check to see if that Node is already in the RollGraphWritable. This check is important because we can’t reuse the same node as we traverse the graph. If the Node isn’t in the list, it is appended to a copy of the list and emitted with the new, longer word.

The driver will continue to run this mapper code until the entire graph is traversed or all possible words are found. Once this is done, the driver will start a different MapReduce job.

Job job = new Job(configuration);

// A single reducer produces one sorted output file
job.setNumReduceTasks(1);

job.setMapperClass(BoggleWordMapper.class);
// No reducer class is set, so Hadoop's default (identity) reducer is used

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(RollGraphWritable.class);

boolean success = job.waitForCompletion(true);

The second MapReduce job is much simpler and only runs once. It uses the built-in identity reducer, which simply emits whatever the mapper emitted. Because MapReduce sorts keys before the reduce phase and there is only a single reducer, the output is one list of all the words found in the Boggle roll, in alphabetical order.
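
For reference, leaving the reducer class unset means Hadoop falls back to its base Reducer class, which passes every key/value pair through unchanged. Written out by hand, that default behavior looks roughly like this (illustration only; the job above simply relies on the built-in default):

// Hand-written equivalent of Hadoop's default (identity) reducer: emit every value unchanged.
public static class IdentityWordReducer extends Reducer<Text, RollGraphWritable, Text, RollGraphWritable> {
     @Override
     protected void reduce(Text key, Iterable<RollGraphWritable> values, Context context)
               throws IOException, InterruptedException {
          for (RollGraphWritable value : values) {
               context.write(key, value);
          }
     }
}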

The BoggleWordMapper code is straightforward.

String charsSoFar = key.toString();

if (words.contains(charsSoFar)) {
     // Word appears, emit
     context.write(new Text(charsSoFar), value);
}

The incoming key is the entire word and the value is the RollGraphWritable that contains the nodes that make up the word.

In the mapper’s setup method, a dictionary containing English words is loaded into an in-memory set (a HashSet) for quick lookups. Every word that was found by traversing the graph is passed into the mapper. If the word appears in the set, it is emitted and appears in the final list of words in the Boggle roll.
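
As a sketch of what that setup step might look like (the configuration key, file location, and variable names here are assumptions rather than the exact GitHub code):

// Hypothetical sketch of BoggleWordMapper.setup(): load a dictionary file, one word per
// line, into an in-memory set. The "boggle.dictionary" configuration key is an assumption.
private Set<String> words = new HashSet<String>();

@Override
protected void setup(Context context) throws IOException, InterruptedException {
     Configuration conf = context.getConfiguration();
     FileSystem fs = FileSystem.get(conf);
     Path dictionaryPath = new Path(conf.get("boggle.dictionary"));

     BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(dictionaryPath)));

     try {
          String word;

          while ((word = reader.readLine()) != null) {
               // Store lowercase so lookups are case-insensitive
               words.add(word.trim().toLowerCase());
          }
     } finally {
          reader.close();
     }
}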

In Part 2, we’ll explore optimizations to improve the performance of the job.

Jesse Anderson is a Curriculum Developer and Instructor for Cloudera University.

* Boggle is a trademark of Hasbro.
