Pushing the Limits of Distributed Processing
Lately we’ve been sharing stories about customers and how they’re using and benefiting from Hadoop. For example, last week we saw how Raytheon Researchers are using Hadoop to build a scalable, distributed triple store. This week’s war story comes from the inventor of MapReduce, Google, who is using MapReduce to reduce their map tile image files.
Apache Hadoop has been making waves of excitement in the industry for several years, pushing the limits of distributed processing. Founded on the premise that computation should move to data instead of the other way around, Hadoop has been deployed in hundreds of organizations, on clusters ranging from a few virtual nodes up to thousands.
We recently learned of a team at Google that has pushed Hadoop to the limits by creating a cluster whose size is on the order of 100,000 nodes, running on the recently released Nexus One mobile phone hardware, powered by Android. By pushing computation out to these devices, the Nexus One team was able to solve the difficult rendering and scaling problems in situ.
Since Google maps, street view, and directions were originally sized and scaled for display on the web, the team was faced with a difficult challenge when introducing mobile turn by turn directions. While the Google MapReduce cluster could easily be tasked with re-rendering for a smaller display, each image would have to be transmitted and tested on a device before it was served to account for proper scaling and pixel offsets. A direction that flowed off the screen would not only be an embarrassing quality issue, but raised safety concerns for drivers attempting to follow these directions. The driver would drive off the screen! A single iteration was certainly achievable but getting data back, adjusting the renderings and re-transmitting was obviously going to tax the network with unnecessary data movement.
Following the philosophy of pushing computation to data, the team realized that they could construct a massive distributed MapReduce cluster and perform the rendering on each device. When streaming down the images, each Nexus One would receive a high quality image that it then dynamically adjusted and measured, returning the final computations back to Google to produce the authoritative copy.
Since Google’s MapReduce implementation is highly customized for their environment, the team set out to look for a portable MapReduce platform to distribute to the Nexus One devices. Hadoop, which runs on a Java Virtual Machine, fit the bill.
Now Google has created not only the largest MapReduce cluster but also the smallest and most distributed one as well. And because of the efficient power consumption of the Nexus One and reduced network traffic. The environmental cost of this solution is 1/100th the equivalent of running it within their data center. Good thing Android supports background processes!
Overall the Nexus One team at Google has declared the solution a complete success and rumor has it is considering using the Nexus One platform as the basis of a new general purpose virtual data center by harnessing the aggregate compute power during rush hour on the Tokyo Metro lines.