Monday, June 18th, 2012
Optimizing MapReduce job performance is often seen as something of a black art. In order to maximize performance, developers need to understand the inner workings of the MapReduce execution framework and how they are affected by various configuration parameters and MR design patterns. The talk will illustrate the underlying mechanics of job and task execution, including the map side sort/spill, the shuffle, and the reduce side merge, and then explain how different job configuration parameters and job design strategies affect the performance of these operations. Though the talk will cover internals, it will also provide practical tips, guidelines, and rules of thumb for better job performance. The talk is primarily targeted towards developers directly using the MapReduce API, though will also include some tips for users of higher level frameworks.