Google’s Jeff Dean, one of the original architects of MapReduce, Bigtable, and Spanner, recently shared some fascinating facts about Google’s internal environment during a visit to Cloudera HQ.
Earlier this week, we were pleased to welcome Google Senior Fellow Jeff Dean to Cloudera’s Palo Alto HQ to give an overview of some of his group’s current research. Jeff has a peerless pedigree in distributed computing circles, having been deeply involved in the design and implementation of Google’s original advertising serving system, MapReduce, Bigtable, Spanner, and a host of other projects.
Jeff’s presentation had two main parts:
- First, a discussion of Google’s efforts to bring classic fault-tolerance principles to online services, to “create a predictably responsive whole out of less-predictable parts”, in his phrasing. The effort stems from the fact that as Google’s infrastructure grows in size and complexity, and as usage increases, the risk of high latency keeps rising unless it is actively managed. Google’s answer is to build “tail-tolerant” systems that reduce that risk by applying well-known techniques.
- Second, an explanation of how Google is betting on the venerable neural network model to make “deep machine learning” a general abstraction across image, audio, and text processing. (In Dean’s view, the neural networks of the late 1980s and early 1990s failed to meet expectations because of a lack of computational power and inadequate amounts of training data, neither of which is an issue for Google today.) Dean predicts that neural nets will make a big comeback in machine learning, and given Google’s track record of influence, that is likely to be a self-fulfilling prophecy.
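To make the idea of a “tail-tolerant” system from the first part of the talk more concrete, here is a minimal sketch of one well-known technique of this kind, the *hedged request* described by Dean and Barroso in “The Tail at Scale”: send a request to one replica, and if no reply arrives within a short delay, send the same request to another replica and use whichever answer comes back first. This is an illustration only, not Google’s implementation; the `hedged_call` function, its `hedge_after_s` parameter, and the toy replicas are hypothetical names chosen for this example.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def hedged_call(replicas: Sequence[Callable[[], T]], hedge_after_s: float) -> T:
    """Hypothetical hedged-request helper (illustrative sketch).

    Issues the request to replicas[0]; each time hedge_after_s elapses with
    no reply, also issues it to the next replica. Returns the first result.
    If the first call to finish raised, result() re-raises that exception.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = []
    try:
        for replica in replicas:
            futures.append(pool.submit(replica))
            done, _ = cf.wait(futures, timeout=hedge_after_s,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()
        # Every replica has been tried; block until any of them answers.
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Don't wait for the slower, redundant calls to finish.
        pool.shutdown(wait=False, cancel_futures=True)


# Toy replicas: one suffers a latency spike, the other is healthy.
def slow_replica() -> str:
    time.sleep(0.5)
    return "slow"


def fast_replica() -> str:
    time.sleep(0.01)
    return "fast"


if __name__ == "__main__":
    start = time.monotonic()
    print(hedged_call([slow_replica, fast_replica], hedge_after_s=0.05))
    print(f"elapsed: {time.monotonic() - start:.2f}s")
```

In this example the first replica stalls, so the hedge fires after 50 ms and the healthy replica answers long before the slow one would have. In “The Tail at Scale”, Dean and Barroso suggest deferring the secondary request until the first has been outstanding longer than the 95th-percentile expected latency, which limits the extra load to roughly 5%.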
Jeff also spent some time discussing how MapReduce, Bigtable, and other familiar technologies are used at Google today. For example, he told us that more than 1 million MapReduce jobs run at Google every day, although Google largely stopped using the native MapReduce API some time ago in favor of FlumeJava (the inspiration for Apache Crunch) and other higher-level abstractions.
It’s hard to imagine a more prestigious computer scientist visiting these walls. Thanks, Jeff!
Justin Kestelyn is Cloudera’s developer outreach director.