Author Archives: Jeff Hammerbacher

Introducing Cloudera Desktop

Categories: General

Today at Hadoop World NYC, we’re announcing the availability of Cloudera Desktop, a unified and extensible graphical user interface for Hadoop. The product is free to download and can be used with either internal clusters or clusters running on public clouds.

At Cloudera, we’re focused on making Hadoop easy to install, configure, manage, and use for all organizations. While there exist many utilities for developers who work with Hadoop,

Read more

Sending Files to Remote Task Nodes with Hadoop MapReduce

Categories: Hadoop MapReduce

It is common for a MapReduce program to require one or more files to be read by each map or reduce task before execution. For example, you may have a lookup table that needs to be parsed before processing a set of records. To address this scenario, Hadoop’s MapReduce implementation includes a distributed file cache that will manage copying your file(s) out to the task execution nodes.

The DistributedCache was introduced in Hadoop 0.7.0;

Read more

Thrift, Scribe, Hive, and Cassandra: Open Source Data Management Software

Categories: General

Apache Hadoop exists within a rich ecosystem of tools for processing and analyzing large data sets. At Facebook, my previous employer, we contributed a few projects of note to this ecosystem, all under the Apache 2.0 license:

    • Thrift: A cross-language RPC framework that powers many of Facebook’s services, include search, ads, and chat. Among other things, Thrift defines a compact binary serialization format that is often used to persist data structures for later analysis.

    Read more