Category Archives: General

Testing Apache Hadoop

Categories: General Hadoop

As a developer coming to Apache Hadoop it is important to understand how testing is organized in the project. For the most part it is simple — it’s really just a lot of JUnit tests — but there are some aspects that are not so well known.

Running Hadoop Unit Tests

Let’s have a look at some of the tests in Hadoop Core, and see how to run them. First check out the Hadoop Core source,

Read more

Securing an Apache Hadoop Cluster Through a Gateway

Categories: General Hadoop

(Added 6/4/2013) Please note the instructions below are deprecated. Please refer to the CDH4 Security Guide for up-to-date procedures.

A few weeks ago we ran an Apache Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded on in advance. One challenge we had to face was, how do we do this in a secure way?

Read more

Thrift, Scribe, Hive, and Cassandra: Open Source Data Management Software

Categories: General

Apache Hadoop exists within a rich ecosystem of tools for processing and analyzing large data sets. At Facebook, my previous employer, we contributed a few projects of note to this ecosystem, all under the Apache 2.0 license:

    • Thrift: A cross-language RPC framework that powers many of Facebook’s services, include search, ads, and chat. Among other things, Thrift defines a compact binary serialization format that is often used to persist data structures for later analysis.

    Read more

    Welcome to Cloudera’s Hadoop blog!

    Categories: General

    We’ve created this blog as a place to post tips, tricks and insights on using Hadoop and related projects for the next generation of data storage and analysis. Of course, we’re also active on the Hadoop mailing lists and other public forums, but we wanted a place where we could capture some of the lessons we learn as we work with the community and our customers.

    Except for this inaugural post,

    Read more