Tag Archives: HDFS

State of the Elephant 2008

Categories: Community, General, Hadoop

It’s a new year, the time when we take a moment to look back at the previous one and forward to what might be coming next. In the world of Hadoop, a lot happened in 2008.

Organization

At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added,

Read more

Testing Apache Hadoop

Categories: General, Hadoop

As a developer coming to Apache Hadoop, it is important to understand how testing is organized in the project. For the most part it is simple (it’s really just a lot of JUnit tests), but some aspects are not so well known.

Running Hadoop Unit Tests

Let’s have a look at some of the tests in Hadoop Core, and see how to run them. First check out the Hadoop Core source,

Read more

Securing an Apache Hadoop Cluster Through a Gateway

Categories: General, Hadoop

(Added 6/4/2013) Please note the instructions below are deprecated. Please refer to the CDH4 Security Guide for up-to-date procedures.

A few weeks ago we ran an Apache Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded in advance. One challenge we faced was how to do this securely.

Read more

Introducing Hadoop Development Status

Categories: Community

We’re happy to announce a new tool we have been developing here at Cloudera: Hadoop Development Status. Hadoop Development Status aims to help the Hadoop community understand its direction, health, and participants. The project currently monitors the most active contributors according to mailing list traffic, the most watched JIRA tickets, and aggregate traffic volumes on the Hadoop mailing lists.

The graph of messages per month on the Hadoop Core lists shows sustained growth in traffic.

Read more