Tag Archives: Testing

Debian packages for Apache Hadoop

Categories: Community Hadoop

When we announced Cloudera’s Distribution for Apache Hadoop last month, we asked the community to give us feedback on what features they liked best and what new development was most important to them. Almost immediately, Debian and Ubuntu packages for Hadoop emerged as the most popular request. A lot of customers prefer Debian derivatives over Red Hat, and installing RPMs on top of Debian, while possible with tools like alien,

Read More

Upcoming Functionality in “Fair Scheduler 2.0”

Categories: General Hadoop MapReduce

(guest blog post by Matei Zaharia)

As Hadoop clusters grow in size and data volume, it becomes more and more useful to share them between multiple users and to isolate these users. If User 1 is running a ten-hour machine learning job for example, this should not impair a User 2 from running a 2-minute Hive query. In November, I blogged about how Hadoop 0.19 supports pluggable job schedulers,

Read More

Testing Apache Hadoop

Categories: General Hadoop

As a developer coming to Apache Hadoop it is important to understand how testing is organized in the project. For the most part it is simple — it’s really just a lot of JUnit tests — but there are some aspects that are not so well known.

Running Hadoop Unit Tests

Let’s have a look at some of the tests in Hadoop Core, and see how to run them. First check out the Hadoop Core source,

Read More