Category Archives: Hadoop

Hive and JobTracker Needed Logos…

Categories: Hadoop Hive

While working on a few things here, I wanted to add some links to launch Apache Hive and the Hadoop JobTracker. At first I considered just adding plain links, but I found myself wanting a button of some sort: an icon for each of them. I didn’t want to just use the (awesomely cute) Apache Hadoop elephant logo, because these tools are related to and part of Hadoop, but they aren’t Hadoop itself…

Read more

Cloudera’s Distribution for Apache Hadoop: Making Hadoop Easier for a Sysadmin

Categories: Hadoop

A few weeks ago we announced Cloudera’s Distribution for Apache Hadoop, and I want to spend some time showing how our distribution makes a sysadmin’s job a little easier.

Perhaps the most useful features in our distribution, at least for sysadmins, are RPM packages and init scripts. RPMs are the standard way of installing software on Red Hat-family Linux distributions (RHEL, Fedora Core, CentOS). They give sysadmins a one-command install,

Read more

Upcoming Functionality in “Fair Scheduler 2.0”

Categories: General Hadoop MapReduce

(guest blog post by Matei Zaharia)

As Hadoop clusters grow in size and data volume, it becomes more and more useful to share them between multiple users and to isolate these users from one another. If User 1 is running a ten-hour machine learning job, for example, this should not prevent User 2 from running a two-minute Hive query. In November, I blogged about how Hadoop 0.19 supports pluggable job schedulers,

Read more

Configuration Parameters: What can you just ignore?

Categories: General Hadoop HDFS MapReduce

Configuring a Hadoop cluster is something akin to voodoo. There are a large number of variables in hadoop-default.xml that you can override in hadoop-site.xml. Some specify file paths on your system, but others adjust levers and knobs deep inside Hadoop’s guts. Unfortunately, there’s little or no documentation on how to set them well. Is there a single optimal configuration? Are there some settings that can just be “set to 11”?

Nigel’s guitar goes to 11, but your cluster might not. At Cloudera,

Read more

Announcing Cloudera’s Distribution for Apache Hadoop

Categories: Community General Hadoop

One of the recurring themes we have heard while working with our customers and the community is that Apache Hadoop configuration and deployment is a pain. Often, Hadoop is the first truly distributed system that administrators encounter, and the problem is made worse by the lack of standardized packages and deployment tools. And some releases are buggy. And upgrades are hard. And the list goes on.

In order for Hadoop to truly disrupt the enterprise,

Read more