Author Archives: Philip Zeyliger

How Does Cloudera Manager Work?

Categories: CDH Cloudera Manager Hadoop Ops and DevOps

At Cloudera, we believe that Cloudera Manager is the best way to install, configure, manage, and monitor your Apache Hadoop stack. Of course, most users prefer not to take our word for it — they want to know how Cloudera Manager works under the covers, first. 

In this post, I’ll explain some of its inner workings. 

The Vocabulary of Cloudera Manager

The image below illustrates the basic nouns and relationships of Cloudera Manager:

A “deployment”

Read more

How Raytheon BBN Technologies Researchers are Using Hadoop to Build a Scalable, Distributed Triple Store

Categories: Guest

This post was contributed by Kurt Rohloff, a researcher in the Information and Knowledge Technologies group of Raytheon BBN Technologies, a wholly owned subsidiary of Raytheon Company.

Using Hadoop to Build a Scalable, Distributed Triple Store

The driving idea behind Semantic Web is to provide a web-scale information sharing model and platform.  One of the singular advancements over the past several years in the Semantic Web domain has been the explosion of data available in semantic formats

Read more

Hadoop Default Ports Quick Reference

Categories: General Hadoop

Editor’s note (Oct. 3, 2013): The information below is now deprecated. We recommend that you consult this documentation for ports info instead.

Is it 50030 or 50300 for that JobTracker UI? I can never remember!

Hadoop’s daemons expose a handful of ports over TCP. Some of these ports are used by Hadoop’s daemons to communicate amongst themselves (to schedule jobs, replicate blocks, etc.). Others ports are listening directly to users,

Read more

Configuring Eclipse for Apache Hadoop Development (a screencast)

Categories: Data Ingestion General HDFS Training

Update (added 5/15/2013): The information below is dated; see this post for current instructions about configuring Eclipse for Hadoop contributions.

One of the perks of using Java is the availability of functional, cross-platform IDEs.  I use vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.

Typically, when you’re developing Map-Reduce applications,

Read more

Hadoop Metrics

Categories: Hadoop

Hadoop’s NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker daemons all expose runtime metrics. These are handy for monitoring and ad-hoc exploration of the system and provide a goldmine of historical data when debugging.

In this post, I’ll first talk about saving metrics to a file.  Then we’ll walk through some of the metrics data.  Finally, I’ll show you how to configure sending metrics to other systems and explore them with jconsole.

Read more