Tag Archives: configuring hadoop

NYU, Analytics, and Cloudera’s QuickStart VM

Categories: Hadoop, QuickStart VM, Training

The Cloudera QuickStart VM is an important platform for working through any Hadoop-related curriculum.

In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.

In this introductory analytics course, students learn the architectures of the Apache Hadoop storage and compute systems (HDFS and MapReduce, respectively).


Migrating to MapReduce 2 on YARN (For Operators)

Categories: General, Hadoop, MapReduce, YARN

Cloudera Manager lets you add a YARN service in the same way you would add any other Cloudera Manager-managed service.

In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala;


How-to: Select the Right Hardware for Your New Hadoop Cluster

Categories: Hadoop, Hardware, How-to, Performance, Use Case

One of the first questions Cloudera customers raise when getting started with Apache Hadoop is how to select appropriate hardware for their new Hadoop clusters.

Although Hadoop is designed to run on industry-standard hardware, recommending an ideal cluster configuration is not as easy as delivering a list of hardware specifications. Selecting hardware that provides the best balance of performance and economy for a given workload requires testing and validation. (For example, users with IO-intensive workloads will invest in more spindles per core.)

In this blog post,


One Engineer’s Experience with Parcel

Categories: Cloudera Manager

We’re very pleased to bring you this guest post from Verisign engineer Benoit Perroud, which is based on his personal experiences with the new “Parcel” binary distribution format in Cloudera Manager 4.5.

Among all the new features released with Cloudera Manager 4.5, Parcel is probably one of the least noticed, even though it has the potential to become the administrator’s best friend.

Parcel is a new package format to easily distribute CDH or other custom packages to all nodes in a cluster.


HBase at ApacheCon Europe 2012

Categories: Community, HBase

Apache HBase will have a notable profile at ApacheCon Europe next month. Clouderan and HBase committer Lars George has two sessions on the schedule:

  • HBase Sizing and Schema Design
    Abstract: This talk will guide the HBase novice to consider important details when designing HBase-backed storage systems. Examples of schemas are given and discussed, as well as rules of thumb that will help to avoid common traps. With the right knowledge of how HBase works internally,
