The Cloudera QuickStart VM is an important platform for learning any Hadoop-related curriculum.
In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.
In this introductory analytics course, students learn the architectures of the Apache Hadoop storage and compute systems (HDFS and MapReduce respectively).
Cloudera Manager lets you add a YARN service in the same way you would add any other Cloudera Manager-managed service.
In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala;
One of the first questions Cloudera customers raise when getting started with Apache Hadoop is how to select appropriate hardware for their new Hadoop clusters.
Although Hadoop is designed to run on industry-standard hardware, recommending an ideal cluster configuration is not as easy as delivering a list of hardware specifications. Selecting hardware that provides the best balance of performance and economy for a given workload requires testing and validation. (For example, users with IO-intensive workloads will invest in more spindles per core.)
In this blog post,
We’re very pleased to bring you this guest post from Verisign engineer Benoit Perroud, which is based on his personal experiences with the new “Parcel” binary distribution format in Cloudera Manager 4.5.
Among all the new features released with Cloudera Manager 4.5, Parcel is probably one of the most unnoticed – despite the fact it has the potential to become the administrator’s best friend.
Parcel is a new package format to easily distribute CDH or other custom packages to all nodes in a cluster.
Apache HBase will have a notable profile at ApacheCon Europenext month. Clouderan and HBase committer Lars George has two sessions on the schedule:
- HBase Sizing and Schema Design
Abstract: This talk will guide the HBase novice to consider important details when designing HBase backed storage systems. Examples of schemas are given and discussed, as well as rules of thumb that will help to avoid common traps. With the right knowledge of how HBase works internally,