Tag Archives: configuration

Untangling Apache Hadoop YARN, Part 1: Cluster and YARN Basics

Categories: Hadoop MapReduce YARN

In this multipart series, fully explore the tangled ball of thread that is YARN.

YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop ecosystem. YARN has been available for several releases, but many users still have fundamental questions about what YARN is, what it’s for, and how it works. This new series of blog posts is designed with the following goals in mind:

  • Provide a basic understanding of the components that make up YARN
  • Illustrate how a MapReduce job fits into the YARN model of computation

Read more

YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs

Categories: Cloudera Labs HBase Performance

YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

Read more

How-to: Write a Cloud Provider Plugin for Cloudera Director

Categories: Cloud How-to

Cloudera Director 1.5 introduces a new plugin architecture to enable support for additional cloud providers. If you want to implement a plugin that integrates with a cloud provider not supported out of the box, or to extend one of the existing plugins, these details will get you started.

As discussed in our previous blog post, the Cloudera Director Service Provider Interface (Cloudera Director SPI) defines a Java interface and packaging standards for Cloudera Director plugins.

Read more

How-to: Secure YARN Containers with Cloudera Navigator Encrypt

Categories: Cloudera Navigator Platform Security & Cybersecurity YARN

Learn how Cloudera Navigator Encrypt brings data security to YARN containers.

With the introduction of transparent data encryption in HDFS, we are now a big step closer to a secure platform in the Apache Hadoop world. However, there are still gaps in the platform, including how YARN and its applications manage their cache. In this post, I’ll explain how Cloudera Navigator Encrypt fills that particular gap.

Use Case

When a YARN application runs in a cluster, it can sometimes spill data to the hard disk,

Read more

How-to: Install Apache Zeppelin on CDH

Categories: General Guest How-to Spark

Our thanks to Karthik Vadla and Abhi Basu, Big Data Solutions engineers at Intel, for permission to re-publish the following (which was originally available here).

Data science is not a new discipline. However, with the growth of big data and the adoption of big data technologies, the demand for better-quality data has grown exponentially. Today, data science is applied to every facet of life: product validation through fault prediction,

Read more