Category Archives: How-to

How-to: Install Cloudera Navigator Encrypt 3.7.0 on SUSE 11 SP2 and SP3

Categories: How-to

Installing Cloudera Navigator Encrypt on SUSE is a one-off process, but we have you covered with this how-to.

Cloudera Navigator Encrypt, which is integrated with Cloudera Navigator governance software, provides massively scalable, high-performance encryption for critical Apache Hadoop data. It leverages industry-standard AES-256 encryption and provides a transparent layer between the application and filesystem. Navigator Encrypt also includes process-based access controls, allowing authorized Hadoop processes to access encrypted data …

Read More

How-to: Translate from MapReduce to Apache Spark (Part 2)

Categories: How-to MapReduce Spark

The conclusion to this series covers Combiner-like aggregation functionality, counters, partitioning, and serialization.

Apache Spark is rising in popularity as an alternative to MapReduce, in large part due to its expressive API for complex data processing. A few months ago, my colleague Sean Owen wrote a post describing how to translate functionality from MapReduce into Spark, and in this post, I’ll extend that conversation to cover additional functionality.

Read More

How-to: Install Hue on a Mac

Categories: How-to Hue

Learn how to set up Hue, the open source GUI that makes Apache Hadoop easier to use, on your Mac.

You might already have all the prerequisites installed, but we are going to show how to start from a fresh Yosemite (10.10) install and end up with Hue running on your Mac in almost no time!

We are going to use the official QuickStart VM from Cloudera, which already packs all the Apache Hadoop ecosystem components that Hue will talk to.

Read More

How-to: Tune Your Apache Spark Jobs (Part 2)

Categories: How-to Spark

In the conclusion to this series, learn how resource tuning, parallelism, and data representation affect Spark job performance.

In this post, we’ll finish what we started in “How to Tune Your Apache Spark Jobs (Part 1)”. I’ll try to cover pretty much everything you could care to know about making a Spark program run fast. In particular, you’ll learn about resource tuning, or configuring Spark to take advantage of everything the cluster has to offer.

Read More

How-to: Quickly Configure Kerberos for Your Apache Hadoop Cluster

Categories: How-to QuickStart VM Security

Use the scripts and screenshots below to configure a Kerberized cluster in minutes.

Kerberos is the foundation of securing your Apache Hadoop cluster. With Kerberos enabled, user authentication is required. Once users are authenticated, you can use projects like Apache Sentry (incubating) for role-based access control via GRANT/REVOKE statements.

Taming the three-headed dog that guards the gates of Hades is challenging, so Cloudera has put significant effort into making this process easier in Hadoop-based enterprise data hubs. 

Read More