Category Archives: How-to

How-to: Manage Permissions in Hue

Categories: How-to Hue MapReduce Oozie

Hue is a web interface for Apache Hadoop that simplifies common Hadoop tasks such as running MapReduce jobs, browsing HDFS, and creating Apache Oozie workflows. (To learn more about the integration of Oozie and Hue, see this blog post.) In this post, we’re going to focus on how Useradmin, one of the fundamental components in Hue, has matured.

New User and Permission Features

User and permission management in Hue has changed drastically over the past year.

Read More

Analyzing Twitter Data with Apache Hadoop, Part 3: Querying Semi-structured Data with Apache Hive

Categories: CDH Hadoop Hive How-to Use Case

This is the third article in a series about analyzing Twitter data using some of the components of the Apache Hadoop ecosystem that are available in CDH (Cloudera’s open-source distribution of Apache Hadoop and related projects). If you’re looking for an introduction to the application and a high-level view, check out the first article in the series.

In the previous article in this series, we saw how Flume can be used to ingest data into Hadoop.

Read More

Analyzing Twitter Data with Apache Hadoop, Part 2: Gathering Data with Flume

Categories: CDH Flume Hadoop How-to Oozie Use Case

This is the second article in a series about analyzing Twitter data using some of the components of the Hadoop ecosystem available in CDH, Cloudera’s open-source distribution of Apache Hadoop and related projects. In the first article, you learned how to pull CDH components together into a single cohesive application, but to really appreciate the flexibility of each of these components, we need to dive deeper.

Every story has a beginning,

Read More

How-to: Set Up an Apache Hadoop/Apache HBase Cluster on EC2 in (About) an Hour

Categories: CDH Cloud Cloudera Manager How-to

Note (added July 8, 2013): The information below is deprecated; we suggest that you refer to this post for current instructions.

Today we bring you one user’s experience using Apache Whirr to spin up a CDH cluster in the cloud. This post was originally published here by George London (@rogueleaderr) based on his personal experiences; he has graciously allowed us to bring it to you here as well in a condensed form.

Read More

How-to: Enable User Authentication and Authorization in Apache HBase

Categories: HBase How-to Platform Security & Cybersecurity

With the default Apache HBase configuration, everyone is allowed to read from and write to all tables available in the system. For many enterprise setups, this kind of policy is unacceptable. 

Administrators can set up firewalls that decide which machines are allowed to communicate with HBase. However, machines that pass the firewall can still read from and write to all tables. This mechanism is effective but insufficient because HBase still cannot differentiate between multiple users who use the same client machines,

Read More