Author Archives: Alejandro Abdelnur

How-to: Set Up a Hadoop Cluster with Network Encryption

Categories: CDH Hadoop How-to Security

Hadoop network encryption is a feature introduced in Apache Hadoop 2.0.2-alpha and in CDH4.1.

In this blog post, we’ll first cover Hadoop’s pre-existing security capabilities. Then, we’ll explain why network encryption may be required. We’ll also provide some details on how it has been implemented. At the end of this blog post, you’ll get step-by-step instructions to help you set up a Hadoop cluster with network encryption.

A Bit of History on Hadoop Security

Starting with Apache Hadoop 0.20.20x and available in Hadoop 1 and Hadoop 2 releases (as well as CDH3 and CDH4 releases),


HttpFS for CDH3 – The Apache Hadoop FileSystem over HTTP

Categories: CDH General HDFS

HttpFS is an HTTP gateway/proxy for Apache Hadoop FileSystem implementations. HttpFS comes with CDH4 and replaces HdfsProxy (which only provided read access). Its REST API is compatible with WebHDFS (which is included in CDH4 and the upcoming CDH3u5).

HttpFS is a proxy, so unlike WebHDFS it does not require that clients be able to reach every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.
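As a minimal sketch of what going through the gateway looks like, the snippet below builds WebHDFS-style REST URLs that all point at a single HttpFS host rather than at individual cluster nodes. The host name `httpfs.example.com`, the user `alice`, and the helper `httpfs_url` are hypothetical and only for illustration. The `/webhdfs/v1` prefix and the `op` and `user.name` query parameters follow the WebHDFS REST API, and port 14000 is assumed here as the HttpFS port.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build a WebHDFS-style REST URL aimed at an HttpFS gateway.

    host, port: the single gateway the client needs to reach (assumed values).
    path: absolute HDFS path, e.g. "/user/alice/data.txt".
    op: a WebHDFS operation name such as OPEN or LISTSTATUS.
    """
    query = {"op": op, "user.name": user}
    query.update(params)  # any extra operation-specific parameters
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


# Hypothetical gateway and user; only the gateway host appears in the URL.
print(httpfs_url("httpfs.example.com", "/user/alice/data.txt", "OPEN", "alice"))
```

Because every request targets the gateway, a firewall only needs to admit traffic to that one host and port.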


Apache Oozie (incubating) 3.2.0 release

Categories: General Oozie

This blog was originally posted on the Apache Blog for Oozie.

In June 2012, we released Apache Oozie (incubating) 3.2.0. Oozie is currently undergoing incubation at The Apache Software Foundation (see

Oozie is a workflow scheduler system for Apache Hadoop jobs. Oozie Workflows are Directed Acyclic Graphs (DAGs), and they can be scheduled to run at a given frequency and when data becomes available in HDFS.
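To illustrate why the DAG property matters, here is a small sketch that derives a valid run order from a set of dependent steps and rejects a definition that contains a cycle. This is not how Oozie workflows are written or how Oozie schedules them; the step names and the `run_order` helper are hypothetical, and the ordering technique is a plain topological sort (Kahn's algorithm).

```python
from collections import deque


def run_order(workflow):
    """Return one valid execution order for a DAG of steps.

    `workflow` maps each step name to the steps that must finish first.
    Every dependency is assumed to also appear as a key.
    """
    pending = {step: set(deps) for step, deps in workflow.items()}
    # Steps with no prerequisites can start immediately.
    ready = deque(sorted(s for s, deps in pending.items() if not deps))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Release every step that was waiting on the one just finished.
        for other in sorted(pending):
            deps = pending[other]
            if step in deps:
                deps.remove(step)
                if not deps:
                    ready.append(other)
    if len(order) != len(pending):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical four-step workflow.
workflow = {
    "ingest": [],
    "clean": ["ingest"],
    "aggregate": ["clean"],
    "report": ["clean", "aggregate"],
}
print(run_order(workflow))  # ['ingest', 'clean', 'aggregate', 'report']
```

A cyclic definition has no step that can ever become ready, which is why the sketch raises an error instead of returning a partial order.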


Hoop – Hadoop HDFS over HTTP

Categories: Community HDFS

What is Hoop?

Hoop provides access to all Hadoop Distributed File System (HDFS) operations (read and write) over HTTP/S.

Hoop can be used to:

  • Access HDFS using HTTP REST.
  • Transfer data between clusters running different versions of Hadoop (thereby overcoming RPC versioning issues).
  • Access data in an HDFS cluster behind a firewall. The Hoop server acts as a gateway and is the only system that is allowed to go through the firewall.


Introducing Alfredo, Kerberos HTTP SPNEGO for Java

Categories: Community Hadoop Security

What is Kerberos & SPNEGO?

Kerberos is an authentication protocol that provides mutual authentication and single sign-on capabilities.

SPNEGO is a plain-text mechanism for negotiating authentication protocols between peers; one notable application of this is Kerberos authentication over HTTP.
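Over HTTP, this negotiation is carried in ordinary headers: the server answers an unauthenticated request with a 401 status and `WWW-Authenticate: Negotiate`, and the client retries with `Authorization: Negotiate` followed by a base64-encoded token. The sketch below shows only that header handling. The helper names are hypothetical, the token is a placeholder (a real one would come from a Kerberos/GSS-API library), and none of this is Alfredo's API.

```python
import base64

NEGOTIATE = "Negotiate"


def is_negotiate_challenge(status, headers):
    """True when a response asks the client to authenticate via SPNEGO."""
    challenge = headers.get("WWW-Authenticate", "")
    return status == 401 and challenge.split(" ", 1)[0] == NEGOTIATE


def authorization_header(token):
    """Wrap an opaque SPNEGO token (bytes) in an Authorization header value."""
    return f"{NEGOTIATE} {base64.b64encode(token).decode('ascii')}"


# Placeholder bytes standing in for a real Kerberos token.
fake_token = b"not-a-real-kerberos-token"
print(is_negotiate_challenge(401, {"WWW-Authenticate": "Negotiate"}))
print(authorization_header(fake_token))
```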

What is Alfredo?

Alfredo is an Open Source Java library providing support for Kerberos HTTP SPNEGO authentication.
