How-to: Quickly Configure Kerberos for Your Apache Hadoop Cluster


Use the scripts and screenshots below to configure a Kerberized cluster in minutes.

Kerberos is the foundation of securing your Apache Hadoop cluster. With Kerberos enabled, user authentication is required. Once users are authenticated, you can use projects like Apache Sentry (incubating) for role-based access control via GRANT/REVOKE statements.

Taming the three-headed dog that guards the gates of Hades is challenging, so Cloudera has put significant effort into making this process easier in Hadoop-based enterprise data hubs. In this post, you’ll learn how to stand up a one-node cluster with Kerberos enforcing user authentication, using the Cloudera QuickStart VM as a demo environment.

The product documentation covers all of this in more detail. Treat it as reference material; I’d suggest reading it later to understand more about what the scripts do.

Requirements

You need the following downloads to follow along.

Initial Configuration

Before you start the QuickStart VM, increase the memory allocation to 8GB RAM and increase the number of CPUs to two. You can get by with a little less RAM, but we will have everything including the Kerberos server running on one node.

Start up the VM and activate Cloudera Manager as shown here:

Give this script some time to run; it has to restart the cluster.

KDC Install and Setup Script

The script goKerberos_beforeCM.sh does all the setup work for the Kerberos server and the appropriate configuration parameters. The comments are designed to explain what is going on inline. (Do not copy and paste this script! It contains unprintable characters that are pretending to be spaces. Rather, download it.)
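To give a feel for what a script like this has to do, here is a rough sketch of the usual MIT Kerberos KDC setup steps on a RHEL/CentOS-style system. It is not the contents of goKerberos_beforeCM.sh: the package list, the command order, and the `run` dry-run helper are assumptions for illustration, and only the realm and admin principal names come from this post. Editing krb5.conf, kdc.conf, and kadm5.acl is left out entirely.

```shell
#!/bin/sh
# Sketch only: typical MIT KDC setup steps, NOT the contents of goKerberos_beforeCM.sh.
# DRY_RUN defaults to 1, so each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

REALM=CLOUDERA                            # realm entered later in the wizard
ADMIN_PRINC="cloudera-scm/admin@$REALM"   # admin principal the wizard asks for

# Install the KDC server and client tools (assumed package names).
run yum install -y krb5-server krb5-libs krb5-workstation
# Create the KDC database and stash file; prompts for a master password.
run kdb5_util -r "$REALM" create -s
# Create the admin principal; prompts for its password.
run kadmin.local -q "addprinc $ADMIN_PRINC"
# Start the KDC and the admin daemon.
run service krb5kdc start
run service kadmin start
```

Run as written, this only prints the five commands. Setting DRY_RUN=0 would execute them and needs root, so use the downloaded script for the real setup.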

Cloudera Manager Kerberos Wizard

After running the script, you now have a working Kerberos server and can secure the Hadoop cluster. The wizard will do most of the heavy lifting; you just have to fill in a few values.

To start, log into Cloudera Manager by going to http://quickstart.cloudera:7180 in your browser. The user ID is cloudera and the password is cloudera. (It almost goes without saying, but never use “cloudera” as a password in a real-world setting.)

There are lots of productivity tools here for managing the cluster, but ignore them for now and head straight for the Administration > Kerberos wizard, as shown in the next screenshot.

Click on the “Enable Kerberos” button.

The four checklist items were all completed by the script you’ve already run. Check off each item and select “Continue.”

The Kerberos Wizard needs to know the details of what the script configured. Fill in the entries as follows:

  • KDC Server Host: quickstart.cloudera
  • Kerberos Security Realm: CLOUDERA
  • Kerberos Encryption Types: aes256-cts-hmac-sha1-96

Click “Continue.”
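To see how those three values relate to a Kerberos client configuration, here is a small sketch that prints a minimal krb5.conf built from them. It is for illustration only: Cloudera Manager generates the real file in the next step, the `render_krb5_conf` function is a hypothetical helper, and using the KDC host as the admin server is an assumption that fits a one-node VM.

```shell
#!/bin/sh
# Illustration only: print a minimal krb5.conf from the three wizard values.
# Cloudera Manager writes the real file; do not install this output by hand.
render_krb5_conf() {
  kdc_host=$1
  realm=$2
  enctype=$3
  cat <<EOF
[libdefaults]
  default_realm = $realm
  default_tkt_enctypes = $enctype
  default_tgs_enctypes = $enctype
  permitted_enctypes = $enctype

[realms]
  $realm = {
    kdc = $kdc_host
    admin_server = $kdc_host
  }
EOF
}

render_krb5_conf quickstart.cloudera CLOUDERA aes256-cts-hmac-sha1-96
```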

Do you want Cloudera Manager to manage the krb5.conf files in your cluster? Remember, the whole point of this blog post is to make Kerberos easier. So, please check “Yes” and then select “Continue.”

The Kerberos Wizard is going to create Kerberos principals for the different services in the cluster. To do that, it needs a Kerberos administrator ID. The ID the script created is: cloudera-scm/admin@CLOUDERA.

The screenshot shows how to enter this information. Recall that the password is: cloudera.

The next screen provides good news. It lets you know that the wizard was able to successfully authenticate. 

OK, you’re ready to let the Kerberos Wizard do its work. Since this is a VM, you can safely select “I’m ready to restart the cluster now” and then click “Continue.” You now have time to go get a coffee or other beverage of your choice.

How long does that take? Just let it work.

Congrats, you are now running a Hadoop cluster secured with Kerberos.

Kerberos is Enabled. Now What?

The old method of su - hdfs will no longer provide administrator access to the HDFS filesystem. Here is how you become the hdfs user with Kerberos:
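The command listing that originally appeared here is not reproduced in this copy. As a minimal sketch, assuming the setup script created a principal named hdfs@CLOUDERA (an assumption; `sudo kadmin.local -q "listprincs"` shows which principals actually exist), authenticating could look like this. The `become_hdfs` function name is hypothetical.

```shell
#!/bin/sh
# Assumption: a principal named hdfs@CLOUDERA exists in the KDC.
REALM=${REALM:-CLOUDERA}

become_hdfs() {
  kinit "hdfs@$REALM" &&   # prompts for the principal's password
    klist                  # show the ticket that was just obtained
}
```

Calling `become_hdfs` in a terminal on the VM prompts for the password and then lists the new ticket.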

Now validate you can do hdfs user things:
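One way to check this, sketched under the assumption that you hold a valid ticket for the hdfs principal: create and remove a directory directly under the HDFS root, which an ordinary user is normally not allowed to do. The function name and the /eraseme scratch path are hypothetical.

```shell
#!/bin/sh
# Requires a valid Kerberos ticket for the hdfs principal.
check_hdfs_superuser() {
  dir=${1:-/eraseme}          # hypothetical scratch directory under the HDFS root
  hadoop fs -mkdir "$dir" &&  # creating a directory under / needs superuser rights
    hadoop fs -ls / &&        # the new directory should appear in the listing
    hadoop fs -rmdir "$dir"   # clean up again
}
```

Run `check_hdfs_superuser` on the VM; if any step fails, the function stops and returns a non-zero status.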

Next, destroy the Kerberos ticket for hdfs so that you don’t break anything by accident:
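With the MIT Kerberos client tools, discarding the current ticket cache is a single command. A minimal sketch (the `drop_ticket` wrapper is a hypothetical name) that also confirms nothing is left behind:

```shell
#!/bin/sh
# Remove the current user's credential cache and confirm it is gone.
drop_ticket() {
  kdestroy &&    # delete the credential cache
    ! klist -s   # klist -s is silent and succeeds only if a valid ticket remains
}
```

`drop_ticket` returns success only when no valid ticket remains afterwards.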

The min.user.id parameter needs to be fixed per the message below:

This is the error message you get without fixing min.user.id:
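The check behind that message is simple: on a secure cluster, YARN's container executor refuses to run containers for users whose numeric UID is below min.user.id. Here is a minimal sketch of the comparison, assuming the common default of 1000. The `MIN_USER_ID` variable and `uid_allowed` function are hypothetical names, and the real executor also honors a whitelist of system users that this sketch ignores.

```shell
#!/bin/sh
# Sketch: would the container executor accept this UID?
# Assumption: min.user.id is at its usual default of 1000.
uid_allowed() {
  uid=$1
  min_uid=$2
  [ "$uid" -ge "$min_uid" ]
}

MIN_USER_ID=${MIN_USER_ID:-1000}
my_uid=$(id -u)   # numeric UID of the current user

if uid_allowed "$my_uid" "$MIN_USER_ID"; then
  echo "UID $my_uid is allowed (min.user.id=$MIN_USER_ID)"
else
  echo "UID $my_uid is below min.user.id=$MIN_USER_ID"
fi
```

On the VM, compare the output of `id -u cloudera` with the min.user.id value in the YARN configuration to see whether the setting needs to be lowered.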

Save the changes shown above and restart the YARN service. Now validate that the cloudera user can use the cluster:
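A quick way to exercise both authentication and YARN is to run the bundled pi estimator as the cloudera user. This is a sketch under two assumptions: that a cloudera@CLOUDERA principal exists, and that the MapReduce examples jar is at the path below, which is typical for a package-based CDH install. The function name is hypothetical.

```shell
#!/bin/sh
# Assumptions: cloudera@CLOUDERA exists, and the examples jar lives at this path.
EXAMPLES_JAR=${EXAMPLES_JAR:-/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar}

run_pi_as_cloudera() {
  kinit cloudera@CLOUDERA &&                 # prompts for the password
    hadoop jar "$EXAMPLES_JAR" pi 10 10000   # 10 map tasks, 10000 samples each
}
```

If min.user.id is still too high for the cloudera account, the job is rejected when its containers launch, which is the error described above.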

If you forget to kinit before trying to use the cluster, you’ll get the errors below. The simple fix is to run kinit with the principal you wish to use.
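In scripts, you can catch a missing ticket before Hadoop does. A minimal sketch using `klist -s`, which prints nothing and reports through its exit status whether a valid ticket is cached. The `require_ticket` function is a hypothetical helper.

```shell
#!/bin/sh
# Fail early with a readable message when no valid Kerberos ticket is cached.
require_ticket() {
  if klist -s 2>/dev/null; then
    return 0
  fi
  echo "No valid Kerberos ticket; run: kinit <principal>" >&2
  return 1
}
```

A typical use would be `require_ticket && hadoop fs -ls /`, so that the Hadoop command only runs once a ticket is in place.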

Congratulations, you have a running Kerberized cluster!

Marty Lurie is a Systems Engineer at Cloudera.


10 responses on “How-to: Quickly Configure Kerberos for Your Apache Hadoop Cluster”

  1. Anand

    Awesome guide for setting up Kerberos! Thanks for posting this detailed step by step instruction!

  2. Bruce Tannenbaum

    Marty, thanks! I’ve spoken to many people about Kerberos for security in their cluster, but I’ve never seen a great tutorial…until now, that is! Thanks for providing this! Excellent work and easily understood!

  3. Jérôme B

    Hello,

    Great tutorial, thanks a lot for sharing it. I followed it, and I have a few small remarks to improve it:
    – If we just follow the tutorial line by line, it’s not clearly specified that the script should be copied into the /root directory. To understand that, we have to read the script (that’s not a bad thing), but I wasted some time figuring it out.
    – The krb5kdc and kadmin services are not launched at OS start. So if we restart the VM, Kerberos and all the Cloudera services stop working (because the krb5kdc service is not started). To make it work, I have to start the services manually and then restart the cluster (I think we should use chkconfig, but I don’t know which runlevel to choose).

  4. FO

    Great guide.
    We would love to have a similar guide that gives a step-by-step tutorial for using an existing KDC as well (e.g., an Active Directory KDC, which most organizations have). If this CDH VM can be configured to use an existing AD KDC, that will be awesome.

  5. Scott

    I note that the latest QuickStart VM includes a “kerberos” script that does everything listed here. So:
    * Increase to 2 cores / 10GB ram
    * From desktop: Launch Cloudera Enterprise (trial)
    * Download JCE policy files (http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html). The script expects to find /home/cloudera/Downloads/UnlimitedJCEPolicyJDK7.zip
    * From terminal (/home/cloudera): sudo ./kerberos
    * Run the CM Kerberos wizard using the information output by the script. “Manage krb5.conf through Cloudera Manager” and “Yes, I am ready to restart the cluster now.”

    … and you should be good to go.

    Next question would be how to get Sentry running on top of this.

  6. Muthuraja S

    Hi – Thanks for posting this, but I am getting the following error :
    kinit: Cannot contact any KDC for realm ‘CLOUDERA’ while getting initial credentials when doing the step “Import KDC Account Manager Credentials”.
    Please help me out to sort this issue.
    Thanks,
    Muthuraja.S

  7. Raj T

    I use a Cloudera Quickstart VM for trial development. But unfortunately I can’t enable the Cloudera Express and hence Cloudera Manager access is not possible because of the computing resource issues. Is there any way I can configure Kerberos using only scripts through command line? Thanks in advance. Raj

  8. Wayne

    Is there any way to change KDC account manager credentials after enabling Kerberos? I can’t find anywhere in CM to change the properties.