New in Cloudera Manager 5.1: Direct Active Directory Integration for Kerberos Authentication

With this new release, setting up a separate MIT KDC for cluster authentication services is no longer necessary.

Kerberos (initially developed by MIT in the 1980s) has been adopted by every major component of the Apache Hadoop ecosystem. Consequently, Kerberos has become an integral part of the security infrastructure for the enterprise data hub (EDH).

Until recently, the preferred architecture was to configure your Hadoop cluster to connect directly to an MIT key distribution center (KDC) for authentication services. However, many enterprises already use Active Directory (which has built-in support for Kerberos) in their environments for authentication. Thus, to use Active Directory with Hadoop, those organizations would typically need to set up their Kerberos KDC for one-way trust with the Active Directory KDC.

This task, however, is not easy: configuring and managing an MIT KDC can involve substantial overhead. Fortunately, Cloudera has long provided the simplest experience for enabling Kerberos on a Hadoop cluster, and with Cloudera Manager 5.1 it is now the only vendor that lets users easily integrate with an Active Directory KDC. This eliminates the need for a separate MIT KDC, further reducing complexity and removing opportunities for security misconfiguration. Note that this functionality is in addition to the existing feature in Cloudera Manager for managing clusters with an MIT KDC (with and without one-way trust to Active Directory), which has been present for more than three years.

Cloudera Manager 5.1 has also added a new wizard that takes users through all the steps involved with enabling Kerberos for Hadoop clusters, including generating and deploying Kerberos client configuration (krb5.conf), configuring CDH components, and generating the principals needed by all the processes running in the cluster.

In the remainder of this blog post, you’ll learn how to use that wizard to set up a Kerberized cluster using an Active Directory KDC directly.

Before Starting the Wizard

Cloudera Manager requires an Account Manager user that has privileges to create other accounts in Active Directory. For our example, assume there is a dedicated Organizational Unit (OU) that holds all the accounts needed by the cluster. Delegating control over only this OU also ensures that the Account Manager user cannot create accounts outside of it. The OU is called “edhCluster” and the Account Manager user is called “edh-account-manager.”

You can use the Active Directory Delegate Control wizard to grant this user permission to create other users by checking the option to “Create, delete and manage user accounts” as shown in the screenshot below.

Creating the Account Manager User

The next step is to install the additional packages required for the direct-to-AD setup:

  1. Install the OpenLDAP utilities (openldap-clients on RHEL/CentOS) on the Cloudera Manager server host.
  2. Install the Kerberos client (krb5-workstation on RHEL/CentOS) on all hosts of the cluster.
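
Before entering the wizard, it can help to confirm that the packages above actually landed on each host. The following is a minimal Python sketch, not part of Cloudera Manager. It assumes that openldap-clients provides the ldapsearch command and krb5-workstation provides kinit; the role names cm_server and cluster_host are hypothetical labels used only for this example.

```python
import shutil

# Assumption: openldap-clients installs `ldapsearch` and krb5-workstation
# installs `kinit`. The role names are hypothetical labels for this sketch.
REQUIRED_TOOLS = {
    "cm_server": ["ldapsearch"],   # host running the Cloudera Manager server
    "cluster_host": ["kinit"],     # every host in the cluster
}


def missing_tools(role, which=shutil.which):
    """Return the executables required for `role` that are not on PATH."""
    return [tool for tool in REQUIRED_TOOLS[role] if which(tool) is None]


if __name__ == "__main__":
    for role in REQUIRED_TOOLS:
        missing = missing_tools(role)
        status = ", ".join(missing) if missing else "nothing"
        print("%s: missing %s" % (role, status))
```

Run it on the Cloudera Manager server host and on a cluster host; any name reported as missing points to a package that still needs to be installed.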

Now you are ready to enter the wizard that will enable Kerberos for your cluster.

Walking Through the Wizard

The wizard is triggered by going to the Actions menu of your cluster on the Home page of the Cloudera Manager UI as shown below.

Triggering the Enable Kerberos Wizard

The first page of the wizard contains a checklist to inform you about the prerequisites of setting up a Kerberized cluster:

The Welcome Page

The next page has fields to provide information about your KDC. In this example, I’ve selected Active Directory as the KDC type and specified the hostname of the domain controller in the KDC Server Host field. Additionally, you need to provide the OU (ou=edhCluster,dc=ent,dc=cloudera,dc=com, for example) where all the accounts will be created and the Kerberos realm you would like to use for the cluster. If you want Cloudera Manager to generate and deploy the Kerberos client configuration (krb5.conf) on your cluster instead of doing it manually, check “Manage krb5.conf through Cloudera Manager.” You can also optionally provide an account prefix that will be added to all the accounts created by Cloudera Manager so they are easy to identify. In the example below, the prefix is set to edhCluster.
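
To make the relationship between these fields concrete, here is a small Python sketch. It relies on the common Active Directory convention that the Kerberos realm is the upper-cased DNS domain, which in turn matches the dc= components of the OU's distinguished name. The function names, the hostname dc1.ent.cloudera.com, and the rendered file are illustrative assumptions; this is not the krb5.conf that Cloudera Manager itself generates.

```python
def realm_from_dn(dn):
    """Join the dc= components of a DN into an upper-case realm name.

    Naive parsing: it does not handle escaped commas inside DN values.
    """
    parts = [part.strip() for part in dn.split(",")]
    dcs = [p.split("=", 1)[1] for p in parts if p.lower().startswith("dc=")]
    if not dcs:
        raise ValueError("no dc= components in %r" % dn)
    return ".".join(dcs).upper()


def render_krb5_conf(realm, kdc_host):
    """Render a minimal krb5.conf pointing at a single KDC (illustrative)."""
    domain = realm.lower()
    lines = [
        "[libdefaults]",
        "  default_realm = %s" % realm,
        "",
        "[realms]",
        "  %s = {" % realm,
        "    kdc = %s" % kdc_host,
        "    admin_server = %s" % kdc_host,
        "  }",
        "",
        "[domain_realm]",
        "  .%s = %s" % (domain, realm),
        "  %s = %s" % (domain, realm),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    ou = "ou=edhCluster,dc=ent,dc=cloudera,dc=com"
    realm = realm_from_dn(ou)  # "ENT.CLOUDERA.COM"
    # Hypothetical domain controller hostname, used only for illustration.
    print(render_krb5_conf(realm, "dc1.ent.cloudera.com"))
```

If the realm you enter in the wizard does not line up with the domain of your OU in this way, that is worth double-checking before you continue.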

Enter KDC information

The advanced Kerberos client configuration page is next, which you’d typically use if you have a complex setup that involves cross-realm trust. If you are going to have a simple direct-to-Active Directory setup only, you can move on to the next section. 

Advanced krb5.conf configuration (can be skipped for Direct-to-AD setup)

Next, you are going to enter the username and password for the Account Manager user (edh-account-manager) you created in Active Directory before entering the wizard. Cloudera Manager will generate an encrypted keytab with the credentials and use it whenever it needs to create new accounts.

Enter Account Manager credentials

The final piece of information you need to provide is the privileged ports that are needed by HDFS DataNodes in a secure cluster. The wizard recommends defaults that you can use. 
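
"Privileged" here means ports below 1024, which on Linux normally only root can bind. The Python sketch below checks a pair of candidate ports against that rule. The values 1004 and 1006 are an assumption about typical defaults, so confirm them against what the wizard actually proposes. The function and parameter names are hypothetical.

```python
PRIVILEGED_PORT_LIMIT = 1024  # ports below this normally require root on Linux


def is_privileged(port):
    """Return True if `port` is a valid privileged port number."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def validate_datanode_ports(transceiver_port, http_port):
    """Return a list of problems with the chosen DataNode ports (empty if OK)."""
    problems = []
    for name, port in (("transceiver", transceiver_port), ("HTTP", http_port)):
        if not is_privileged(port):
            problems.append("%s port %d is not below %d"
                            % (name, port, PRIVILEGED_PORT_LIMIT))
    if transceiver_port == http_port:
        problems.append("both ports are set to %d" % transceiver_port)
    return problems


if __name__ == "__main__":
    # Assumed example values; use the defaults the wizard recommends.
    print(validate_datanode_ports(1004, 1006) or "ports look fine")
```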

Enter privileged ports used by DataNodes

That’s it! Cloudera Manager now starts a workflow that runs all the steps involved in enabling Kerberos for the cluster, as shown below. Once the workflow is finished, only users present in Active Directory (in any OU) will be able to authenticate to the cluster.

The final workflow

New Active Directory Accounts

Below is a screenshot of the OU after Cloudera Manager has created all the accounts needed by the processes running in the cluster. As you can see, all the auto-generated accounts have the prefix edhCluster in their display names. One important point to note is that Cloudera Manager sets randomly generated passwords for all the accounts. The resulting credentials are stored in encrypted keytabs and used when starting a process on the cluster (that is, the passwords themselves are unknown even to Cloudera Manager once the account is created).

Accounts created by Cloudera Manager

Once the initial setup has been done for enabling Kerberos for the cluster, Cloudera Manager will also automatically create any new accounts that are needed when new hosts are added to your cluster or when you’re adding a new service.

Next Steps

Now that you have added Kerberos-based authentication to your cluster, you can go on to add authorization for the data in the cluster using Apache Sentry (incubating). Cloudera Manager also supports configuring LDAP group mappings for Hadoop.

Conclusion

As you can see, Cloudera Manager makes direct-to-AD Kerberos setup extremely simple, and you can secure your cluster without having to worry about complex MIT KDC configuration and management. Learn more about configuring security using Cloudera Manager in the Security Guide.

Vikram Srivastava is a Software Engineer at Cloudera.

4 Responses
  • Bolke de Bruin / July 30, 2014 / 12:22 AM

    Interesting read. As an alternative approach to this integration, I wrote a blog post on how to integrate Samba4 Active Directory with CDH. There are pros and cons to both approaches. For example, the creation of the principals is nicer in the above article (although with delegation we can probably achieve the same thing in a Samba4 environment). On the pro side of our article, you can actually do it with open source CDH (and beyond that, it is not even tied to CDH).

    BTW: I would add sssd integration to this article as it provides the icing on the cake for integration as users and groups are then also known on the local system.

  • Deepak / October 07, 2014 / 11:47 AM

    Very good article. If it requires a separate OU, can this coexist with our corporate OU?

    Thanks
    Deepak

    • Justin Kestelyn (@kestelyn) / October 07, 2014 / 1:02 PM

      Deepak,

      A separate OU is not needed — everything will work without it — but it’s a usability recommendation.

  • Dmitry / October 14, 2014 / 6:56 PM

    Why do I need to use multiple accounts for the different nodes/services? Can I just use the same account for all of these?
