How-to: Deploy a Secure Enterprise Data Hub on AWS


Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud. 

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI). In this two-part series, I’ll guide you through those processes specifically.

In Part 1 below, I’ll focus on using Active Directory and Centrify Express. (The information covered here, by the way, applies not only to EDH deployments provisioned using Cloudera Director, but also to traditional deployments on on-premises infrastructure.) Part 2 will follow up by covering the cloud-specific pieces, including AWS fundamentals and in-depth details about Cloudera Director cluster provisioning.

Active Directory

Active Directory is used heavily by organizations of all types across many industries, and has been for over 15 years. For this reason, it makes sense for your EDH cluster and its underlying Linux environments to repurpose this very popular and time-tested infrastructure. The key integration points that we’ll focus on here are Kerberos, LDAP, and PKI certificates. (Note: The instructions in this section apply to any type of EDH deployment, whether bare metal/on premise or in a public cloud.)

Active Directory Certificate Services

A key step in enabling security across an EDH cluster is to issue a server certificate for each server in the cluster; those certificates are then used to enable TLS across the various components in the stack. You can certainly do that with self-signed certificates, or even by creating your own certificate authority with OpenSSL tools. However, as mentioned previously, using the Active Directory security integration points is much more efficient, and gives you the convenience of the Centrify Express utilities as well (described later).

Active Directory has a component called Active Directory Certificate Services (ADCS), whose purpose is to create or extend a certificate-based PKI. Using ADCS, you can create a root certificate authority or a subordinate (intermediate) certificate authority. For simplicity, we will set up the domain controller as the former (AKA a single-tier PKI hierarchy). However, note that in production environments, it is a security best practice to instead create an offline root certificate authority and an online intermediate certificate authority that issues certificates (AKA a multi-tier PKI hierarchy).

Refer to the docs for ADCS setup; after setup is complete, you will need to create an auto-enrollment certificate template for Centrify Express to use later.

Auto-Enrollment Certificate Template

When using ADCS as the certificate authority, a proper certificate signing template is necessary to ensure that certificates issued to the cluster hosts support all the features required by the components in the stack. In particular, the following certificate properties are necessary:

  • Server authentication
  • Client authentication
  • Key encipherment
  • Digital signature

The above properties are satisfied by the built-in “Computer” certificate template. However, built-in templates cannot be enabled for auto-enrollment; instead, a copy of the Computer template can be made and further modified. You can create that copy using the Certificate Templates snap-in in the Microsoft Management Console (MMC).

From the Console Root > Certificate Templates pane, right-click on “Computer” and click “Duplicate Template.”


After duplicating the Computer template, modify the template by giving it an appropriate name (in our example, centrify). Also, set the “Subject Name Format” to “Common Name” and uncheck “DNS Name.” (Note: While the use of DNS names in the “Subject Alternative Name” is a common practice, it is not universally supported across Cloudera Enterprise. The legacy Common Name format is required, however.)

After setting the template properties, modify the Security tab by adding at least one user or group with “autoenroll” permissions. For this example, the centrify user will have those permissions.

Upon completion of the certificate template creation, you need to enable the template for use in the certificate authority. To do that, add the template to the Certificate Templates section of the Certificate Authority by using the Certification Authority snap-in in MMC.

Click “Action > New > Certificate Template to Issue.”



To use the Cloudera Manager Kerberos wizard with Active Directory as the KDC type, the domain controller(s) must support LDAP over TLS/SSL (LDAPS). This requirement exists because when Cloudera Manager creates the various service accounts in Active Directory, it also sets a (randomized) password for each. Active Directory requires either LDAPS or SASL QoP for the connections where passwords are set. This approach makes sense because sending passwords over the wire in plain text is a major security vulnerability. (As of release 5.7, Cloudera Manager only supports LDAPS, with SASL QoP support coming in a future release.)

Whether deployed on premise or in the public cloud, EDH clusters can be Kerberized using a Cloudera Manager wizard. (For more information about the Cloudera Manager Kerberos wizard, see the docs.) In addition to being needed for the Kerberos wizard, LDAPS is required for LDAP authentication for other components such as Cloudera Manager, Cloudera Navigator, Hue, and Apache Impala (incubating), again to protect passwords that would otherwise cross the network in plain text.

The good news is that simply setting up a domain controller as a certificate authority will enable LDAPS. However, a reboot of the domain controller is required after the certificate authority is set up for LDAPS to take effect, even though the Active Directory wizard does not say the server needs to be restarted.
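One way to confirm that LDAPS is answering after the reboot is to attempt a TLS handshake against port 636, the standard LDAPS port. The sketch below is not part of the original procedure; it assumes OpenSSL is installed on a host that can reach the domain controller, and both the function name and the hostname ad.example.local are hypothetical.

```shell
# Attempt a TLS handshake with the domain controller on the LDAPS port (636)
# and print the certificates it presents. A refused connection or a failed
# handshake suggests LDAPS is not yet active.
check_ldaps() {
  # $1: domain controller hostname
  openssl s_client -connect "$1:636" -showcerts </dev/null
}

# Usage (hypothetical host):
#   check_ldaps ad.example.local
```

If the handshake succeeds, the output should include the domain controller's certificate as issued by the new certificate authority.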

Users, Groups, and Organization Units

Now that you’ve set up the required pieces to support certificates and secured the channel for LDAP communication, next I’ll cover the LDAP structure itself and the types of objects needed to support the secure EDH cluster.

LDAP organizational units (OUs) are logical containers that contain other LDAP objects including groups, users, or even other OUs, thus creating a nested hierarchy of objects. The below table describes the layout of the containers and each container’s intended purpose.


Below is a screenshot showing the layout of these containers in Active Directory.


Moving past the structure of containers in LDAP, groups play an important role in organizing the users of the EDH environment by business function or role. The table below lists all the groups we will be using in our EDH environment, and describes each group’s purpose as it relates to security.


Finally, beyond the end users who will be accessing EDH, there are a few “special” users that have specific functions in the environment. They are listed in the below table, along with the special access each might require.


Using Centrify Express

Now that all the Active Directory steps have been completed, we need a way to integrate all of the cluster hosts with Active Directory to support the features and functions required. This step includes Kerberos authentication, LDAP group lookups, and obtaining server certificates for the hosts in the cluster.

Centrify Express provides an easy way to integrate Linux hosts with Active Directory. (A free version is available, with a more robust enterprise one available via subscription.) As with the previous Active Directory section, the following Centrify Express instructions apply to bare-metal on-premise deployments as well as public-cloud ones.

Using adjoin

A key component of Centrify Express is the adjoin utility, which offers many parameters for customizing how an individual Linux host joins an Active Directory domain. After a successful join, several important steps have been completed on the Linux host operating system:

  1. Kerberos client configuration is set up to authenticate against Active Directory.
  2. LDAP integration is set up such that group lookups by the system query Active Directory.
  3. SSH authentication is done with Kerberos against Active Directory. Successful logins by users will automatically create a Kerberos TGT in the user’s ticket cache. That TGT can then be used to access Kerberos-protected services, including an EDH cluster.

Before using the adjoin command, modify the Centrify configuration file /etc/centrifydc/centrifydc.conf to remove the http principal from the list of service principals that are created upon join. The similarly named HTTP principal is needed and created by Cloudera Manager during a later step. The replacement step is done using sed as follows:
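A minimal sketch of that edit, assuming GNU sed and assuming the principal list is held in the adclient.krb5.service.principals setting (verify the setting name against your Centrify version). The function name is hypothetical. It uncomments the setting if needed and removes only the lowercase http entry, leaving the rest of the list as it was.

```shell
# Remove the lowercase "http" entry from Centrify's service principal list.
# Assumption: the list lives in the adclient.krb5.service.principals setting.
remove_http_principal() {
  # $1: path to centrifydc.conf
  sed -i -E '/^[# ]*adclient\.krb5\.service\.principals:/ {
    s/^[# ]+//
    s/([: ])http( |$)/\1/
  }' "$1"
}

# Usage (run as root, ideally after backing up the file):
#   remove_http_principal /etc/centrifydc/centrifydc.conf
```

Because the match is case-sensitive, an uppercase HTTP entry would be left alone. If the setting is absent from the file, the command changes nothing.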

In addition, we need to prevent Centrify from synchronizing the system time with the Active Directory domain controller, because the EDH cluster will rely on NTP for that, with Cloudera Manager monitoring the process. To do that, use sed as follows:
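A minimal sketch of that edit, assuming GNU sed and assuming time synchronization is controlled by the adclient.sntp.enabled setting (verify the setting name against your Centrify version). The function name is hypothetical.

```shell
# Turn off Centrify's own time synchronization so that NTP remains the
# only time source on the host.
# Assumption: the behavior is controlled by adclient.sntp.enabled.
disable_centrify_time_sync() {
  # $1: path to centrifydc.conf
  sed -i -E 's/^[# ]*adclient\.sntp\.enabled:.*/adclient.sntp.enabled: false/' "$1"
}

# Usage (run as root, ideally after backing up the file):
#   disable_centrify_time_sync /etc/centrifydc/centrifydc.conf
```

The pattern also matches a commented-out line, so the setting ends up active and set to false either way. If the setting is absent from the file, nothing is changed and it would need to be appended by hand.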

After completing the above configuration modifications, run the adjoin command to join the host to the Active Directory domain. The command is:
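A sketch of the invocation, using only the adjoin options this article describes. The function names, the OU path, the domain, and the truncation helper are hypothetical; in particular, cutting the short hostname to 15 characters (the NetBIOS name limit) is just one possible way to derive a pre-Windows 2000 name, and it can produce collisions between hosts whose names share a long prefix.

```shell
# Derive a pre-Windows 2000 name from a hostname: drop any domain suffix,
# then keep the first 15 characters. This derivation is an assumption,
# not something adjoin does for you.
prewin2k_name() {
  printf '%s\n' "${1%%.*}" | cut -c1-15
}

# Join this host to the domain.
# $1: join user   $2: password   $3: OU container
# $4: AD domain   $5: hostname to derive the pre-Windows 2000 name from
join_domain() {
  adjoin -u "$1" -p "$2" -c "$3" -w "$4" --prewin2k "$(prewin2k_name "$5")"
}

# Usage (all values are placeholders):
#   join_domain centrify "$JOIN_PASSWORD" "ou=servers,ou=example" \
#     EXAMPLE.LOCAL "$(uname -n)"
```

Passing the password from an environment variable rather than typing it inline keeps it out of shell history, although it will still be visible in the process list while adjoin runs.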

The parameters are as follows:

  • -u – The user in Active Directory to use to join the host to the domain (must have the right privileges)
  • -p – The password for the above user
  • -c – The OU container where the actual Computer objects will be created in Active Directory
  • -w – The Active Directory domain the host will be joined to
  • --prewin2k – The pre-Windows 2000 hostname for the host; necessary because some default EC2 hostnames are too long to use

Using adcert

After joining the host to the Active Directory domain, the next step is to obtain a server certificate from the certificate authority. As described previously, the certificate authority is managed by ADCS.

Centrify includes a utility called adcert for interacting with ADCS to obtain a certificate for the host. The command is:
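A sketch of the invocation, using only the adcert options this article describes. The function name, CA name, and server name are hypothetical placeholders; centrify is the template name used earlier in this article.

```shell
# Request a server certificate for this host from ADCS.
# $1: certificate authority name (from the CA's certificate)
# $2: server running ADCS
# $3: auto-enrollment certificate template
enroll_host_certificate() {
  adcert -e -n "$1" -s "$2" -t "$3"
}

# Usage (CA and server names are placeholders):
#   enroll_host_certificate EXAMPLE-CA ad.example.local centrify
```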

The parameters are as follows:

  • -e – Option to indicate enrolling a new certificate
  • -n – The certificate authority name, which comes from the CA’s certificate
  • -s – The server name that ADCS is running on and can fulfill the certificate enrollment request
  • -t – The certificate template to use to sign the certificate, which must support auto-enrollment

Upon completion, the adcert utility places the certificate-related artifacts in the /var/centrify/net/certs directory on the host. The files in that directory are:

  • cert.key – The private key for the server certificate, in PEM format
  • cert.cert – The public certificate for the server, in PEM format
  • cert.chain – The certificate chain for the public certificate, which contains all CA certs in a multi-tier PKI, or just the root CA in a single-tier PKI

These files will be used as the foundation to create all the objects needed to support enabling TLS across the various components in the EDH.
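Before building on these files, it can be worth confirming that the issued certificate carries the four properties the template was meant to provide. The sketch below assumes OpenSSL is installed and relies on the labels that openssl x509 -text prints for those usages; the function name is hypothetical.

```shell
# Read "openssl x509 -text" output on stdin and report whether each of the
# four required certificate properties appears in it.
check_cert_properties() {
  text="$(cat)"
  rc=0
  for prop in "TLS Web Server Authentication" \
              "TLS Web Client Authentication" \
              "Key Encipherment" \
              "Digital Signature"; do
    if printf '%s\n' "$text" | grep -q "$prop"; then
      echo "ok: $prop"
    else
      echo "MISSING: $prop"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   openssl x509 -in /var/centrify/net/certs/cert.cert -noout -text \
#     | check_cert_properties
# The certificate can also be checked against its chain:
#   openssl verify -CAfile /var/centrify/net/certs/cert.chain \
#     /var/centrify/net/certs/cert.cert
```

The function returns a non-zero status if any property is missing, so it can be used as a gate in a provisioning script.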


At this point, you have learned how to configure and utilize Active Directory and Centrify Express for securing an EDH cluster, whether on premise or in the public cloud. Next, in Part 2, I will cover configurations pertaining to Cloudera Director and AWS specifically.

Ben Spivey is a Principal Solutions Architect at Cloudera, and a co-author of the O’Reilly Media book, Hadoop Security.


4 responses on “How-to: Deploy a Secure Enterprise Data Hub on AWS”

  1. Pete

    the link to centrify express points at the cloudera director docs – can you provide a link to centry docs pls?

      1. Pete

        it’s probably also worth pointing out the adcert command is not supported by Centrify Express

        1. Ben Spivey

          Actually, I only used Centrify Express and adcert worked just fine. If you truly mean “supported” in the true support sense, nothing about Centrify Express would be supported since it is the free version.