How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 1


Learn how to use Cloudera Director, Microsoft Active Directory (AD DS, AD CS, AD DNS), SAMBA, and SSSD to deploy a secure EDH cluster for workloads in the public cloud.

Authenticating users is the first line of security we recommend for Apache Hadoop. In most, if not all, RDBMSs, a user must present a username and a password to prove their identity before accessing any data managed by those systems. The goal is the same in Apache Hadoop. Because the Hadoop stack does not have an authentication component of its own, a Kerberos Key Distribution Center (KDC) is used as the mechanism to identify users.

Two implementations of a Kerberos KDC are supported on a CDH cluster: an MIT KDC installation, integration with the Kerberos KDC built into Microsoft Active Directory (AD), or both. We generally recommend the latter to our enterprise customers, so this blog focuses on direct integration of CDH with the Active Directory KDC. This integration is favored because the other tools used here, such as Samba and SSSD, also communicate with Active Directory.

Active Directory

Active Directory is mainly known for Active Directory Domain Services (AD DS), the identity management service that authenticates users and groups. However, AD includes other powerful services, such as Active Directory Certificate Services (AD CS) and AD DNS.

On May 6, 2016, my colleague Ben Spivey wrote a blog on securing a cluster on Amazon AWS. He covered the AD DS and AD CS services in depth, so Ben’s blog is a good place to start for more details on those. This blog will spend more time on the AD DNS service.

Active Directory Domain Name System

Deploying a CDH cluster requires both forward and reverse name resolution for internal IP addresses. When deploying a cluster on-premises, this is usually done by your system administrator. When you deploy a cluster on Amazon AWS, this is automatically configured when you launch an EC2 instance.

A forward DNS lookup resolves a Fully Qualified Domain Name (FQDN) to an IP address, and a reverse DNS lookup does the opposite, resolving an IP address to an FQDN. Currently, Microsoft Azure does not provide reverse DNS lookup for internal private IP addresses. We will come back to this later in the post.
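To make the reverse direction concrete: a reverse lookup queries a PTR record whose name is the IP address with its octets reversed, under the in-addr.arpa domain. The small helper below is only a sketch of that naming rule; the function name ptr_name and the address 10.0.0.4 are illustrative assumptions, not values from this deployment.

```shell
# Build the reverse-lookup (PTR) name for an IPv4 address by reversing its
# octets and appending the in-addr.arpa suffix.
ptr_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# 10.0.0.4 is a hypothetical private address used only for illustration.
ptr_name 10.0.0.4
```

A reverse zone in AD DNS has to exist for the subnet that contains this name before any PTR record can be created in it.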

There are many options for DNS when deploying on Azure. For example, you can install the supported BIND package for your Linux OS, or you can use an existing Active Directory Domain Name System. This blog covers AD DNS in more detail.

If not already configured, ensure your AD administrator has properly configured a reverse DNS zone in the DNS Manager as seen below.

Reverse Zone

DNS Manager

The important section in the figure above is the red box under “Reverse Lookup Zones”. It shows the zone configured to host all the DNS objects for a particular subnet.

Forward Zone

DNS Manager

This is a view of the “Forward Lookup Zones” for the CLOUDERA.MORANTUS.COM domain.

Active Directory Users and Computers

This is a view of my OU tree, which currently shows zero entries.

Azure Virtual Machine

I provisioned a VM in Azure with all the default DNS settings, and we will join it to our AD DS and DNS services.

DNS Services

As you can see, the hostname -f command displays a very long FQDN for my VM, and hostname -i gives us the IP address associated with the VM. Next, I did a forward DNS lookup using host followed by the FQDN, which resolved to the IP address. Then I did a reverse DNS lookup using host followed by the IP address. As shown in the red box above, it did not locate a reverse entry for that IP address. A reverse lookup is a requirement for a CDH deployment. We’ll revisit this later.
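The same checks can be wrapped in a small function so they are easy to repeat after each configuration change. This is a minimal sketch: check_dns is a hypothetical helper name, and it relies only on the host command exiting with a non-zero status when a lookup fails.

```shell
# Report whether forward and reverse lookups succeed for a host.
# $1 = FQDN to resolve, $2 = IP address to reverse-resolve.
# Returns non-zero if either lookup fails.
check_dns() {
    fqdn=$1
    ip=$2
    rc=0
    if host "$fqdn" >/dev/null 2>&1; then
        echo "forward lookup for $fqdn: ok"
    else
        echo "forward lookup for $fqdn: missing"
        rc=1
    fi
    if host "$ip" >/dev/null 2>&1; then
        echo "reverse lookup for $ip: ok"
    else
        echo "reverse lookup for $ip: missing"
        rc=1
    fi
    return $rc
}

# Usage on the VM itself:
#   check_dns "$(hostname -f)" "$(hostname -i)"
```

On a freshly provisioned Azure VM with default DNS settings, we would expect the forward line to report ok and the reverse line to report missing, matching the screenshot described above.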

SAMBA

To configure our RHEL 6.7 VM to communicate with Active Directory, we need to set up a tool called Samba. Samba is a Linux-based utility that enables the integration of Linux systems with AD.

Join the VM to AD with Samba

  1.  Ensure the DNS servers property for your Virtual Network in the Azure portal is pointed to your AD server.


Azure Portal

  2.  Install the packages needed to integrate with AD:
      sudo yum install -y samba-common krb5-workstation openldap-clients

  3.  Configure the VM to point to the AD DNS server

AD DNS Server

The nameserver is the IP address of the AD server. This can also be accomplished by running “service network restart” on the VM, which picks up the DNS servers set on the Virtual Network.
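For readers who cannot see the screenshot, the resolver configuration looks roughly like the output of the sketch below. The helper name render_resolv_conf and the address 10.0.0.100 are assumptions for illustration; substitute the IP address of your own AD server.

```shell
# Print a minimal resolv.conf that points the VM at the AD DNS server.
# $1 = AD DNS domain, $2 = IP address of the AD/DNS server.
render_resolv_conf() {
    printf 'search %s\nnameserver %s\n' "$1" "$2"
}

# 10.0.0.100 is a hypothetical AD server address.
render_resolv_conf cloudera.morantus.com 10.0.0.100

# After reviewing the output, it could be written in place with:
#   render_resolv_conf cloudera.morantus.com 10.0.0.100 | sudo tee /etc/resolv.conf
```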

  4.  Configure Samba to join the AD domain and verify the entry in AD. This must be executed as a privileged user. In this case, “jmorantus” is an admin account in Active Directory.

AD DNS Server

Note: You can ignore the failed DNS update error shown above. To create or update DNS objects in AD, we first need a Kerberos keytab for a privileged account. That step is executed later.

DNS Objects

As you can see above, we successfully joined our VM to the AD domain, and an AD object was created in the servers OU.
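The exact Samba settings from the screenshot are not reproduced in the text, so the following is only a sketch of a typical minimal configuration for an AD join. The short domain name CLOUDERA is an assumption, as is the use of the createcomputer option to place the machine account in the servers OU.

```shell
# Print a minimal [global] section for joining an AD domain.
# $1 = NetBIOS (short) domain name, $2 = Kerberos realm (upper-case AD domain).
render_smb_conf() {
    cat <<EOF
[global]
   workgroup = $1
   realm = $2
   security = ads
   kerberos method = secrets and keytab
EOF
}

# CLOUDERA is an assumed short name for the CLOUDERA.MORANTUS.COM domain.
render_smb_conf CLOUDERA CLOUDERA.MORANTUS.COM

# After saving the reviewed output as /etc/samba/smb.conf, the join would be
# run with an AD admin account (jmorantus in this post):
#   sudo net ads join -U jmorantus createcomputer=servers
#   sudo net ads testjoin
```

The “secrets and keytab” setting asks Samba to maintain the machine credentials in the system keytab as well as its own secrets database, which is convenient for the Kerberos steps that follow.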

  5.  Configure the Kerberos krb5.conf file so that a keytab file can be generated to update DNS in AD

Keytab File Generation

  6.  Update/create the forward and reverse DNS entries

Create/Update Forward and Reverse DNS Settings
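The commands in the screenshot are not reproduced in the text. One common way to perform this kind of update is a dynamic DNS update with nsupdate, authenticated with Kerberos. The sketch below only generates the update instructions; the helper name render_nsupdate, the hostname vm1.cloudera.morantus.com, the address 10.0.0.4, and the 3600-second TTL are all illustrative assumptions.

```shell
# Print nsupdate instructions that replace the A and PTR records for a host.
# $1 = FQDN, $2 = IPv4 address, $3 = TTL in seconds (defaults to 3600).
render_nsupdate() {
    fqdn=$1
    ip=$2
    ttl=${3:-3600}
    # PTR owner name: reversed octets under in-addr.arpa
    ptr=$(echo "$ip" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }')
    printf 'update delete %s. A\n' "$fqdn"
    printf 'update add %s. %s A %s\n' "$fqdn" "$ttl" "$ip"
    printf 'send\n'
    # The reverse record lives in a different zone, so it is sent separately.
    printf 'update delete %s. PTR\n' "$ptr"
    printf 'update add %s. %s PTR %s.\n' "$ptr" "$ttl" "$fqdn"
    printf 'send\n'
}

# Hypothetical host name and address, for illustration only.
render_nsupdate vm1.cloudera.morantus.com 10.0.0.4

# With a Kerberos ticket for an account allowed to update DNS (obtained with
# kinit from the keytab created in the previous step), the instructions could
# be applied using GSS-TSIG authentication:
#   render_nsupdate "$(hostname -f)" "$(hostname -i)" | nsupdate -g
```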

View of Forward DNS entry added to AD DNS service


View of reverse DNS entry added to AD DNS service.

Note: Active Directory ages DNS entries that it considers “inactive”, so they can eventually be removed. An additional process should be implemented to keep these entries “alive” in AD.

SSSD

The System Security Services Daemon (SSSD) caches user and group information locally on a Linux system. This integration is also necessary to configure authorization with Apache Sentry for data access.
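The SSSD configuration itself is not shown in the text, so here is a minimal sketch of what an sssd.conf using the AD provider might look like. Every value below is an assumption to adapt to your environment, and the helper name render_sssd_conf is hypothetical.

```shell
# Print a minimal sssd.conf that uses the AD provider for one domain.
# $1 = AD domain name in lower case.
render_sssd_conf() {
    cat <<EOF
[sssd]
config_file_version = 2
services = nss, pam
domains = $1

[domain/$1]
id_provider = ad
access_provider = ad
cache_credentials = True
default_shell = /bin/bash
fallback_homedir = /home/%u
use_fully_qualified_names = False
EOF
}

render_sssd_conf cloudera.morantus.com

# After saving the reviewed output as /etc/sssd/sssd.conf, the file must be
# owned by root and readable only by root before SSSD will start:
#   sudo chmod 600 /etc/sssd/sssd.conf
#   sudo service sssd restart
```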

Now that SSSD is fully configured, we’ll verify we can read user information from AD.

Read User Information in AD

 

Here you can see that with SSSD stopped, the VM does not know the user “scm-cloudera”. With SSSD running, the user information is pulled from AD. If you are looking for a commercial option, Cloudera also recommends Centrify.
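The SSSD check described above can be scripted with a tiny helper. This is a sketch only: user_known is a hypothetical name, and it simply relies on the id command failing for accounts the system cannot resolve.

```shell
# Succeed if the system can resolve the given account name, fail otherwise.
user_known() {
    id "$1" >/dev/null 2>&1
}

# Hypothetical usage on the joined VM (scm-cloudera is the account from this post):
#   sudo service sssd stop
#   user_known scm-cloudera || echo "scm-cloudera is unknown without SSSD"
#   sudo service sssd start
#   user_known scm-cloudera && id scm-cloudera
```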

Conclusion

You should now be able to configure a VM on Azure, join it to an AD domain, and create DNS entries in the AD DNS server. These steps should also work for other cloud providers and for on-premises deployments. In Part 2 of this series, we’ll cover creating a Kerberized cluster with Cloudera Director on Azure.
