Accessing Secure Cluster from Web Applications

As customers use Apache Hadoop clusters in ways other than through HUE and the Hadoop command-line interface (CLI), and integrate them closely with the applications they develop, we are often asked how to access a secure Hadoop cluster from within custom applications. Many customers have their application access the cluster with a fixed service account. Others would like to access the cluster as the end users who have authenticated to the application. In this blog we’ll talk about what options are available and show in detail how to access some Cloudera Hadoop cluster services from within a web application as the end user, using Kerberos delegation with Microsoft Active Directory.

The primary form of strong authentication used on a secure cluster is Kerberos. Kerberos supports credential delegation, where a server process to which a user has authenticated can perform actions on behalf of that user, such as accessing databases or other web services as the authenticated user. The form of delegation that Kerberos historically supported is now called “full delegation”. In this type of delegation, the Ticket Granting Ticket (TGT) of the user is made available to the server process, and the server can then authenticate to any service that the user is authorized to use. Until recently most Kerberos Key Distribution Centers (KDCs) other than Active Directory supported only this form of delegation, and so did Java up to and including Java 7. Starting with Java 8, Java supports Kerberos constrained delegation (S4U2Proxy), where, if the KDC supports it, it is possible to specify which particular services the server process can be delegated access to.

Hadoop has implemented impersonation, or proxy support, within its security framework, independent of Kerberos delegation. With Hadoop impersonation you can assign proxy privileges to certain accounts, which can then access Hadoop resources or run jobs on behalf of other users. The privileges granted to a proxy account can be restricted so that it acts only on behalf of users who are members of certain groups and/or only for connections originating from certain hosts. However, the proxy privileges cannot be restricted to only certain services within the cluster.

This article discusses how to set up Kerberos constrained delegation and use it to access a secure cluster. The example here involves Apache Tomcat; however, you can easily extend it to other Java application servers.

To illustrate how Kerberos constrained delegation works, consider a web server that authenticates the user and then runs some Impala queries on behalf of the user. The flow is presented in the diagram below.

[Figure: KDC workflow]

Since Kerberos constrained delegation is supported in Java 8 and above, the primary requirement is to use a Tomcat version (7 or above) that can run on Java 8. In a web application the common Kerberos authentication mechanism is SPNEGO. The credential with which the user authenticated to the web application will then be used to access cluster services. We’ll first configure the web application to authenticate with SPNEGO. In Tomcat, SPNEGO authentication can be implemented by a SPNEGO JASPIC module or by using the built-in SPNEGO Valve. For simplicity we’ll use the SPNEGO Valve in this example.

Configuring the Account

To begin, we have to create an account/principal in AD, associate a ServicePrincipalName (SPN) with it and create a keytab file for it. If you are running this on a node in a Hadoop cluster, a principal for the HTTP service may already have been created. If an HTTP principal exists for the server where you are running Tomcat, skip the steps for creating the account, setting the SPN and creating the keytab file, and start with the step of configuring constrained delegation. If not, follow the instructions below. Tomcat’s guide has an example of how to associate an SPN with an account and create a keytab file. After creating the account tc01 in AD, this example shows how to configure the SPN on the account and create the keytab file. The server on which Tomcat is running is lnx-tc01.dev.local and the AD KDC realm is DEV.LOCAL. Execute these commands on the AD server:

Set the SPN with the command:

Create the keytab with the command:

Next configure constrained delegation for the account. This should be configured for the services that you want the Tomcat process to authenticate to on behalf of the users. For example, if you want to connect to an Impala instance running on dn1.dev.local, then the constrained delegation would be for the SPN impala/dn1.dev.local. (Tip: Configure HAProxy or another load balancer for Impala so that you don’t have to configure constrained delegation for every Impala node in the cluster. Constrained delegation then needs to be configured only for the SPN of the load balancer.) If you want to connect to an Oozie instance running on edge1.dev.local, then the SPN configured for delegation would be HTTP/edge1.dev.local. To configure constrained delegation, use “Active Directory Users and Computers”, open the account properties and use either the Delegation tab or the Attribute Editor tab. If the Attribute Editor is used, the property to be modified is ‘msDS-AllowedToDelegateTo’. Here is how to use the Attribute Editor:

Locate the account in Active Directory Users and Computers, open its properties and navigate to the Attribute Editor tab as shown. (The screenshots are from Active Directory 2008; if you use another version of Active Directory, the GUI controls may vary slightly.)

[Figure: Active Directory]

Next, click Filter and clear “Show only attributes that have values”.

[Figure: Selecting attributes]

Now locate msDS-AllowedToDelegateTo and double-click it.

[Figure: Settings]

In the multi-valued string editor, enter each SPN and click Add. Apply the changes when done.

[Figure: Multi-valued string editor]

The account is now configured for Tomcat to use constrained delegation.

Configuring Tomcat

The first step in configuring Tomcat is to copy the unlimited-strength JCE policy files to the $JAVA_HOME/jre/lib/security directory of the JDK/JRE that runs Tomcat. Next, if TLS is in use on the cluster, make sure the CA root certificate is present in $JAVA_HOME/jre/lib/security/jssecacerts. Then create the krb5.ini and jaas.conf files. The krb5.ini file is the Kerberos configuration file, commonly referred to as krb5.conf on Linux and Unix systems. It holds the configuration of the Kerberos realm and the KDC servers of that realm. This file should be copied to $CATALINA_BASE/conf/krb5.ini. A sample of the file looks like this:

Next configure the jaas.conf file. The jaas.conf file specifies which JAAS login module to use, which principal/user to use and where the corresponding keytab file is located. This file should be copied to $CATALINA_BASE/conf/jaas.conf. A sample of the file looks like this:
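To make the login-module settings concrete, here is a minimal sketch that expresses a comparable configuration programmatically instead of as a jaas.conf file. It is an illustration under assumptions, not the configuration used in this article: the class name KeytabJaasConfig is hypothetical, and the entry name, principal and keytab path passed to it are placeholders. The option names are the standard ones of the JDK’s Krb5LoginModule.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Sketch only (hypothetical class): an in-memory JAAS configuration
// for a keytab-based Kerberos login of a service account.
final class KeytabJaasConfig extends Configuration {
    private final String entryName;
    private final AppConfigurationEntry[] entries;

    KeytabJaasConfig(String entryName, String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");    // log in with a keytab instead of a password
        options.put("keyTab", keytabPath);   // location of the keytab file (placeholder)
        options.put("principal", principal); // principal stored in the keytab (placeholder)
        options.put("storeKey", "true");     // keep the key in the Subject after login
        options.put("doNotPrompt", "true");  // never fall back to an interactive prompt
        this.entryName = entryName;
        this.entries = new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    // Returns the single entry for the configured name, or null for any other name.
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return entryName.equals(name) ? entries : null;
    }
}
```

An instance could be handed to the four-argument LoginContext constructor to try a keytab login outside Tomcat, for example with the entry name com.sun.security.jgss.krb5.accept and the principal HTTP/lnx-tc01.dev.local@DEV.LOCAL, both of which are assumptions here.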

(Tip: Protect the keytab file so that the only user that has access to the file is the user running Tomcat.)
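One way to check this on a Linux host is sketched below. It assumes a POSIX file system, and the class and method names are hypothetical helpers introduced only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Sketch only (hypothetical helper): keep a keytab readable by its owner alone.
final class KeytabPermissions {
    private static final Set<PosixFilePermission> OWNER_BITS = EnumSet.of(
            PosixFilePermission.OWNER_READ,
            PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.OWNER_EXECUTE);

    private KeytabPermissions() {}

    // True when no group or other permission bits are present.
    static boolean isOwnerOnly(Set<PosixFilePermission> perms) {
        return OWNER_BITS.containsAll(perms);
    }

    // Reads the permissions of the file and applies the check above.
    static boolean isOwnerOnly(Path keytab) throws IOException {
        return isOwnerOnly(Files.getPosixFilePermissions(keytab));
    }

    // Sets the file mode to rw------- (owner read/write only).
    static void restrictToOwner(Path keytab) throws IOException {
        Files.setPosixFilePermissions(keytab, PosixFilePermissions.fromString("rw-------"));
    }
}
```

The file must also be owned by the user that runs Tomcat; this sketch checks only the permission bits, not the ownership.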

To configure Tomcat we have to set up a JNDI realm using AD to enable authorization for users who have authenticated. While this is not strictly necessary, since you can manually configure roles and map users to them, using AD for user-to-role mapping simplifies authorization management. The relevant configuration for using AD for role mapping is shown below. This goes into server.xml:

By default the Tomcat SPNEGO Valve does not do credential delegation. To configure Tomcat to do credential delegation, configure the following in server.xml:

Configuring the Web App

The sample web app we’ll use is based on JSP. It performs two functions: it connects to Impala and runs queries, and it connects to Oozie and runs an Oozie job. We’ll use the Cloudera JDBC Connector for Impala to connect to Impala; the version of the driver we need is 2.5.36 or greater. Download the Connector and unzip the jar files for JDBC 4.1. For connecting to Oozie we’ll use Apache HttpComponents, version 4.4 or greater.

Create the following directory:

Copy the Connector jar files to spnego-test/WEB-INF/lib. You’ll see that the Impala Connector is distributed with HttpComponents version 4.1 jar files. Three of these need to be replaced with the versions from the newer HttpComponents package: httpcore, httpclient and commons-codec. Copy these three files to the spnego-test/WEB-INF/lib directory.

Next configure the web.xml to use SPNEGO Authentication and configure the roles that’ll have access to this web app. This file is copied to spnego-test/WEB-INF. The relevant configuration section is shown below:

Next create the JSP files:

hello_spnego.jsp

This JSP is a basic test to see if SPNEGO authentication works. If you access this JSP from your browser, you should see the username with which you logged in to your computer.

spnego_jdbc.jsp

This JSP allows you to specify conditions to query the table sample_08 in the database “default”. The relevant code of this JSP is shown below:

This code gets the delegated credential and creates a Subject. The JDBC connection is opened within the security context of this Subject, so the credentials are passed to the JDBC connection as shown below:
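The pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not the JSP code itself: the class DelegatedJdbc and its methods are hypothetical, the port 21050 used in the example is Impala’s usual JDBC port rather than a value from this setup, and the URL properties (AuthMech, KrbRealm, KrbHostFQDN, KrbServiceName) are the Kerberos settings of the Cloudera Impala driver. In Tomcat the delegated credential would typically come from the authenticated principal, for example via GenericPrincipal.getGssCredential().

```java
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import javax.security.auth.Subject;
import org.ietf.jgss.GSSCredential;

// Sketch only (hypothetical helper): open an Impala JDBC connection inside
// the security context of a delegated Kerberos credential.
final class DelegatedJdbc {
    private DelegatedJdbc() {}

    // Builds a Kerberos-enabled JDBC URL; AuthMech=1 selects Kerberos in the driver.
    static String impalaUrl(String host, int port, String realm) {
        return "jdbc:impala://" + host + ":" + port
                + ";AuthMech=1"
                + ";KrbRealm=" + realm
                + ";KrbHostFQDN=" + host
                + ";KrbServiceName=impala";
    }

    // Wraps the delegated credential in a Subject so that GSS-API calls
    // made while the Subject is in effect can find it.
    static Subject subjectFor(GSSCredential delegated) {
        Subject subject = new Subject();
        subject.getPrivateCredentials().add(delegated);
        return subject;
    }

    // Opens the connection while the Subject is in effect.
    static Connection connectAs(Subject subject, final String url)
            throws PrivilegedActionException {
        return Subject.doAs(subject, new PrivilegedExceptionAction<Connection>() {
            @Override
            public Connection run() throws Exception {
                return DriverManager.getConnection(url);
            }
        });
    }
}
```

A caller would combine the three steps, for example connectAs(subjectFor(credential), impalaUrl("dn1.dev.local", 21050, "DEV.LOCAL")), with the Impala driver jar on the classpath.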

spnego_oozie.jsp

This JSP checks if it can successfully authenticate to Oozie and allows you to submit a job to Oozie. Oozie is accessed through its REST interface using HttpClient. We set up HttpClient for SPNEGO authentication, wrap the delegated credentials with the KerberosCredentials class and set it as the credential for the HttpClient.
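To show what happens underneath, the sketch below performs the same kind of SPNEGO exchange using only JDK classes. This is a substitute technique chosen so that the example is self-contained; it is not the HttpClient code of the JSP. The class and method names are hypothetical. The status endpoint /oozie/v1/admin/status is part of the Oozie REST API, and the host and port are supplied by the caller.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Base64;
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;
import org.ietf.jgss.Oid;

// Sketch only (hypothetical helper): call the Oozie status endpoint with a
// SPNEGO token generated from a delegated credential.
final class OozieSpnegoSketch {
    private static final String SPNEGO_OID = "1.3.6.1.5.5.2";

    private OozieSpnegoSketch() {}

    // GSS-API host-based form of the SPN HTTP/<host>.
    static String hostBasedService(String host) {
        return "HTTP@" + host;
    }

    // Formats a token as the value of the HTTP Authorization header.
    static String negotiateHeader(byte[] token) {
        return "Negotiate " + Base64.getEncoder().encodeToString(token);
    }

    // Generates the initial SPNEGO token for the HTTP service on the given host.
    // With a delegated credential, the JDK requests the service ticket on the
    // user's behalf (S4U2Proxy), provided constrained delegation is configured.
    static byte[] initialToken(GSSCredential delegated, String host) throws GSSException {
        GSSManager manager = GSSManager.getInstance();
        GSSName service = manager.createName(hostBasedService(host),
                GSSName.NT_HOSTBASED_SERVICE);
        GSSContext context = manager.createContext(service, new Oid(SPNEGO_OID),
                delegated, GSSContext.DEFAULT_LIFETIME);
        try {
            return context.initSecContext(new byte[0], 0, 0);
        } finally {
            context.dispose();
        }
    }

    // Returns the HTTP status code of the Oozie status call (200 on success).
    static int adminStatus(GSSCredential delegated, String host, int port) throws Exception {
        URI uri = URI.create("http://" + host + ":" + port + "/oozie/v1/admin/status");
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestProperty("Authorization",
                negotiateHeader(initialToken(delegated, host)));
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

The sketch sends a single token and ignores any reply token from the server; a client library such as HttpClient handles the full negotiation.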

Package the JSPs and the jar files in a WAR file and deploy it to Tomcat.

Demo Time

If you are using IE or Chrome on Windows within the corporate network, and the host where you run Tomcat is in the same DNS domain as your AD server, then you are ready to test. If you are using Firefox, or if the DNS domains of the Tomcat host and AD differ, some additional browser configuration may be required. On a Mac, make sure you have a valid Kerberos ticket and use Safari.

Navigate to http://lnx-tc01.dev.local:8080/spnego-test/hello_spnego.jsp. If you don’t see Hello followed by your username then check the Troubleshooting section before proceeding further.

The next test is for Impala. Navigate to http://lnx-tc01.dev.local:8080/spnego-test/impala_jdbc.jsp. This test queries Impala’s sample table sample_08 from the default database. You can enter a query condition such as “salary > 15000 order by salary desc”. Please be aware that this test does not demonstrate best practices for writing web applications that query data from databases. As this code does not validate the input, it is vulnerable to SQL injection attacks. The objective of this test is merely to demonstrate how to authenticate to Impala with constrained delegation.

The final test uses constrained delegation to authenticate to a web application using SPNEGO. Navigate to http://lnx-tc01.dev.local:8080/spnego-test/oozie_spnego.jsp. In this case we are authenticating to Oozie and submitting an Oozie job. The test first checks the status of Oozie and, if it can successfully authenticate, gives you the ability to submit a job. The code merely submits the job and returns the Oozie job ID. It can be further enhanced to monitor the job status and report whether the job ran successfully. This example can also be adapted to authenticate to other HTTP services such as WebHDFS or HttpFS.

Troubleshooting

If your hello_spnego.jsp test fails, then either the basic configuration of the Tomcat SPNEGO Valve is incorrect or the browser needs additional configuration. To confirm that the Tomcat SPNEGO Valve is configured correctly, log in to the host where you are running Tomcat and authenticate to AD using kinit. Then run the command:

If you don’t see your username in the output HTML, then there is an issue with your configuration, the Tomcat account or the keytab used. To test the account, check if you can get a TGT using the keytab by running the command:

Next check if you can get a service ticket for the SPN:

If both of these commands execute without any error, check your LDAP configuration by running:

If all of these tests run without issues, check the browser settings.

If you are able to run the hello_spnego.jsp test successfully but the Impala test, the Oozie test or both are failing, then the issue could be the configuration of constrained delegation.

To enable further debugging, enable Tomcat debug logging by editing $CATALINA_BASE/conf/logging.properties and adding the lines:

Then restart Tomcat and check the log file catalina.<date>.log. This should give you sufficient information on what is going on. Also check the references for further information.

Further Information

This configuration works when we enable our web app for SPNEGO and have constrained delegation configured. If we cannot enable SPNEGO for our web application but use some other form of strong authentication, such as SAML or a JASPIC module, we can still use constrained delegation. For this we first need to get an S4U2Self ticket, which is also available in Java 8. We first have to configure the Tomcat service account for protocol transition. Next we use the impersonate method of the com.sun.security.jgss.ExtendedGSSCredential class. We start by creating a GSSCredential for the Tomcat service account, then create a GSSName for the end user we want the S4U2Self ticket for, and finally cast the service credential to ExtendedGSSCredential and call its impersonate method, passing it the GSSName object.

Here is a code snippet of how this is done.
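A minimal sketch of these steps follows. The class S4U2SelfSketch and its helper methods are hypothetical names used for illustration, and the user name and realm are supplied by the caller. The sketch assumes it is called from code running as the logged-in Tomcat service account (for example inside Subject.doAs with the Subject from a keytab login), so that GSSManager can find the service account’s credential.

```java
import com.sun.security.jgss.ExtendedGSSCredential;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;

// Sketch only (hypothetical helper): obtain credentials for an end user
// through S4U2Self (protocol transition).
final class S4U2SelfSketch {
    private S4U2SelfSketch() {}

    // Builds the Kerberos principal name of the end user.
    static String userPrincipal(String user, String realm) {
        return user + "@" + realm;
    }

    static GSSCredential impersonate(String user, String realm) throws GSSException {
        GSSManager manager = GSSManager.getInstance();

        // 1. Credential of the Tomcat service account, taken from the current Subject.
        GSSCredential serviceCreds = manager.createCredential(GSSCredential.INITIATE_ONLY);

        // 2. Name of the end user we want the S4U2Self ticket for.
        GSSName endUser = manager.createName(userPrincipal(user, realm),
                GSSName.NT_USER_NAME);

        // 3. Cast to the JDK-specific extension and impersonate the end user.
        GSSCredential creds = ((ExtendedGSSCredential) serviceCreds).impersonate(endUser);
        return creds;
    }
}
```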

We can now use creds the same way we used delegated credentials previously.

Conclusion

We hope this gives a good understanding of the choices available for implementing delegation to access a secure cluster, along with some guidance on implementing them.

References

7 responses on “Accessing Secure Cluster from Web Applications”

  1. SCC

    Thanks for the excellent article. Certainly clears a lot of ambiguity up.
    One question – what do I do if I want to use an existing edge node to run my webapp? This already has an HTTP SPN – but I’m unable to use it without a keytab being generated. Which needs a password. Which I haven’t got as the SPN was generated by CDH!
    Thanks again – really useful article.

  2. Mubashir Kazia

    Hello SCC,
    CM will create a principal and attach an SPN when it is running a certain role on the node. You need to find out which role is using the HTTP SPN on your edge node. When the role starts, it’ll copy the keytab to the node. If you have root access you can access the keytab from the /var/run/cloudera-scm-agent/process/ directory.

  3. SCC

    Hi Mubashir
    That’s the strange thing – CM puts an HTTP service principal in our AD, but there’s no keytab for it unless we run httpfs on the node as well, which isn’t ideal as this is an edge node. It’s like there’s no way of any other SPNEGO service being able to use the SPN generated by CM. It would be useful to be able to manually deploy the HTTP keytab to any node where there is an equivalent SPN, for use by SPNEGO.
    Thank-You!

  4. Mubashir Kazia

    Hello SCC,
    CM will only create the principal if there is a service that needs it. In addition to HTTPFS, the HTTP principal is also used by HDFS, Oozie, the JournalNode, YARN and a host of other services. It is also possible that in the past there was a service running on your edge node that required the HTTP principal but has since been decommissioned on that node. CM does not remove the principal from AD if you decommission the service on a node. You could run a service such as HTTPFS temporarily, copy the keytab file and then decommission the service.

  5. SCC

    Thank-you Mubashir that’s really helpful. We’ll give it a go. The edge node is configured for YARN, HDFS, Hive, Impala, Spark / Spark2 gateways, but no service components. It’s always been an edge node – that is, with no services.

  6. Eshika Roy

    Thank you for posting this informative blog about accessing a secure cluster from web applications. I gained a lot of useful information after reading this post. Please keep on sharing more useful tutorials in the upcoming posts. Looking forward to your next post.

  7. Raja Rangineni

    Hi Mubashir,

    Thanks for sharing this and the intricate details and know-how. Looking forward for more such articles .

    Regards,
    Raja Rangineni.
