Apache Hadoop Ozone Security – Authentication

Apache Ozone is a distributed object store built on top of the Hadoop Distributed Data Store service. It can manage billions of small and large files, which other distributed file systems struggle to handle. Ozone supports rich APIs such as Amazon S3 and Kubernetes CSI, as well as the native Hadoop File System APIs. This makes Ozone easy to consume by different kinds of big data workloads, such as data warehousing on Apache Hive, data ingestion with Apache NiFi, streaming with Apache Spark/Flink and machine learning with TensorFlow.

With a growing data footprint and multifaceted workloads that require collaboration between different groups, data security is of utmost importance. Security has been available in Ozone since the Apache Hadoop Ozone 0.4.0 release, with contributions from the community. It has also been included as a tech preview in Cloudera's CDP Data Center 7.0 release. Security can be divided into four building blocks: authentication, authorization, auditing and encryption. This post covers authentication; the remaining building blocks will be covered in follow-up posts.

Authentication is the process of verifying the identity of a user or service that accesses Ozone components. Ozone is compatible with the Apache Hadoop security architecture and supports strong authentication using Kerberos as well as security tokens.

Kerberos-based Authentication

As shown in Figure 1, the service components, including OM (Ozone Manager), SCM (Storage Container Manager) and Datanodes, all authenticate with each other via Kerberos. Each service must be configured with a valid Kerberos principal name and keytab file, which the service uses to log in when it starts in secure mode. More details on the OM/SCM/Datanode Kerberos configuration can be found in the Apache Hadoop Ozone documentation [4]. Correspondingly, Ozone clients must provide either a valid Kerberos ticket or a security token to access Ozone services, such as Ozone Manager for metadata and Datanodes for reading and writing blocks.

Security Tokens

Like Hadoop delegation tokens, an Ozone security token consists of a token identifier and a signature from its issuer. Ozone Manager issues delegation tokens and block tokens to users or client applications that have authenticated with Kerberos. Token validators check the token's signature to verify the identity of the issuer. This way, the holder of a valid token can use it to perform operations against cluster services as if it held the Kerberos credentials of the user the token was issued to.
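To make the identifier-plus-signature structure concrete, here is a minimal Python sketch. It assumes textbook RSA over fixed Mersenne primes purely for illustration: this is insecure and not necessarily the signature algorithm Ozone uses. The names `Token`, `issue_token` and `validate_token`, and the identifier's field layout, are hypothetical rather than Ozone's API.

```python
import hashlib
from dataclasses import dataclass

# --- Toy textbook RSA over fixed Mersenne primes: illustration only, NOT secure. ---
E = 65537

def make_keypair(k1: int, k2: int):
    """Return (public, private) keys built from the primes 2**k1 - 1 and 2**k2 - 1."""
    p, q = 2**k1 - 1, 2**k2 - 1
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, E), (n, d)

def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

# --- A security token: an identifier plus the issuer's signature over it. ---
@dataclass(frozen=True)
class Token:
    identifier: bytes  # hypothetical encoding of owner, renewer, expiry, ...
    signature: int

def issue_token(identifier: bytes, issuer_private_key) -> Token:
    return Token(identifier, sign(identifier, issuer_private_key))

def validate_token(token: Token, issuer_public_key) -> bool:
    # The validator needs only the issuer's public key, never its private key.
    return verify(token.identifier, token.signature, issuer_public_key)

om_public, om_private = make_keypair(89, 127)  # stands in for Ozone Manager's key pair
token = issue_token(b"owner=alice;renewer=yarn;maxDate=1700000000", om_private)
tampered = Token(b"owner=mallory;renewer=yarn;maxDate=1700000000", token.signature)
```

Changing any byte of the identifier invalidates the signature, so a holder can present the token but cannot alter who it was issued to.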

A delegation token issued by Ozone Manager allows its holder to access metadata services provided by Ozone Manager, such as creating a volume or listing the objects in a bucket. When Ozone Manager receives a request carrying a delegation token, it validates the token by verifying the signature with the signer's public key. Delegation token operations (get, renew and cancel) can only be performed over a Kerberos-authenticated connection.
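The rule that token management requires Kerberos can be sketched as a simple gate. In this hypothetical Python model (`ToyOzoneManager`, `AuthMethod` and the method names are illustrative, not Ozone's RPC interface), ordinary metadata calls accept either credential, while get, renew and cancel insist on a Kerberos-authenticated caller.

```python
from enum import Enum
from typing import Optional

class AuthMethod(Enum):
    KERBEROS = "kerberos"
    TOKEN = "token"

class KerberosRequiredError(PermissionError):
    """Raised when a token operation arrives over a non-Kerberos connection."""

class ToyOzoneManager:
    """Hypothetical model of how Ozone Manager gates delegation-token operations."""

    def __init__(self) -> None:
        self._live_tokens = set()
        self._next_id = 0

    @staticmethod
    def _require_kerberos(auth: AuthMethod, operation: str) -> None:
        # A caller authenticated only by a token may not get, renew or cancel
        # tokens, so possessing a token is never enough to mint or extend one.
        if auth is not AuthMethod.KERBEROS:
            raise KerberosRequiredError(f"{operation} requires Kerberos authentication")

    def get_delegation_token(self, auth: AuthMethod) -> int:
        self._require_kerberos(auth, "get")
        self._next_id += 1
        self._live_tokens.add(self._next_id)
        return self._next_id

    def renew_delegation_token(self, auth: AuthMethod, token_id: int) -> bool:
        self._require_kerberos(auth, "renew")
        return token_id in self._live_tokens

    def cancel_delegation_token(self, auth: AuthMethod, token_id: int) -> None:
        self._require_kerberos(auth, "cancel")
        self._live_tokens.discard(token_id)

    def list_keys(self, auth: AuthMethod, token_id: Optional[int] = None) -> list:
        # Ordinary metadata calls accept a Kerberos ticket or a live delegation token.
        if auth is AuthMethod.TOKEN and token_id not in self._live_tokens:
            raise PermissionError("invalid or cancelled delegation token")
        return []  # a real Ozone Manager would return the bucket's keys here
```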

Block tokens are similar to delegation tokens in the sense that they are issued and signed by Ozone Manager. Ozone Manager issues them when a client request involves reading or writing blocks on Datanodes. Unlike delegation tokens, which are requested through explicit get/renew/cancel APIs, block tokens are handed to clients transparently along with the key/block location information. When Datanodes receive a read/write request from a client, they validate the block token using the signing Ozone Manager's public key. A client cannot renew a block token explicitly; once it expires, the client must re-fetch the key/block locations to obtain new block tokens.
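The client-side consequence of non-renewable block tokens can be sketched as a retry loop: when a Datanode rejects an expired token, the client asks Ozone Manager for the key locations again. Everything here (`ToyOzoneManager.lookup_key`, `ToyDatanode.read_block`, the logical clock and the single-refetch limit) is a hypothetical simplification, not the Ozone client's actual code.

```python
from dataclasses import dataclass

class ManualClock:
    """Deterministic logical clock so the example does not depend on wall time."""
    def __init__(self) -> None:
        self.now = 0

    def __call__(self) -> int:
        return self.now

    def advance(self, dt: int) -> None:
        self.now += dt

@dataclass(frozen=True)
class BlockToken:
    block_id: str
    expiry: int  # clock value at which Datanodes start rejecting the token

class BlockTokenExpiredError(Exception):
    pass

class ToyOzoneManager:
    def __init__(self, clock: ManualClock, lifetime: int) -> None:
        self.clock, self.lifetime, self.lookups = clock, lifetime, 0

    def lookup_key(self, key: str, blocks: int = 2) -> list:
        # Block tokens travel with the block locations; there is no separate token API.
        self.lookups += 1
        expiry = self.clock() + self.lifetime
        return [BlockToken(f"{key}/block-{i}", expiry) for i in range(blocks)]

class ToyDatanode:
    def __init__(self, clock: ManualClock, read_cost: int = 0) -> None:
        self.clock, self.read_cost = clock, read_cost

    def read_block(self, token: BlockToken) -> bytes:
        if self.clock() >= token.expiry:
            raise BlockTokenExpiredError(token.block_id)
        self.clock.advance(self.read_cost)  # simulated time spent streaming the block
        return token.block_id.encode()

def read_key(om: ToyOzoneManager, dn: ToyDatanode, key: str, max_refetches: int = 1) -> bytes:
    """Read all blocks of a key, re-fetching locations if a block token expires."""
    locations = om.lookup_key(key)
    chunks, i, refetches = [], 0, 0
    while i < len(locations):
        try:
            chunks.append(dn.read_block(locations[i]))
            i += 1
        except BlockTokenExpiredError:
            if refetches >= max_refetches:
                raise
            refetches += 1
            locations = om.lookup_key(key)  # block tokens cannot be renewed
    return b"".join(chunks)
```

With `lifetime=1` and `read_cost=1`, the token for the second block expires while the first is being read, so the client performs exactly one extra lookup.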

S3 Secret

Ozone supports the Amazon S3 protocol via the Ozone S3 Gateway. In secure mode, Ozone Manager issues an S3 secret to Kerberos-authenticated users or client applications that access Ozone through S3 APIs. We will cover this in a later post on the Ozone S3 Gateway.

How Do Ozone Security Tokens Work?

As shown in Figure 2, the traditional Apache Hadoop delegation token and block token rely on a secret shared between the token issuer and the token validator to sign and validate tokens. Therefore, when the issuer and validator are different, e.g., in the case of block tokens, the shared master key must be periodically transferred over the wire to keep the token issuer (NameNode) and token validators (DataNodes) in sync.
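For contrast, the shared-secret scheme can be sketched with Python's standard `hmac` module: the issuer computes an HMAC of the token identifier with a master key, and a validator can only check it after receiving that same key. The class names are hypothetical, and HMAC-SHA256 is an assumption; Hadoop's secret manager may use a different HMAC algorithm.

```python
import hashlib
import hmac
import secrets

class SharedSecretIssuer:
    """Issuer side (the NameNode's role): signs token identifiers with a master key."""

    def __init__(self) -> None:
        self.master_keys = {}  # key id -> secret bytes
        self.current_key_id = 0
        self.roll_master_key()

    def roll_master_key(self) -> None:
        # Master keys are rolled periodically; every validator needs each new key.
        self.current_key_id += 1
        self.master_keys[self.current_key_id] = secrets.token_bytes(32)

    def issue(self, identifier: bytes):
        key = self.master_keys[self.current_key_id]
        password = hmac.new(key, identifier, hashlib.sha256).digest()
        return self.current_key_id, identifier, password

class SharedSecretValidator:
    """Validator side (the DataNode's role): recomputes the HMAC with the same key."""

    def __init__(self) -> None:
        self.master_keys = {}

    def receive_keys(self, keys: dict) -> None:
        # Models the periodic transfer of secret key material over the wire.
        self.master_keys.update(keys)

    def validate(self, token) -> bool:
        key_id, identifier, password = token
        key = self.master_keys.get(key_id)
        if key is None:  # this validator has not been synced with that master key yet
            return False
        expected = hmac.new(key, identifier, hashlib.sha256).digest()
        return hmac.compare_digest(password, expected)
```

Each `receive_keys` call stands for a transfer of secret material between issuer and validator; until it happens, tokens signed with a newly rolled key are rejected.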

Instead, Ozone security tokens take a certificate-based approach. As shown in Figure 3, certificate-based signatures completely decouple token issuers from token validators. Tokens are more secure this way because shared secrets are never transported over the wire.

In secure mode, SCM bootstraps itself as a CA (Certificate Authority) and creates a self-signed CA certificate. Datanodes and Ozone Manager must register with the SCM CA via a CSR (certificate signing request). SCM verifies the identity of the Datanode or Ozone Manager via Kerberos and signs the component's certificate. Ozone Manager and Datanodes use their signed certificates to prove their identities, which is especially useful for signing and validating delegation tokens and block tokens.
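The bootstrap and CSR steps can be modeled in a few lines. This sketch assumes textbook RSA over fixed Mersenne primes (insecure, illustration only), represents a certificate as a plain dataclass rather than X.509, omits the requester's own signature on the CSR, and stands in for Kerberos by passing the caller's authenticated principal as a string. `ToyScmCa`, `sign_csr` and `verify_certificate` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

# --- Toy textbook RSA over fixed Mersenne primes: illustration only, NOT secure. ---
E = 65537

def make_keypair(k1: int, k2: int):
    p, q = 2**k1 - 1, 2**k2 - 1
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))  # Python 3.8+ modular inverse

def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

def tbs_bytes(subject: str, public_key, issuer: str) -> bytes:
    """The 'to be signed' portion of a (simplified, hypothetical) certificate."""
    n, e = public_key
    return f"{subject}|{n}|{e}|{issuer}".encode()

@dataclass(frozen=True)
class Certificate:
    subject: str
    public_key: tuple
    issuer: str
    signature: int

@dataclass(frozen=True)
class CertificateSigningRequest:
    subject: str       # the identity the component claims
    public_key: tuple  # the matching private key never leaves the component

class ToyScmCa:
    def __init__(self, name: str = "scm-ca") -> None:
        self.name = name
        self.public_key, self._private_key = make_keypair(107, 521)
        # Bootstrap: the CA certificate is self-signed.
        self.certificate = self._issue(name, self.public_key)

    def _issue(self, subject: str, public_key) -> Certificate:
        signature = sign(tbs_bytes(subject, public_key, self.name), self._private_key)
        return Certificate(subject, public_key, self.name, signature)

    def sign_csr(self, csr: CertificateSigningRequest, kerberos_principal: str) -> Certificate:
        # SCM only certifies the identity the requester proved with Kerberos.
        if csr.subject != kerberos_principal:
            raise PermissionError(f"{kerberos_principal} may not request {csr.subject}")
        return self._issue(csr.subject, csr.public_key)

def verify_certificate(cert: Certificate, ca_cert: Certificate) -> bool:
    return cert.issuer == ca_cert.subject and verify(
        tbs_bytes(cert.subject, cert.public_key, cert.issuer), cert.signature, ca_cert.public_key)
```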

In the case of block tokens, Ozone Manager (the token issuer) signs the token with its private key, and Datanodes (the token validators) use Ozone Manager's certificate to validate it. This works because both Ozone Manager and Datanodes trust certificates signed by the SCM CA.
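Block token validation then reduces to two signature checks on the Datanode: one on Ozone Manager's certificate against the trusted SCM CA key, and one on the token against the public key in that certificate. The sketch assumes toy textbook RSA over fixed Mersenne primes (insecure, illustration only); `ToyDatanode`, `validate_block_token` and the token identifier layout are hypothetical.

```python
import hashlib
from dataclasses import dataclass

# --- Toy textbook RSA over fixed Mersenne primes: illustration only, NOT secure. ---
E = 65537

def make_keypair(k1: int, k2: int):
    p, q = 2**k1 - 1, 2**k2 - 1
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))  # Python 3.8+ modular inverse

def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

@dataclass(frozen=True)
class Certificate:
    subject: str
    public_key: tuple
    ca_signature: int  # SCM CA's signature over (subject, public_key)

def cert_bytes(subject: str, public_key) -> bytes:
    return f"{subject}|{public_key[0]}|{public_key[1]}".encode()

@dataclass(frozen=True)
class BlockToken:
    identifier: bytes  # hypothetical layout: block id, access mode, expiry
    signature: int     # Ozone Manager's signature over the identifier

class ToyDatanode:
    def __init__(self, ca_public_key) -> None:
        # The SCM CA's public key is the Datanode's only trust anchor; no OM secret is stored.
        self.ca_public_key = ca_public_key

    def validate_block_token(self, token: BlockToken, om_cert: Certificate) -> bool:
        # Step 1: trust the OM certificate only if the SCM CA signed it.
        if not verify(cert_bytes(om_cert.subject, om_cert.public_key),
                      om_cert.ca_signature, self.ca_public_key):
            return False
        # Step 2: check the token signature with the public key from that certificate.
        return verify(token.identifier, token.signature, om_cert.public_key)

ca_public, ca_private = make_keypair(107, 521)  # SCM CA
om_public, om_private = make_keypair(89, 127)   # Ozone Manager
om_cert = Certificate("om1", om_public, sign(cert_bytes("om1", om_public), ca_private))

identifier = b"block=42;mode=READ;expiry=1700000000"
token = BlockToken(identifier, sign(identifier, om_private))  # OM signs with its private key
datanode = ToyDatanode(ca_public)
```

A rogue key pair presented under Ozone Manager's name fails step 1, because the CA's signature covers the original public key.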

In the case of delegation tokens, Ozone Manager is both the token issuer and the validator. When Ozone Manager runs in HA (High Availability) mode, multiple Ozone Manager instances run simultaneously. A delegation token issued and signed by Ozone Manager instance 1 can still be validated by Ozone Manager instance 2 after the leader changes, because both instances trust certificates signed by the SCM CA. More details can be found in the Ozone HA design document [6].
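A sketch of the HA case: each Ozone Manager instance has its own CA-signed certificate, and a token records which certificate signed it, so any instance that trusts the SCM CA can validate it. The `signer_serial` field, the shared `cert_store` dictionary and `ToyOmInstance` are hypothetical; this post does not describe how Ozone locates the signing instance's certificate. The RSA here is a toy textbook variant over fixed Mersenne primes (insecure, illustration only).

```python
import hashlib
from dataclasses import dataclass

# --- Toy textbook RSA over fixed Mersenne primes: illustration only, NOT secure. ---
E = 65537

def make_keypair(k1: int, k2: int):
    p, q = 2**k1 - 1, 2**k2 - 1
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))  # Python 3.8+ modular inverse

def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(_digest(data, n), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)

@dataclass(frozen=True)
class Certificate:
    serial: str
    subject: str
    public_key: tuple
    ca_signature: int

def cert_bytes(serial: str, subject: str, public_key) -> bytes:
    return f"{serial}|{subject}|{public_key[0]}|{public_key[1]}".encode()

def issue_cert(serial: str, subject: str, public_key, ca_private_key) -> Certificate:
    return Certificate(serial, subject, public_key,
                       sign(cert_bytes(serial, subject, public_key), ca_private_key))

@dataclass(frozen=True)
class DelegationToken:
    identifier: bytes
    signer_serial: str  # hypothetical: which OM certificate produced the signature
    signature: int

class ToyOmInstance:
    def __init__(self, name, private_key, cert, ca_public_key, cert_store) -> None:
        self.name, self._private_key, self.cert = name, private_key, cert
        self.ca_public_key, self.cert_store = ca_public_key, cert_store

    def issue(self, owner: str) -> DelegationToken:
        identifier = f"owner={owner};issuer={self.name}".encode()
        return DelegationToken(identifier, self.cert.serial, sign(identifier, self._private_key))

    def validate(self, token: DelegationToken) -> bool:
        cert = self.cert_store.get(token.signer_serial)
        if cert is None:
            return False
        # Accept the signer's certificate only if the SCM CA signed it...
        if not verify(cert_bytes(cert.serial, cert.subject, cert.public_key),
                      cert.ca_signature, self.ca_public_key):
            return False
        # ...then check the token against that certificate's public key.
        return verify(token.identifier, token.signature, cert.public_key)

ca_public, ca_private = make_keypair(107, 521)
cert_store = {}  # hypothetical: wherever OM instances look up each other's certificates
oms = []
for name, serial, primes in [("om1", "1001", (89, 127)), ("om2", "1002", (61, 607))]:
    public, private = make_keypair(*primes)
    cert = issue_cert(serial, name, public, ca_private)
    cert_store[serial] = cert
    oms.append(ToyOmInstance(name, private, cert, ca_public, cert_store))
om1, om2 = oms

token = om1.issue("alice")  # issued while om1 is the leader
```

After a failover, `om2.validate(token)` succeeds without om2 ever holding om1's private key.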

Conclusion

Authentication is one of the most important building blocks of Apache Hadoop Ozone security. You should now have a better understanding of which authentication mechanisms Apache Hadoop Ozone supports and how they work. This will help in understanding the other Ozone security pillars, such as authorization and auditing.

Stay tuned for follow-up articles on Ozone security authorization, auditing, encryption and GDPR. If you are interested in diving deeper, you can find more technical details in the Ozone security design document [7].

References

[1] Apache Hadoop Ozone Architecture

[2] Benchmarking Ozone: Cloudera’s next generation Storage for CDP

[3] What is Kerberos? (Hadoop and Kerberos: The Madness Beyond the Gate)

[4] Apache Hadoop Ozone Documentation

[5] Adding Security to Apache Hadoop

[6] Apache Hadoop Ozone HA Design Document (HDDS-505)

[7] Apache Hadoop Ozone Security Design Document (HDDS-4)

Xiaoyu Yao
