How-to: Secure YARN Containers with Cloudera Navigator Encrypt

Categories: Cloudera Navigator Platform Security & Cybersecurity YARN

Learn how Cloudera Navigator Encrypt brings data security to YARN containers.

With the introduction of transparent data encryption in HDFS, we are now a big step closer toward a secure platform in the Apache Hadoop world. However, there are still gaps in the platform, including how YARN and its applications manage their cache. In this post, I’ll explain how Cloudera Navigator Encrypt fills that particular gap.

Use Case

When a YARN application runs in a cluster, it can spill data to the local disks, giving potential attackers an opportunity to read sensitive data.

Using Cloudera Navigator Encrypt, you can close that gap with encryption and ACL rules that grant access solely to YARN, thereby providing a secure directory where containers can read and write data.

For this example, we will use CDH 5.4.0 (installed via parcels) with Apache Spark on YARN and Kerberos enabled. The procedure uses Spark to demonstrate the setup; however, it will protect all jobs executed under YARN.

Note that although you could create either an ecryptfs or dmcrypt mount point to protect the containers, this setup focuses on the latter.

The first step is to create the mount points on each server running the YARN NodeManager role. It's important to note that you will need a disk dedicated exclusively to container encryption. You could use an available disk, or, if you just want to try this out, you can create a loop device and use that for testing purposes.
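If you don't have a spare disk, a loop device backed by a file can stand in for testing. A minimal sketch (the backing-file path and size are illustrative):

```shell
# Create a 1 GB backing file for the test storage (testing only;
# use a dedicated physical disk in production)
dd if=/dev/zero of=/dmcrypt-backing-file bs=1M count=1024

# Attach it to the first free loop device
sudo losetup -f /dmcrypt-backing-file

# Confirm the association, e.g. /dev/loop0
sudo losetup -a
```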

Setting up Cloudera Navigator Encrypt

We will assume that Cloudera Navigator Encrypt is already set up and registered in a Key Trustee Server.

First, create the mount points that will be used to store the data you want to protect. Execute this command on all YARN NodeManagers.
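A sketch of that step, assuming the encrypted storage device is `/dev/loop0` (for example, a loop device created for testing) and a mount point of `/navencrypt/mnt`; both names are illustrative:

```shell
# Prepare the device as dmcrypt storage and mount it under the given
# path (navencrypt-prepare prompts for the Navigator Encrypt password)
sudo navencrypt-prepare /dev/loop0 /navencrypt/mnt
```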

Stop the cluster.

Now that you have set up the mount points, it's time to encrypt the data. The default directory for YARN containers can be found in the YARN configuration as yarn.nodemanager.local-dirs (one or more directories). In this setup there are three disks, so you would see three entries:
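For example, the property might list one local directory per disk (the paths below are illustrative):

```
yarn.nodemanager.local-dirs:
  /data/1/yarn/nm
  /data/2/yarn/nm
  /data/3/yarn/nm
```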

Start the process by encrypting the folders using a descriptive category:
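A minimal sketch of the encryption step, assuming a category named `@yarn_data`, the three local directories above, and a Navigator Encrypt mount point of `/navencrypt/mnt` (all names illustrative); run this on every NodeManager:

```shell
# Move each YARN local directory into the encrypted mount point under
# the @yarn_data category; navencrypt-move leaves a symlink behind so
# YARN keeps using the original paths
sudo navencrypt-move encrypt @yarn_data /data/1/yarn/nm /navencrypt/mnt
sudo navencrypt-move encrypt @yarn_data /data/2/yarn/nm /navencrypt/mnt
sudo navencrypt-move encrypt @yarn_data /data/3/yarn/nm /navencrypt/mnt
```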

After that, verify with a small test that the ACLs are in place and working as expected. To do that, first put Cloudera Navigator Encrypt into permissive mode on all nodes.
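Switching to permissive mode is a single command per node:

```shell
# In permissive mode, Navigator Encrypt logs would-be ACL violations
# to dmesg instead of blocking access, which lets you discover which
# processes need to be whitelisted
sudo navencrypt set --mode=permissive
```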

Start the cluster once again and execute the first job. In this case, you will use a simple word count and specify in the job configuration that everything should be written to disk as a normal text file. (In this scenario, you have a file with a collection of random words but any text file will suffice.)

Here’s the pyspark script that was executed:
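The original script is not reproduced here; a minimal word-count sketch along these lines (input and output paths are illustrative) would exercise the container directories:

```python
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

# Read the input file, split each line into words, and count occurrences
counts = (sc.textFile("/user/example/random_words.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# Write the result to disk as plain text files
counts.saveAsTextFile("/user/example/wordcount_out")

sc.stop()
```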

Example output:

    Don’t”), (900, u’this’)]
    15/05/20 16:23:31 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
    15/05/20 16:23:32 INFO SparkUI: Stopped Spark web UI at
    15/05/20 16:23:32 INFO DAGScheduler: Stopping DAGScheduler
    15/05/20 16:23:32 INFO YarnClientSchedulerBackend: Shutting down all executors
    15/05/20 16:23:32 INFO YarnClientSchedulerBackend: Asking each executor to shut down
    15/05/20 16:23:32 INFO YarnClientSchedulerBackend: Stopped
    15/05/20 16:23:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
    15/05/20 16:23:32 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
    15/05/20 16:23:32 INFO MemoryStore: MemoryStore cleared
    15/05/20 16:23:32 INFO BlockManager: BlockManager stopped
    15/05/20 16:23:32 INFO BlockManagerMaster: BlockManagerMaster stopped
    15/05/20 16:23:32 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    15/05/20 16:23:32 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    15/05/20 16:23:32 INFO SparkContext: Successfully stopped SparkContext
    15/05/20 16:23:32 INFO Remoting: Remoting shut down
    15/05/20 16:23:32 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

Now, verify the dmesg output on each node running Navigator Encrypt, and confirm that the encrypted directory is being accessed.
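For example (the exact message format varies by version, so the grep pattern is an assumption):

```shell
# In permissive mode, Navigator Encrypt reports each unauthorized
# access attempt to the kernel ring buffer
dmesg | grep -i navencrypt
```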

For the final step, add the YARN NodeManager's profile to the ACL on each node:

(Note: Cloudera Navigator Encrypt works differently when creating ACLs for Java processes: the binary executed is the Java executable itself, and Java can be launched with different jars. In that case, you need to specify a profile, which contains all the options that Java receives when it is executed. Using that profile, you can control which Java application may access the data.)

You need access to the jps tool (included in the JDK) to use this script. The Java binary that you add to the ACL must be the same one you see in the dmesg warnings; if you add any other executable, the ACL will fail.
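A sketch of the ACL step, assuming the NodeManager's PID is found via jps, the category is `@yarn_data`, and the JDK path matches what dmesg reported (all paths and names illustrative):

```shell
# Find the NodeManager's process ID
PID=$(jps | awk '/NodeManager/ {print $1}')

# Capture the process profile (command line and options) for that PID
sudo navencrypt-profile --pid="$PID" > /tmp/nodemanager-profile.json

# Add an ACL rule allowing that exact Java invocation to access
# files under the @yarn_data category
sudo navencrypt acl --add \
     --rule="ALLOW @yarn_data * /usr/java/jdk1.7.0_67-cloudera/bin/java" \
     --profile-file=/tmp/nodemanager-profile.json
```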

Finally, clear dmesg on each node and execute the jobs again. Afterward, you should still have a clean dmesg, and you can put Cloudera Navigator Encrypt back into enforcing mode on all the NodeManager nodes.
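Those two steps might look like this on each NodeManager:

```shell
# Clear the kernel ring buffer so any new violations are easy to spot
sudo dmesg -C

# (re-run the YARN jobs, then confirm dmesg stays clean)

# Switch Navigator Encrypt back to enforcing mode
sudo navencrypt set --mode=enforcing
```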

Note that you will need to add a new ACL rule every time you update YARN (the profile will likely change, including possible configuration changes) or the JDK version you use. This is because the Java binary and the profile tied to it will change, so the existing ACL will no longer match. That process is as simple as adding a new rule pointing to the new JDK version and executing the navencrypt-profile command to retrieve the process profile.

Congratulations! Your YARN containers are now secure.


Using Cloudera Navigator Encrypt to secure data at rest is fast and easy. It can be used to protect sensitive data by encrypting it and creating ACLs, complementing your security infrastructure and application needs.

Mario Lopez is a Software Engineer at Cloudera.