Protecting Hadoop Clusters From Malware Attacks

Posted in Technical | November 01, 2018

Two new strains of malware, XBash and DemonBot, are targeting Apache Hadoop servers for Bitcoin mining and DDoS purposes. This malware scans the internet so aggressively for Hadoop clusters that an insecure cluster placed on the open internet can be infected within minutes. This blog post describes how the malware works and offers specific actions to protect your Hadoop-based clusters.

A History of Hadoop Malware

Roughly two years ago there was a spate of attacks against the open source database MongoDB, as well as against Hadoop. These attacks were ransomware: the attacker wiped or encrypted data and then demanded money to restore it. Just as with the recent attacks, the only Hadoop clusters affected were those that were directly connected to the internet and had no security features enabled. Cloudera published a blog post about this threat in January 2017. That blog post explained how to ensure that your Hadoop cluster is not directly connected to the internet and encouraged readers to enable Cloudera’s security and governance features.

That blog post has renewed relevance today with the advent of XBash and DemonBot.

The origin story of XBash and DemonBot illustrates how security researchers view the Hadoop ecosystem and the lifecycle of a vulnerability. Back in 2016 at the Hack.lu conference in Luxembourg, two security researchers gave a talk entitled “Hadoop Safari: Hunting for Vulnerabilities.” They described Hadoop and its security model and then suggested some “attacks” against clusters that had no security features enabled. These attacks are akin to breaking into a house while the front door is wide open.

Strong Authentication with Kerberos

These attacks are thwarted by a fundamental security capability that has been part of Hadoop for many years: strong authentication using Kerberos. Without Kerberos, any user interacting with the cluster can pretend to be any other user. No credentials are required, so anyone can do anything as anyone else. It’s akin to a Linux system where everyone knows the root password.

However, in a properly configured cluster, Kerberos is used for authentication. This means that to interact with the cluster, the user must first enter credentials (like a username and a password) to prove that they are who they say they are. This authentication provides the security that users and administrators expect: users have certain abilities in the system, they can’t impersonate others, and only administrators have access to administrative accounts.

Without Kerberos, anyone who can reach a Hadoop cluster can act as any user, including administrators, and do serious damage to the cluster and its data. One example attack the researchers suggested at Hack.lu was submitting a simple YARN job that executes code on every machine in a cluster, which could be used to get a shell on each of those machines.
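In Hadoop itself, the switch from the default “simple” mode (no real authentication) to Kerberos is the hadoop.security.authentication property in core-site.xml. As a quick sanity check, the Python sketch below parses a core-site.xml document and reports whether that property is set to kerberos. The function names are hypothetical, and the sketch assumes the standard configuration/property/name/value layout of Hadoop site files. A working Kerberos deployment involves much more (principals, keytabs, per-service settings), so a passing result is necessary but not sufficient.

```python
"""Sketch: flag a Hadoop core-site.xml that leaves authentication at "simple"."""
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a dict mapping each <name> to its <value> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)  # root element is <configuration>
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def kerberos_enabled(xml_text):
    """True only if hadoop.security.authentication is explicitly "kerberos".

    Hadoop defaults this property to "simple", so a missing entry is
    treated as insecure.
    """
    props = read_hadoop_properties(xml_text)
    mode = props.get("hadoop.security.authentication", "simple")
    return mode.lower() == "kerberos"
```

For example, pass it the text of the cluster’s client configuration (on many CDH hosts this lives under /etc/hadoop/conf, but adjust for your layout); a False result means the cluster is running without Kerberos authentication.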

Recent Attacks

This brings us to May 2018. Someone on the internet wrote a Metasploit module to perform the attack described in the Hadoop Safari presentation. Metasploit is the most popular framework for exploiting security vulnerabilities. The idea behind Metasploit is that a security researcher can identify a vulnerability and write a Metasploit module for it, and then anyone else in the world can run that module to test for the vulnerability. This particular module simply takes a payload and runs it as a YARN job, just like the example from the conference.

Then, in September, XBash arrived on the scene. XBash has many malicious capabilities, one of which is submitting Bitcoin mining jobs to YARN using the technique from the Metasploit module. Another new piece of malware, DemonBot, uses the same technique to launch DDoS attacks from infected Hadoop servers.

We stress that this attack technique is not sophisticated: it simply walks through an open door. The targeted Hadoop servers are directly connected to the open internet and do not have Kerberos authentication enabled. Two years ago, when the first attacks on open Hadoop servers became known, our blog post in response offered simple advice that remains relevant today: make sure your Cloudera installation is reachable only from the locations it is intended to be reachable from, and test to confirm that this is true.
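One simple way to run such a test is to try opening TCP connections to the cluster’s service ports from a machine that should not have access, such as a host on the public internet. The Python sketch below does this with the standard socket module. The port numbers are common upstream defaults (8088 for the YARN ResourceManager web UI; 50070 or 9870 for the NameNode web UI, depending on the Hadoop version) and are assumptions; substitute the ports your deployment actually uses. A refused or timed-out connection only shows that the port is closed to that particular vantage point.

```python
"""Sketch: test whether Hadoop service ports answer from where this runs."""
import socket
import sys

# Assumed upstream default web UI ports; replace with your deployment's ports.
DEFAULT_PORTS = {
    "YARN ResourceManager web UI": 8088,
    "HDFS NameNode web UI (Hadoop 2)": 50070,
    "HDFS NameNode web UI (Hadoop 3)": 9870,
}


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts, and unreachable hosts
        return False


def exposure_report(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_reachable(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for service, is_open in exposure_report(target).items():
        status = "REACHABLE" if is_open else "closed or filtered"
        print(f"{service}: {status}")
```

Saved under a hypothetical name such as check_ports.py, it would be run as python check_ports.py followed by the cluster’s host name, from outside the trusted network. Any REACHABLE line from an untrusted location means the cluster is exposed.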

Observing a Real-World Attack

To see this attack in action, we created a Hadoop cluster using Cloudera Altus. Altus is a cloud service platform whose services let you use CDH to analyze and process data at scale on public cloud infrastructure. Although Altus makes it simple to create secure clusters that are impervious to these attacks, it is also possible to set up an Altus cluster that is vulnerable to them.

In Altus, making a cluster vulnerable to these attacks means leaving the “Secure Clusters” box unchecked and using an AWS security group that permits all incoming traffic from anywhere on the internet. Within minutes of creating such an insecure cluster, we observed the attack in action: the YARN web UI showed many jobs being submitted and running.

About once per minute, DemonBot attempted to exploit the cluster. Fortunately for us, these attacks failed because of a misconfiguration in DemonBot itself (in other words, we got lucky).
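The flood of jobs we saw in the YARN web UI is also available from the ResourceManager REST API at /ws/v1/cluster/apps. The Python sketch below (helper names hypothetical) fetches that list and flags applications submitted by users you do not expect. This is only a heuristic: without Kerberos, a submitter can claim any user name, although in our understanding unauthenticated REST submissions often show up under Hadoop’s default static web user, dr.who. The fetch step assumes a plain, unauthenticated HTTP endpoint; a Kerberized cluster would additionally require SPNEGO authentication, which this sketch does not implement.

```python
"""Sketch: flag YARN applications submitted by unexpected users."""
import json
import urllib.request


def fetch_apps(rm_url, timeout=10):
    """Fetch the application list from a ResourceManager's REST API (no auth)."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps", timeout=timeout) as resp:
        return json.load(resp)


def suspicious_apps(apps_json, expected_users):
    """Return (id, user, name, state) for each app whose user is not expected.

    The ResourceManager returns {"apps": null} when there are no
    applications, so both levels are guarded against None.
    """
    apps = (apps_json.get("apps") or {}).get("app") or []
    return [
        (app.get("id"), app.get("user"), app.get("name"), app.get("state"))
        for app in apps
        if app.get("user") not in expected_users
    ]
```

For example, suspicious_apps(fetch_apps("http://rm-host:8088"), {"etl", "analytics"}) would list anything not run by those two accounts; the host name and user names here are placeholders.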

The lesson here: If you put an insecure cluster out there, it will be attacked.

Security for Cloudera Altus

Now that we’ve seen what happens to an insecure cluster, let’s walk through setting up a more secure one. Our goals are to:

Allow SSH access to the cluster only from a limited set of machines
Enable strong authentication with Kerberos

Fortunately, this is easy to configure in Altus. In Altus, clusters are created inside an environment. An environment describes how to access the user’s cloud provider account and the resources it contains, and it also specifies some of the basics of how clusters are created. As a result, the configuration options we’re interested in are set on Altus environments. There are two ways to create an environment: through a simple Quickstart process or through a setup Wizard that allows more flexibility.

The Quickstart process is, of course, the easiest. When creating an environment, choose the “Environment Quickstart” path and select the Enable checkbox for Secure Clusters.

That’s it! Enabling Secure Clusters enables Kerberos. The Quickstart path also creates a Security Group that keeps the cluster inaccessible to the outside world, and therefore to exploits like DemonBot and XBash. We created a cluster in this environment and, as expected, no attacks took place on it.

If you use the environment creation Wizard instead, you select the same Secure Clusters checkbox as in the Quickstart. The difference is that in the Wizard you must supply a Security Group yourself. When creating this Security Group, ensure that it allows SSH access only from Altus IP addresses.
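Whichever path you take, it is worth auditing the Security Group’s inbound rules. The Python sketch below checks one security group description in the shape returned by the EC2 DescribeSecurityGroups API (for example, one entry of boto3’s describe_security_groups()["SecurityGroups"]) and reports rules open to the whole internet, plus SSH rules from sources outside an allowed list. The allowed CIDR list is an assumption you must fill in with the actual Altus IP addresses; 203.0.113.0/24 in the example is a documentation-only placeholder. Rules that reference other security groups instead of CIDR ranges are ignored.

```python
"""Sketch: audit EC2 security group rules for world-open or unexpected SSH access."""
import ipaddress

WORLD = {ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("::/0")}


def covers_port(perm, port):
    """True if an IpPermissions entry applies to the given TCP port."""
    if perm.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
        return True
    if perm.get("IpProtocol") not in ("tcp", "6"):
        return False
    return perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)


def rule_sources(perm):
    """Yield every IPv4/IPv6 source network listed in an IpPermissions entry."""
    for r in perm.get("IpRanges", []):
        yield ipaddress.ip_network(r["CidrIp"], strict=False)
    for r in perm.get("Ipv6Ranges", []):
        yield ipaddress.ip_network(r["CidrIpv6"], strict=False)


def audit_group(group, allowed_ssh_sources):
    """Return findings for one security group; an empty list means no issues."""
    allowed = [ipaddress.ip_network(c, strict=False) for c in allowed_ssh_sources]
    gid = group.get("GroupId")
    findings = []
    for perm in group.get("IpPermissions", []):
        for src in rule_sources(perm):
            if src in WORLD:
                findings.append(f"{gid}: rule open to {src}")
            elif covers_port(perm, 22) and not any(
                # subnet_of raises TypeError across IP versions, so match versions first
                src.version == a.version and src.subnet_of(a) for a in allowed
            ):
                findings.append(f"{gid}: SSH allowed from unexpected {src}")
    return findings
```

A group like the one in our insecure experiment, which allowed all traffic from 0.0.0.0/0, produces a finding immediately, while a group that permits port 22 only from the allowed ranges produces none.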

Conclusion

The internet is a dangerous place for Hadoop clusters. Consequently, we recommend the following:

Don’t expose clusters directly to the internet
Always enable Kerberos authentication
When using Altus, select the “Secure Clusters” checkbox

With these simple steps, your Hadoop cluster will be protected from the types of attacks described here.

Michael Yoder

Suraj Acharya
