How-to: Run Apache Mesos on CDH


Big Industries, Cloudera systems integration and reseller partner for Belgium and Luxembourg, has developed an integration of Apache Mesos and CDH that can be deployed and managed through Cloudera Manager. In this post, Big Industries’ Rob Gibbon explains the benefits of deploying Mesos on your cluster and walks you through the process of setting it up.

[Editor’s Note: Mesos integration is not currently supported by Cloudera, thus the setup described below is not recommended for production use.]

Apache Mesos is a distributed, generic grid workload manager. Like YARN in Apache Hadoop, it schedules work across a cluster, but Mesos is designed to run the generic tasks and services that your Hadoop cluster might not otherwise be able to manage. It is well suited to running services such as Memcached, MySQL, Apache httpd, Nginx, HAProxy, Snort, ActiveMQ, or whatever else you need to run, for as long as you need to run it. Like YARN, Mesos is designed to scale, and Mesos services can be deployed on clusters of up to tens of thousands of nodes.

Why would you want to run things like web servers, proxies, and caches on a Hadoop cluster, though? When assembling a technical solution, especially an off-the-shelf one, the buyer commonly expects the vendor to provide a complete, ready-to-go platform with a single bill of materials. Solutions often use operational, front-end serving components (reverse proxies, load balancers, web servers, application servers) and middle-tier components (object caches, JMS, workflow engines, and so on) in addition to backend components. While Hadoop is great at solving backend data processing challenges, until Mesos it has been pretty difficult to deploy and operate front-end and middle-tier components in a consistent manner as part of a complete, Hadoop-powered solution.

For example, building a security information and event management (SIEM) solution on top of Hadoop means including a live-traffic inspection layer as well as an active archive of security-related event logs and reporting tools. While Hadoop perfectly fits the needs for the active archiving element, without Mesos integration to run a live-traffic inspection system and a reporting server, it would be quite difficult to deliver on the complete system requirements in a consistent way from a single platform.

Mesos comes with a framework called Marathon to launch tasks on the cluster, and a scheduler framework called Chronos, which offers a highly available, fault-tolerant alternative to Unix cron.

Docker

To launch an application, Mesos Marathon uses Docker, an application virtualization system that enables portable, standardized, and containerized deployment of applications and components across the cluster. The engineer writes a Dockerfile, which is a text file containing a set of automation instructions for deploying and configuring the application.

There are other ways to launch applications on Mesos, but Docker offers a robust solution with extensive features.

Putting It Together: Mesos on Cloudera

In order to get Apache Mesos running on a Cloudera environment, we put together Custom Service Descriptors (CSDs) and custom deployment parcels for CentOS 6.5, RHEL 6.5, and Ubuntu 14.04 LTS that can be installed into Cloudera Manager. With this approach, deploying Mesos and Docker is a similar experience to deploying other Hadoop components like YARN, Impala, or Hive. (Note that we made two parcels, one for Mesos and one for Docker, because Docker needs to be run as root whilst Mesos can run under a dedicated user account.)

First, add the Big Industries parcel repo to Cloudera Manager.

To get Cloudera Manager to work with the Mesos and Docker parcels, you need to copy the CSD files to the Cloudera Manager CSD repository (by default, the /opt/cloudera/csd directory on the Cloudera Manager server host).

Once that is done, you'll need to restart Cloudera Manager to pick up the changes.
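
The copy step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the CSD jars have already been downloaded to a local directory, and that Cloudera Manager uses its default CSD directory, /opt/cloudera/csd. The function name install_csds is our own and is not part of any Cloudera tooling.

```python
import shutil
from pathlib import Path

# Default CSD directory on the Cloudera Manager server host (assumed unchanged).
DEFAULT_CSD_DIR = Path("/opt/cloudera/csd")


def install_csds(source_dir, csd_dir=DEFAULT_CSD_DIR):
    """Copy every .jar file from source_dir into csd_dir and return the new paths."""
    csd_dir = Path(csd_dir)
    csd_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(Path(source_dir).glob("*.jar")):
        target = csd_dir / jar.name
        shutil.copy2(jar, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied

# After copying, restart the server on the Cloudera Manager host, for example:
#   sudo service cloudera-scm-server restart
```

Calling install_csds("/tmp/csd-jars") (a hypothetical download location) as a user with write access to the CSD directory copies the jars into place. The restart itself is left as a manual step.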

The next step is to download, distribute, and activate the Mesos and Docker parcels via Cloudera Manager.

We can now set up and configure a new Mesos service on our cluster from Cloudera Manager, in the same way we would set up any other Hadoop service.

You can choose which nodes of the cluster to use as Mesos slaves and Mesos masters, and where to deploy the Marathon service. You should deploy and run Docker on each node that will run as a Mesos slave. To ensure solid resource isolation, you can use Cloudera Manager’s Linux Control Groups integration to allocate appropriate system resource shares to the Mesos framework; this way Mesos and other Hadoop components like YARN and Impala can coexist.

Set up the hosts-to-roles mappings in Cloudera Manager:

Running Docker Images in Marathon

Marathon has a REST API. Docker images can be started with a POST to the /v2/apps endpoint of the Marathon server, with a request body containing the application configuration in JSON format.
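
As a rough sketch of such a call, the Python below builds and sends a POST to Marathon's /v2/apps endpoint using only the standard library. The Marathon address is whatever your deployment uses (the host and port in the usage note are placeholders), and the function names are ours, chosen for illustration.

```python
import json
import urllib.request


def build_marathon_request(marathon_url, app_definition):
    """Build (but do not send) a POST request for Marathon's /v2/apps endpoint."""
    body = json.dumps(app_definition).encode("utf-8")
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_app(marathon_url, app_definition):
    """Send the request and return Marathon's decoded JSON response."""
    request = build_marathon_request(marathon_url, app_definition)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, launch_app("http://marathon.example.com:8080", config) would submit the dictionary config to a Marathon instance at that placeholder address. Splitting the request construction from the network call makes it easy to inspect what will be sent before anything reaches the cluster.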

Editing the JSON files

Various settings can be configured in the JSON file. They configure both Marathon and the service that the Docker image contains, for example the CPU and memory allocation, the number of instances, and the container's port mappings.

Further documentation can be found here. To find the exposed ports, volumes, and environment variables, check the Dockerfile for the EXPOSE, VOLUME, and ENV instructions.
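
If you have many Dockerfiles to inspect, a small helper can pull these instructions out for you. The following is a minimal sketch, not a full Dockerfile parser: it ignores line continuations (a trailing backslash) and simply collects the argument text of each EXPOSE, VOLUME, and ENV line. The function name find_docker_settings is hypothetical.

```python
def find_docker_settings(dockerfile_text):
    """Collect the arguments of EXPOSE, VOLUME and ENV instructions, in file order."""
    found = {"EXPOSE": [], "VOLUME": [], "ENV": []}
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        instruction = parts[0].upper()  # Dockerfile instructions are case-insensitive
        if instruction in found and len(parts) == 2:
            found[instruction].append(parts[1].strip())
    return found
```

The returned dictionary tells you which ports to map, which volumes to mount, and which environment variables the image expects, and those values then go into the Marathon JSON file.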

Setting Up a Docker Registry Containing the Images

Docker images must be made available to the Docker daemons on the cluster.

The best way to do this is to provide a Docker Registry, which is comparable to a Git repository for Docker images. A JSON file to set up a registry using Marathon is included in the project. To move Docker images from one host to another, use the docker save and docker load commands.

To put these images on the Docker registry, first tag them, then push them. The IP address and port of the registry can be found in the Marathon UI.
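
The tag-then-push sequence is easy to get wrong because the registry address has to be part of the image name. As an illustration, the sketch below builds the two docker command lines for a given registry address. The helper names are ours, and the address in the usage note is a placeholder for the one shown in your Marathon UI.

```python
def registry_image_name(registry_host, registry_port, image):
    """Return the image name prefixed with the private registry address."""
    return "{}:{}/{}".format(registry_host, registry_port, image)


def tag_and_push_commands(registry_host, registry_port, image):
    """Return the docker tag and docker push command lines as argument lists."""
    target = registry_image_name(registry_host, registry_port, image)
    return [
        ["docker", "tag", image, target],  # give the local image its registry name
        ["docker", "push", target],        # upload it to the private registry
    ]
```

Each returned list can be passed to subprocess.check_call on a host where the image is present. For instance, tag_and_push_commands("10.0.0.5", 5000, "memcached:latest") targets the name 10.0.0.5:5000/memcached:latest.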

Note that when using an insecure private registry, like the one from the JSON file, it is important to add the --insecure-registry argument to the Docker daemon's start command.

Example: Launching Memcached on Mesos, on CDH

To launch Memcached on a cluster, we need a Docker image for Memcached – we’ll use sameersbn/memcached:latest from the public registry.

You need to create a Marathon configuration file like the one below:
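
The following sketch generates such a configuration from Python. Only the image name comes from this article; the application id, the CPU and memory figures, and the instance count are assumptions you should adapt. The layout follows the Docker container format that Marathon accepted at the time of writing (a "container" object of type DOCKER with bridge networking and port mappings); newer Marathon releases may expect a different structure.

```python
import json

memcached_app = {
    "id": "memcached",   # hypothetical application id
    "cpus": 0.5,         # assumed CPU share per instance
    "mem": 128,          # assumed memory per instance, in MB
    "instances": 1,      # assumed number of instances
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "sameersbn/memcached:latest",
            "network": "BRIDGE",
            "portMappings": [
                # 11211 is Memcached's default port; hostPort 0 asks
                # Marathon to assign a free port on the slave.
                {"containerPort": 11211, "hostPort": 0, "protocol": "tcp"}
            ],
        },
    },
}

# Serialize the definition so it can be saved to a file or sent to Marathon.
config_json = json.dumps(memcached_app, indent=2)
```

Writing config_json to a file such as memcached.json (a name chosen here for illustration) gives you the configuration file to submit.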

Then you need to launch it on the cluster via the Marathon REST API:

Troubleshooting

If the Marathon app stays in the deploying state:

  • Make sure there are enough resources on the slaves.
  • Make sure the IP address and the port number of the registry are set correctly and the registry is added as an insecure registry on the Docker daemon.
  • Make sure the image name has a version if required.
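
The last check in the list above can be automated before you submit a configuration. This is a small sketch with a hypothetical helper name: it treats an image reference as versioned when the final path component carries a tag, or when the reference uses a digest. The colon in a registry address such as 10.0.0.5:5000 is deliberately not counted as a tag.

```python
def has_explicit_tag(image):
    """Return True if the image reference names a tag or a digest."""
    if "@" in image:
        return True  # digest reference, e.g. name@sha256:<hash>
    last_component = image.rsplit("/", 1)[-1]
    return ":" in last_component  # a colon after the last slash marks a tag
```

For example, has_explicit_tag("sameersbn/memcached") is False, which is a hint to add a tag such as :latest to the image name in the Marathon configuration.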

Conclusion

In this article we have explained some of the features and benefits of Apache Mesos, seen how to deploy Mesos and Docker under CDH using Cloudera Manager and custom parcels, and had a look at launching an application component (Memcached) across the cluster using Mesos Marathon.

The source code for the Cloudera Manager Mesos and Docker extensions is available on GitHub and is licensed under the Apache License, Version 2.0.

Rob Gibbon is architect, manager, and partner at Big Industries, the industry-leading Hadoop SI partner for Belgium and Luxembourg.


7 responses on “How-to: Run Apache Mesos on CDH”

  1. Mohammad Hassany

    A question:
    I need to configure YARN so that YARN jobs are executed on Mesos.
    Does this installation configure YARN by default, or do I have to do it manually?
    Thanks.

  2. Ravi Hemnani

    I tried adding Mesos and Docker via parcels. After I added the remote URL and put the jars in the right place, I see the following error while trying to download and activate the parcels: “Error for parcel MESOS-1.1-wheezy : Parcel not available for OS Distribution DEBIAN_WHEEZY.” The same happens for Docker.

    Is there any workaround for this, or am I doing something wrong here?
    CDH version – 5.5.1-1.cdh5.5.1.p0.11