How-to: Install Cloudera Manager and Cloudera Search with Ansible

The following guest post is re-published here courtesy of Gerd König, a System Engineer with YMC AG. Thanks, Gerd!

Cloudera Manager is a great tool to orchestrate your CDH-based Apache Hadoop cluster. You can use it for everything from cluster installation, configuration deployment, and daemon restarts to monitoring each cluster component. Starting with version 4.6, Cloudera Manager supports the integration of Cloudera Search, which is currently in beta. In this post I’ll show you the steps required to set up a Hadoop cluster via Cloudera Manager and to integrate Cloudera Search.

Furthermore, I want to introduce a small degree of automation using Ansible, so that you don’t have to perform the same manual steps again and again: you can replay them whenever you want, or reuse them for any upcoming cluster installations.

I favour Ansible over Chef and Puppet, because it:

  • requires no agents on the cluster nodes,
  • is not tied to a particular language,
  • has a very low barrier to entry; you can complete your first tasks within half an hour.

Starting Point

This tutorial is based on a very small Hadoop cluster consisting of five newly created nodes running Debian Squeeze. One of them is dedicated to Cloudera Manager; the remaining four nodes form the actual cluster. The installation is triggered from a client, e.g. your workstation, which must have Ansible installed.
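
To make this layout concrete, here is a minimal Python sketch that renders an Ansible INI inventory for the five nodes. The group names `cm_manager` and `cluster_nodes` and the manager host name `hadoop-pg-1` are assumptions made for illustration. The playbooks in the repository may expect different names, so adapt them to whatever the repository uses.

```python
def render_inventory(manager, workers):
    """Render a minimal Ansible INI inventory with two host groups.

    The group names below are hypothetical; rename them to match
    the groups your playbooks actually target.
    """
    lines = ["[cm_manager]", manager, "", "[cluster_nodes]"]
    lines.extend(workers)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed naming scheme: one manager node plus four cluster nodes.
    workers = ["hadoop-pg-%d" % i for i in range(2, 6)]
    print(render_inventory("hadoop-pg-1", workers))
```

Saving the printed text to a file gives you something you can pass to Ansible via its `-i` option.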

Ansible ensures that the prerequisites are fulfilled and that the required packages have the same version on all nodes. The corresponding playbook is available via GitHub.

Check out the repository containing the Ansible playbooks and change into this directory afterwards:

 

Step-by-Step Instructions

  1. Install base packages, e.g. the Oracle JDK and NTP, on all nodes.

     

  2. Install Cloudera Manager.

     

  3. Log in to the Cloudera Manager web interface and click through the wizard. Open the URL:

     

    in your browser and log in with the default credentials admin/admin.

  4. Wizard interactions (after each task, click Continue to switch to the next step).
    • Select free license and click Continue twice.
    • Specify your nodes and click Search (you can type name ranges, e.g. hadoop-pg-[2-5]).
    • Ensure that all of your nodes are marked.
    • Since we are going to integrate Cloudera Search, ensure that SOLR is activated and that Parcels is selected as the installation method.
    • Provide the user and password for connecting to the cluster nodes and installing the software. This user needs permission to install packages on each node. After you click Continue, the manager agents are installed on each node and the result is displayed on this page.
    • CDH, Solr, and Impala (if selected) are now installed on the cluster nodes.
    • The host inspector scans the cluster and evaluates the correctness of the node state.

      Check the result of the host inspector thoroughly and eliminate any warnings.

    • Choose the services that will be installed on the cluster. Enable all services (or your desired choice) and click Inspect Role Assignments afterwards.
    • Ensure that each node is linked to the planned service correctly. By default, Cloudera Manager activates the DataNode role even on the master server; disable this role for the master node.

    • Check the database connection to the embedded PostgreSQL database. Just click Test Connection; Cloudera Manager fills in the generated credentials for you.
    • Review the Hadoop configuration. On this page you can verify or modify the properties that are “normally” (in the sense of a cluster installation without Cloudera Manager) included in the core-site.xml, hdfs-site.xml and mapred-site.xml config files.
    • Cloudera Manager now starts all cluster services.
  5. Integrate Cloudera Search into the Hue web interface.
    • Add service Solr. At the time of writing, Cloudera Search is in beta, and the steps required to integrate it into the Hue web interface are split between tasks in the Cloudera Manager web interface and tasks performed on the command line of the Cloudera Manager node. Detailed instructions can be found here; the shortened version is:
      • Tab Services => All services => Dropdown Actions => Add a service => choose Solr service.
      • Choose the ZooKeeper quorum (there is currently just one).
      • Choose the cluster nodes that shall serve the SOLR service.
      • Accept the configuration modification.
      • The SOLR service will be deployed in the cluster.
    • Add service Flume:
      • Perform the same steps as described for adding the SOLR service, but choose Flume instead of Solr as the service to add.
    • Extend service Hue:
      • Click Services => Hue => Configuration => View and Edit => search for the word ‘safety’
      • Enter the text:

         

        into the text box of category Hue Server (Default) / Advanced and replace “” with the name of the server you chose in step 5 to serve the SOLR service.

      • And finally, execute the last Ansible playbook from your client’s command line to add the “Solr search” icon to the Hue web interface:
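
As an aside on the name ranges from step 4: before you type a pattern such as hadoop-pg-[2-5] into the wizard, it can help to see which host names you expect it to match, so you can compare them against the nodes the wizard marks. The Python sketch below expands such a pattern. It reflects my own reading of the syntax (a single numeric `[start-end]` range, no zero padding) and is not Cloudera Manager’s actual parser; `expand_hosts` is a hypothetical helper name.

```python
import re

# Matches a single numeric range such as "[2-5]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand the first [start-end] range in pattern into host names.

    A pattern without a range is returned unchanged as a one-item list.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError("range start must not exceed range end")
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [f"{prefix}{i}{suffix}" for i in range(start, end + 1)]


if __name__ == "__main__":
    for host in expand_hosts("hadoop-pg-[2-5]"):
        print(host)
```

If the list printed here differs from the nodes found by the wizard’s Search button, check name resolution on the Cloudera Manager node before continuing.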

Summary

Cloudera Manager combined with Cloudera Search is a great tool and the entry point to your Hadoop cluster for system engineers/DevOps as well as for data analysts. Nevertheless, you should have a good understanding of the underlying concepts before you set up a cluster, especially to verify the cluster configuration and role assignments.

The guide in this post includes several manual clicks inside the wizard to set up the cluster, but Cloudera Manager also offers a REST API that can be used to automate even those steps. I’ll soon write an article about using this API, introducing some more Ansible playbooks, so stay tuned!
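
Until then, here is a rough Python sketch of what talking to that API could look like. It only builds an authenticated HTTP request and leaves sending it to you. The host name `cm-host`, port 7180, the `/api/` prefix, the `version` path and the admin/admin credentials are assumptions based on a default installation; check the API documentation of your Cloudera Manager version before relying on any of them.

```python
import base64
import urllib.request


def build_request(host, path, user="admin", password="admin", port=7180):
    """Build a GET request with HTTP basic auth for the Cloudera Manager API.

    Port and credentials default to assumed out-of-the-box values.
    """
    url = f"http://{host}:{port}/api/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "cm-host" is a placeholder; nothing is sent until fetch() is called.
    req = build_request("cm-host", "version")
    print(req.full_url)
```

Calling `fetch(req)` against a running Cloudera Manager should return the response body as text. Change the default password first, since basic auth over plain HTTP sends it practically in the clear.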

If you have any questions or want to share your experience, just leave a reply or get in contact with me.
