Few projects within the Apache Hadoop umbrella have as much end-user visibility as Hue, the open source Web UI that makes Hadoop easier to use. Due to the great number of potential end users, it is useful to add a degree of fault tolerance to your deployment. This how-to describes how to achieve higher availability by placing several Hue instances behind a load balancer.
This tutorial demonstrates how to set up high availability by:
- Installing Hue 2.3 on two nodes in a three-node RHEL 5 cluster
- Managing all Hue instances via Cloudera Manager
- Load balancing using HAProxy 1.4 (any load balancer with sticky sessions should work)
Hue should be installed on two of the three nodes. To have Cloudera Manager automatically install Hue, follow the “Parcel Install via Cloudera Manager” section. To install manually, follow the “Package Install” section.
Parcel Install via Cloudera Manager
For more information on Parcels, see Managing Parcels.
- From Cloudera Manager, click on Hosts in the menu. Then, go to the Parcels section.
- Find the latest CDH parcel and click Download.
- Once the parcel has finished downloading, click Distribute.
- Once the parcel has finished distributing, click Activate.
Package Install
- Download the yum repository RPM.
- Install the yum repository with
sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
For more information, see Installing CDH4.
- Install Hue on each node with
sudo yum install hue
For more information on installing Hue, see the CDH documentation.
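Because the package steps have to be repeated on each Hue node, they can be wrapped in a small loop. The following is only a sketch under several assumptions that are not part of the original setup: the two Hue hosts are servera.cloudera.com and serverb.cloudera.com (the names used in the load balancer example later on), passwordless ssh and sudo are available, and the repository RPM has already been downloaded to the remote user's home directory. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: repeat the package install on every Hue node.
# Assumptions (not from the article): passwordless ssh and sudo on each host,
# and cloudera-cdh-4-0.x86_64.rpm already present in the remote home directory.

HUE_HOSTS="servera.cloudera.com serverb.cloudera.com"

# Print the two install commands, joined so that Hue is installed
# only if the repository RPM installed successfully.
hue_install_cmd() {
    printf '%s && %s\n' \
        "sudo yum --nogpgcheck -y localinstall cloudera-cdh-4-0.x86_64.rpm" \
        "sudo yum -y install hue"
}

# Run the install on each host in turn; stop at the first failure.
install_hue_everywhere() {
    for host in $HUE_HOSTS; do
        echo "Installing Hue on $host"
        ssh "$host" "$(hue_install_cmd)" || return 1
    done
}
```

Calling install_hue_everywhere runs the same two yum commands on both nodes. The -y flag is added here so that yum does not wait for confirmation over ssh.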
Managing Hue through Cloudera Manager
Cloudera Manager provides management of the Hue servers on each node. Add two Hue services using the directions below. For more information on managing services, see the Cloudera Manager documentation.
- Go to Services -> All Services in the menu.
- Click Actions -> Add a Service.
- Select “Hue” and follow the steps on the screen. NOTE: For each Hue service we choose a unique host.
- Ensure that the “Jobsub Examples and Templates Directory” configuration points to a different directory in HDFS for each Hue service. To change it, go to Services and select the Hue service. In the menu, go to Configuration -> View and Edit, then click Hue Server. “Jobsub Examples and Templates Directory” should be at the bottom of the page.
Cloudera Manager handling two Hue services
HAProxy Installation/Configuration
- Download and unzip the binary distribution of HAProxy 1.4 on the node that doesn’t have Hue installed (called serverc.cloudera.com in the example).
- Add the following HAProxy configuration to /tmp/hahue.conf:
global
    daemon
    nbproc 1
    maxconn 100000
    log 127.0.0.1 local6 debug

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 10s

listen Hue 0.0.0.0:80
    log global
    mode http
    stats enable
    balance source
    server hue1 servera.cloudera.com:8888 cookie ServerA check inter 2000 fall 3
    server hue2 serverb.cloudera.com:8888 cookie ServerB check inter 2000 fall 3
- Start HAProxy with
haproxy -f /tmp/hahue.conf
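Before starting the daemon, it can help to have HAProxy parse the file first, since a typo in the configuration would otherwise only surface at startup. The sketch below uses HAProxy's -c option, which checks the configuration and exits. The start_hahue function name and the HAPROXY_CONF variable are illustrative, not part of the original setup. Note that binding to port 80 typically requires root privileges.

```shell
#!/bin/sh
# Sketch: validate the configuration, then start HAProxy only if it parses.
# The default path matches the article; the function name is illustrative.

HAPROXY_CONF="${HAPROXY_CONF:-/tmp/hahue.conf}"

start_hahue() {
    # -c checks the configuration file and exits without binding any port.
    if haproxy -c -f "$HAPROXY_CONF"; then
        haproxy -f "$HAPROXY_CONF"
    else
        echo "Configuration check failed for $HAPROXY_CONF" >&2
        return 1
    fi
}
```

Running start_hahue then either starts the proxy or reports that the file did not pass the check.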
The key configuration options are balance and server in the listen section. When the balance parameter is set to source, HAProxy hashes the client's source IP address to choose a server, so a given client reaches the same server on every request as long as the set of available servers does not change. If the server with which the client is communicating goes down, its requests are automatically sent to another active server. This stickiness is necessary because Hue stores session information in process memory. The server parameters define which servers are used for load balancing and take the form:
server <name> <address>[:port] [settings ...]
In the configuration above, the server hue1 is available at servera.cloudera.com:8888 and hue2 is available at serverb.cloudera.com:8888. Both servers have health checks every two seconds and are declared down after three failed health checks. In this example, HAProxy is configured to bind to 0.0.0.0:80. Thus, Hue should now be available at http://serverc.cloudera.com.
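A quick way to confirm the setup is to request each Hue server directly and then the proxy address. The following is a minimal sketch, assuming curl is installed on the machine running it. The function names are hypothetical, and any 2xx or 3xx response is treated as "up" on the assumption that an unauthenticated request may be redirected to a login page.

```shell
#!/bin/sh
# Sketch: confirm that each Hue server and the proxy answer HTTP requests.
# Assumption: curl is installed where this runs.

HUE_URLS="http://servera.cloudera.com:8888 http://serverb.cloudera.com:8888 http://serverc.cloudera.com"

# Print the HTTP status code for a URL (curl prints 000 if the connection fails).
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Report every endpoint; return non-zero if any of them did not answer.
check_hue() {
    rc=0
    for url in $HUE_URLS; do
        code=$(http_status "$url")
        case "$code" in
            2??|3??) echo "OK   $url ($code)" ;;
            *)       echo "FAIL $url ($code)"; rc=1 ;;
        esac
    done
    return $rc
}
```

Running check_hue while stopping one Hue server should show that server failing while the proxy URL keeps answering.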
Hue can be load-balanced easily as long as each client is consistently directed to the same server (that is, there are “sticky” sessions). Load balancing can improve performance, but its primary goal here is HA. (Note that for true high availability, Hue must also be configured to use a highly available database: MySQL, PostgreSQL, or Oracle Database.) Also, multiple Hue instances can be easily managed through Cloudera Manager.
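To make the sticky-session idea concrete, the sketch below maps a client IP address to one of the servers by hashing it, which is the general idea behind source-based balancing. It is an illustration only: cksum stands in for the hash and is not the function HAProxy actually uses, and the pick_server name is hypothetical.

```shell
#!/bin/sh
# Illustration only: map a client IP to one of the servers by hashing it.
# This is NOT HAProxy's actual hash function; cksum stands in for it.

SERVERS="hue1 hue2"

pick_server() {
    ip=$1
    set -- $SERVERS                       # positional parameters now hold the server names
    hash=$(printf '%s' "$ip" | cksum | cut -d ' ' -f 1)
    shift $(( hash % $# ))                # skip ahead to the chosen server
    echo "$1"
}
```

Calling pick_server twice with the same address returns the same server name. Removing a name from SERVERS, as happens when a server is marked down, remaps the affected clients to a remaining server.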