Guidelines for Installing CDH Packages on Unsupported Operating Systems


Installing CDH on newer unsupported operating systems (such as Ubuntu 13.04 and later) can lead to conflicts. These guidelines will help you avoid them.

Some recently released operating systems bundle portions of the Apache Hadoop stack in their own distro repositories, and those packages can conflict with software from Cloudera repositories. Consequently, when you set up CDH for installation on such an OS, you may end up picking up packages with the same name from the OS’s distribution instead of Cloudera’s distribution. Package installation may succeed, but using the installed packages may lead to unforeseen errors.

If you are installing packages through Cloudera Manager, rather than manually (via apt-get, yum, or Puppet), for Ubuntu 14.04 with CDH 5.2.0 or later (which is supported by Cloudera), this issue does not pertain to you: Cloudera Manager takes care of the necessary steps to avoid conflicts. Furthermore, if you are using CDH parcels instead of packages for any release or OS, conflicts are similarly not an issue.

If, however, you are either:

  • Manually installing CDH packages (any release) on a newer unsupported OS (such as Ubuntu 13.04, Ubuntu 13.10, Debian 7.5, Fedora 19, and Fedora 20; refer to the CDH 5 Requirements and Supported Versions guide for an up-to-date list of supported OSs), or
  • Manually installing CDH packages for a release earlier than CDH 5.2.0 on Ubuntu 14.04

then you should find the following good-faith installation guidelines helpful.

(Note: If you are installing CDH 5.2 packages manually on a supported OS like Ubuntu 14.04, the documentation lists the necessary steps you need to take. However, you may still find this blog post useful as background reading.)

The Problem in Action

As explained above, if you are mixing-and-matching packages between distributions, you may easily end up with misleading errors.

For example, here is an error when running hbase shell on Ubuntu 13.04 with CDH 5. In this case, the zookeeper package is installed from the OS repositories (in this case, Ubuntu 13.04) whereas the hbase package is installed from CDH.

Does My OS Have This Problem?

In the table below, you will find the mapping of various operating systems and the conflicting packages.

Red means a conflict exists: Installing packages today on the OS would install some package(s) from the OS repo instead of the CDH repo, or worse, have a mix of packages from OS and CDH repo. The value of the field represents the package that will be installed from the OS. For example, OS zookeeper refers to the fact that the zookeeper package would be installed from the OS instead of the CDH repository, which will cause issues.

Orange means no conflict currently exists but that one could arise if the OS repo decides to bump up or change the package version.

If you are using a problematic OS, you will find the solution in the next section.

(Note: Even though Ubuntu 14.04 is listed as a “problematic” OS in the above table, the solution described below is already implemented in Cloudera Manager and described in the documentation. You don’t have to do anything extra if you are using Cloudera Manager or simply following the documentation.)

Solution

The best way to fix this problem is to ensure that all the packages are coming from the same CDH repository. The OS repository is added by default and it’s usually not a good idea to disable that repository. You can, however, set the priority of CDH repo to be higher than the default OS repo. Consequently, if there is a package with the same name in the CDH and the default OS repo, the package from the CDH repository would take precedence over the one in the OS repository regardless of which one has the higher version. This concept is generally referred to as pinning.

For Debian-based OSs (e.g. Ubuntu 13.04, Ubuntu 13.10, Debian 7.5)

Create a file at /etc/apt/preferences.d/cloudera.pref with the following contents:

No apt-get update is required after creating this file.

For those curious about this solution, the default priority of packages is 500. By creating the file above, you provide a higher priority of 501 to any package whose repository declares its origin as “Cloudera” (o=Cloudera) and its label as “Cloudera” (l=Cloudera), which does the trick.
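Putting that description into file form, a pin file along the following lines would give the Cloudera repository priority 501. Treat it as a minimal sketch: the o=Cloudera, l=Cloudera and 501 values come from the explanation above, while the Package: * line (apply the pin to every package from that repository) and the PREF_FILE variable are assumptions made for this example.

```shell
# Destination for the pin file. The default is a scratch path so the sketch can
# be tried without root; for real use, run as root with
#   PREF_FILE=/etc/apt/preferences.d/cloudera.pref
PREF_FILE="${PREF_FILE:-$(mktemp)}"

# Priority 501 sits just above apt's default of 500, so packages from a
# repository with Origin "Cloudera" and Label "Cloudera" are preferred over
# same-named packages from the OS repository.
# "Package: *" (pin everything from that repository) is an assumption.
cat > "$PREF_FILE" <<'EOF'
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
EOF

echo "Wrote pin file to $PREF_FILE"
```

Once the file is in place under /etc/apt/preferences.d/, running apt-cache policy zookeeper should list the Cloudera repository with priority 501 next to the OS repository at 500.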

For RPM-based OSs (such as Fedora 19 and Fedora 20)

Install the yum-plugin-priorities package by running yum install yum-plugin-priorities as root.

This package enables us to use yum priorities which you will see in the next step.

Then, edit the relevant cloudera-cdh*.repo file under /etc/yum.repos.d/ and add the line priority = 98 at the bottom of that file.

The default priority for all repositories (including the OS repository) is 99. With yum priorities (on Fedora as on RHEL/CentOS), a lower number means higher precedence. By setting the priority to 98, we give the Cloudera repository higher precedence than the OS repository.
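If you want to script the edit rather than make it by hand, something like the following could work. It is only a sketch: set_cdh_repo_priority is a hypothetical helper name, and the demo .repo contents (section name, baseurl) are placeholders rather than the contents of a real Cloudera repo file.

```shell
# Prerequisite (as root): yum install yum-plugin-priorities

# Append "priority = 98" to a .repo file unless it already sets a priority.
set_cdh_repo_priority() {
    repo_file="$1"
    if grep -Eq '^[[:space:]]*priority[[:space:]]*=' "$repo_file"; then
        echo "priority already set in $repo_file; leaving it alone" >&2
        return 0
    fi
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$repo_file")" ]; then
        echo >> "$repo_file"
    fi
    echo 'priority = 98' >> "$repo_file"
}

# Demo on a scratch file. The section name and baseurl are hypothetical
# placeholders, not taken from a real Cloudera .repo file.
demo_repo="$(mktemp)"
cat > "$demo_repo" <<'EOF'
[cloudera-cdh5]
name=Cloudera CDH 5 (placeholder)
baseurl=https://example.invalid/cdh5/
gpgcheck=1
EOF

set_cdh_repo_priority "$demo_repo"
set_cdh_repo_priority "$demo_repo"   # second call changes nothing
cat "$demo_repo"
```

For real use you would call the helper as root on the relevant cloudera-cdh*.repo file. Note that a line appended at the bottom only applies to the last section of the file, so a repo file that defines several sections needs the line added under each one.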

For OSs Not on the List

In general, you will have a problem if the OS repository and the CDH repository provide a package with the same name. The most common conflicting packages are zookeeper and hadoop-client, so as a start you need to ascertain whether there is more than one repository delivering those packages.

On a Debian-based system, you can run something like apt-cache policy zookeeper. That command will list all the repositories where the package zookeeper is available. For example, here is the result of running apt-cache policy zookeeper on Ubuntu 13.04:

As you can see, the package zookeeper is available from two repositories: Ubuntu’s Raring Universe Repository and the CDH repository. So, you have a problem.

On a yum-based system, you can run something like yum whatprovides hadoop-client. That command will list all the repositories where the hadoop-client package is available. For example, here is the result from Fedora 20:

As you can see, the package hadoop-client is available from multiple repositories: Fedora 20 repositories and the CDH repository. Again, that’s a problem.
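To check both of the usual suspects in one go, the two lookups above can be wrapped in a small helper. This is a rough sketch: repo_lookup_cmd and check_common_conflicts are hypothetical names, and the helper only runs the commands already mentioned, leaving it to you to read the output and see whether more than one repository is listed.

```shell
# Map a package-manager family to the lookup command described above.
repo_lookup_cmd() {
    case "$1" in
        apt) printf 'apt-cache policy %s\n' "$2" ;;
        yum) printf 'yum whatprovides %s\n' "$2" ;;
        *)   echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

# Run the lookup for the two packages most likely to conflict.
# Usage: check_common_conflicts apt   (or: check_common_conflicts yum)
check_common_conflicts() {
    for pkg in zookeeper hadoop-client; do
        cmd=$(repo_lookup_cmd "$1" "$pkg") || return 1
        echo "== $cmd"
        # Intentionally unquoted so the command string splits into words.
        $cmd
    done
}

# Nothing is run automatically; call the function yourself, for example:
#   check_common_conflicts apt
```

If any package shows up in both the OS repository and the CDH repository, apply the pinning steps for your OS family described in the Solution section.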

Conclusion

Managing package repositories that deliver conflicting packages can be tricky. You have to take the above steps on affected operating systems to avoid any conflicts.

To reiterate, this issue is mostly confined to the manual use of packages on unsupported OSs:

  • If you are using parcels, you don’t have to worry about such problems. On top of that you get easy rolling upgrades.
  • If you are installing packages via Cloudera Manager, you don’t have to worry about such problems since Cloudera Manager takes care of pinning.
  • If the preceding points don’t apply to you, follow the instructions in this blog post to ensure there are no conflicts between CDH and OS packages.

Mark Grover is a Software Engineer on Cloudera Engineering’s Packaging and Integration team, an Apache Bigtop committer, and a co-author of the O’Reilly Media book, Hadoop Application Architectures.
