FAQ: Understanding the Parcel Binary Distribution Format

[Ed. Note, added May 17, 2016: Much of the information below is now outdated/deprecated. For current information about the parcel format, see the documentation.]

Have you ever wished you could upgrade to the latest CDH minor release with just a few mouse clicks, and even without taking any downtime on your cluster? Well, with Cloudera Manager 4.5 and its new “Parcel” feature, you can!

That release introduced many new features and capabilities related to parcels, and in this FAQ-oriented post, you will learn about most of them.

What are parcels?

Parcels are an alternative binary distribution format, supported for the first time in Cloudera Manager 4.5. There are a few notable differences between parcels and traditional CDH rpm/deb packages:

  • CDH is provided as a single package. In contrast to having a separate package for each part of CDH, when using Cloudera Manager 4.5 and later, there is just a single parcel to install.
  • Parcels can be installed side-by-side. Each parcel is self-contained and installed in a separate versioned directory. This means that multiple versions of a given parcel can be installed at the same time. You can then select one of these installed versions as the “active” one. (With traditional CDH packages, only one version of a package can be installed at a time, so there’s no distinction between what’s “installed” and what’s “active”.)
  • Parcels can run from arbitrary locations. Parcels can be installed at any location in the filesystem.
  • Parcels are gzipped tar files with metadata. From a strict implementation point of view, a parcel is simply a tarball containing the program files, along with some additional metadata that allows Cloudera Manager to understand what it is and how to use it.
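
To make the "tarball plus metadata" idea concrete, here is a minimal sketch in Python. It builds a tiny gzipped tar file in memory and reads a metadata file back out of it. The `meta/parcel.json` path, the JSON fields, and the `DEMO` product name are assumptions for illustration only; this post does not spell out the actual metadata layout.

```python
import io
import json
import tarfile


def build_demo_parcel(name: str, version: str) -> bytes:
    """Build a tiny gzipped tarball laid out like a parcel (illustrative only)."""
    root = f"{name}-{version}"
    # Hypothetical contents: one metadata file and one program file.
    files = {
        f"{root}/meta/parcel.json": json.dumps({"name": name, "version": version}),
        f"{root}/bin/hello": "#!/bin/sh\necho hello\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_metadata(parcel_bytes: bytes) -> dict:
    """Locate the (assumed) metadata file inside the tarball and parse it."""
    with tarfile.open(fileobj=io.BytesIO(parcel_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith("meta/parcel.json"):
                return json.load(tar.extractfile(member))
    raise ValueError("no metadata file found in tarball")


if __name__ == "__main__":
    parcel = build_demo_parcel("DEMO", "1.0")
    print(read_metadata(parcel))
```

The point of the sketch is only that a management tool can learn what a parcel is by reading a small file inside an otherwise ordinary archive.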

What are the benefits of parcels?

As a consequence of the functional characteristics noted above, parcels offer a number of benefits:

  • Simplified distribution: As a parcel is a single file, it’s much easier to move around than the dozens of packages that make up CDH. This is especially useful when managing a cluster that isn’t connected to the Internet.
  • Internal consistency: By distributing CDH as a single parcel, we can help ensure that all CDH components are properly matched and that there isn’t a danger of different parts coming from different versions of CDH.
  • Installation outside of /usr: In some IT environments, Hadoop admins do not have privileges to install system packages. In the past, these admins had to fall back to CDH tarballs, which deprived them of a lot of infrastructure that packages provide. With parcels, admins can install to /opt or anywhere else without having to step through all the additional manual steps of regular tarballs.
  • Installation of CDH without sudo: Parcel installation is handled by the CM Agent, which already runs as root, so admins can install CDH without needing sudo access themselves.
  • Decoupling of distribution from activation: Thanks to side-by-side install capabilities delivered by parcels, it is now possible to stage a new version of CDH across the cluster in advance of switching over to it. This allows the longest running part of an upgrade to be done ahead of time without affecting cluster operations, consequently reducing upgrade downtime.
  • Rolling upgrades: With the new version staged side-by-side, switching to a new minor version is simply a matter of changing which version of CDH is used when restarting each process. It then becomes practical to do upgrades with rolling restarts, where service roles are restarted in the right order to switch over to the new version with minimal service interruption. Note that major version upgrades (CDH3 -> CDH4) require full service restarts due to the substantial changes between the versions.
  • Easy downgrades: With the old version still available, moving back to it can be as simple as upgrading. (Note that some CDH components may require explicit additional steps due to things like schema upgrades.)
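
The decoupling of "installed" from "active" described above can be sketched with a toy model. This is not how Cloudera Manager is implemented; the `ParcelDir` class, its method names, and the `<product>-<version>` directory naming are hypothetical, and the version numbers are just examples. The sketch shows two versions staged side by side, with the active version only affecting processes started after the switch.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


class ParcelDir:
    """Toy model: one directory per version, plus a record of the active one."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.active: Dict[str, str] = {}  # product -> active version

    def install(self, product: str, version: str) -> Path:
        # Each version gets its own directory, so installs never overwrite each other.
        path = self.root / f"{product}-{version}"
        path.mkdir(parents=True, exist_ok=True)
        return path

    def installed(self, product: str) -> List[str]:
        prefix = f"{product}-"
        return sorted(
            p.name[len(prefix):]
            for p in self.root.iterdir()
            if p.is_dir() and p.name.startswith(prefix)
        )

    def activate(self, product: str, version: str) -> None:
        if version not in self.installed(product):
            raise ValueError(f"{product} {version} is not installed")
        self.active[product] = version

    def home_for_new_process(self, product: str) -> Path:
        # Only processes started (or restarted) after activation see the new version.
        return self.root / f"{product}-{self.active[product]}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parcels = ParcelDir(Path(tmp))
        parcels.install("CDH", "4.1.3")
        parcels.install("CDH", "4.2.0")      # staged ahead of time
        parcels.activate("CDH", "4.1.3")
        running = parcels.home_for_new_process("CDH")
        parcels.activate("CDH", "4.2.0")     # the switch itself is cheap
        print(running.name, parcels.home_for_new_process("CDH").name)
```

Because the old directory is still on disk, a downgrade in this model is just another call to `activate` followed by restarts.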

What new capabilities in Cloudera Manager 4.5 are premised on parcels?

Thanks to the introduction of parcels, a host of new capabilities are now delivered by Cloudera Manager:

  • End-to-end deployment life-cycle management: Starting with 4.5, Cloudera Manager can now fully manage all the steps involved in a CDH version upgrade. (In contrast, with traditional packages, Cloudera Manager can only help with initial installation.)

    Life-cycle of a parcel

    • Download: Parcels are published to Cloudera’s repository. Cloudera Manager will then download the parcel to the CM Server machine.
    • Distribution: Once the Server has the parcel, Cloudera Manager can distribute the parcel out to all the hosts in the cluster. This process can be tuned in terms of how many hosts receive the parcel at the same time and the total aggregate bandwidth used for the process.
    • Activation: Once a parcel is distributed, you can activate it. Once activated, it will be used for any processes that are subsequently started or restarted.
    • Deactivation: Similarly, a parcel can be deactivated (and will automatically be deactivated if another one is activated).
    • Removal: This is the reverse of distribution. A parcel that has been deactivated and is not serving any current processes is eligible for removal from the hosts in the cluster.
    • Deletion: Finally, once removed from the cluster, the parcel can be deleted from the CM server, which completes the life-cycle of a parcel.

    [Screenshot: The Parcels page in Cloudera Manager, showing one active CDH parcel and one active Impala parcel, one CDH parcel being downloaded, one CDH parcel being distributed, and one CDH parcel available for download.]

  • End-to-end capabilities are optional: If there are specific reasons to use other tools for download and/or distribution, you can do so, and Cloudera Manager will work alongside your other tools. For example, you can handle distribution with something like Puppet. Or, if you want to download the parcel to CM Server manually (perhaps because your cluster has no Internet connectivity) and then have Cloudera Manager distribute the parcel to the cluster, you can do that too.
  • Rolling upgrades: These are only possible with parcels, thanks to their side-by-side nature. Traditional packages would require shutting down the old process, upgrading the package, and then starting the new process. This can be hard to recover from in the event of errors and requires extensive integration with the package management system to function seamlessly.
  • Distributing additional components: Parcels are not limited to CDH. Impala is available as a parcel too and we’ve just published an LZO parcel that provides the LZO plugins for both Hadoop and Impala. In a future blog post, we’ll discuss how you might build your own parcels to distribute other software.
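
The parcel life-cycle listed above (download, distribution, activation, deactivation, removal, deletion) can be read as a small state machine. The sketch below is one way to model it in Python; the state names and the `step` function are assumptions made for illustration and are not taken from Cloudera Manager itself.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available for download"
    DOWNLOADED = "downloaded to the CM server"
    DISTRIBUTED = "distributed to the cluster hosts"
    ACTIVATED = "activated"


# (action, current state) -> next state, following the steps in the post.
TRANSITIONS = {
    ("download", State.AVAILABLE): State.DOWNLOADED,
    ("distribute", State.DOWNLOADED): State.DISTRIBUTED,
    ("activate", State.DISTRIBUTED): State.ACTIVATED,
    ("deactivate", State.ACTIVATED): State.DISTRIBUTED,
    ("remove", State.DISTRIBUTED): State.DOWNLOADED,   # reverse of distribution
    ("delete", State.DOWNLOADED): State.AVAILABLE,     # gone from the CM server
}


def step(state: State, action: str) -> State:
    """Apply one life-cycle action, rejecting anything out of order."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} a parcel that is {state.value}") from None


if __name__ == "__main__":
    state = State.AVAILABLE
    for action in ["download", "distribute", "activate"]:
        state = step(state, action)
        print(f"{action:>10} -> {state.value}")
```

Note that an active parcel cannot be removed directly in this model: it has to be deactivated first, which matches the description of removal above.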

What parcels are currently available?

  • CDH: 4.1.3 and newer
  • Impala: 1.0 and newer (parcels are available for old betas, but only 1.0 is supported)
  • LZO: Contains plugins for CDH 4.x and Impala 1.0

How do I configure parcels?

All parcel-related configuration settings are collected in the Parcels section of the CM Server properties. Here are some of the key configuration settings:

  • Local Parcel Repository Path: This is the location on the CM server where downloaded parcels will be stored.
  • Parcel Update Frequency: This controls how often the CM server will check the configured repositories for new parcels.
  • Remote Parcel Repository URLs: This is the list of parcel repositories that Cloudera Manager will check for parcels. By default, it includes Cloudera’s CDH and Impala repositories. If you need LZO support, you would want to add the LZO parcel repository to this list.
  • Proxy Server Settings: A group of settings allows an HTTP proxy to be configured, if one is required to access the Internet from the CM server.
  • Automatic Download/Distribution: These settings allow you to configure the CM server to automate the download and distribution steps so that as soon as a new release is detected, it will be staged and ready for activation without any direct intervention.

    [Screenshot: Parcels configuration screen]
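
As a rough mental model, the settings above can be pictured as a single configuration record. The Python sketch below is purely illustrative: the field names, the paths, and the `example.com` URLs are placeholders, not the real Cloudera Manager property names, defaults, or repository locations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParcelSettings:
    """Hypothetical mirror of the parcel settings described in the post."""

    local_repo_path: str                      # where downloaded parcels are stored
    update_frequency_minutes: int             # how often to check for new parcels
    remote_repo_urls: List[str] = field(default_factory=list)
    proxy_host: Optional[str] = None          # only needed behind an HTTP proxy
    proxy_port: Optional[int] = None
    auto_download: bool = False
    auto_distribute: bool = False

    def add_repository(self, url: str) -> None:
        # For example, adding an extra repository such as the LZO one.
        if url not in self.remote_repo_urls:
            self.remote_repo_urls.append(url)

    def validate(self) -> None:
        if self.update_frequency_minutes <= 0:
            raise ValueError("update frequency must be positive")
        if (self.proxy_host is None) != (self.proxy_port is None):
            raise ValueError("proxy host and port must be set together")


if __name__ == "__main__":
    settings = ParcelSettings(
        local_repo_path="/path/to/parcel-repo",              # placeholder path
        update_frequency_minutes=60,                         # placeholder value
        remote_repo_urls=["https://example.com/parcels/cdh/"],
    )
    settings.add_repository("https://example.com/parcels/lzo/")
    settings.validate()
    print(settings.remote_repo_urls)
```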

Conclusion/Next Steps

As you can see, we believe that users get a lot of benefits from this new approach to binary distribution. To read more about parcels and how to use them in Cloudera Manager, see the parcels documentation.

In future blog posts we’ll walk through the process of using parcels to upgrade from CDH3 to CDH4, and doing rolling upgrades to move between CDH4 releases with minimal downtime.

Philip Langdale is a Software Engineer on the Enterprise team.

> Ask questions and get answers about parcels in the community forum for Cloudera Manager.

Learn More About Parcels

Want to see the power of Parcels in action? Watch our e-learning module on Understanding Parcels to learn the fundamentals of optimizing your Hadoop operations with Parcels. The video includes a step-by-step demo of upgrading CDH and installing Impala, Search, and Hadoop LZO.
