Stabilizing YARN APIs for Apache Hadoop 2 Beta and beyond

This post is authored by Jian He with Vinod Kumar Vavilapalli and is the seventh post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed application-management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.

Introduction

Apache Hadoop 2 is in beta now. Hadoop 2 beta brings in YARN, which evolves the compute platform of Hadoop beyond MapReduce so that it can accommodate new processing models such as streaming, graph processing, etc. Until now, MapReduce has been the only framework and hence the only user-facing API to program against for processing data with Hadoop. Starting with Hadoop 2, we have the YARN APIs themselves, which are lower level than MapReduce but let developers write powerful frameworks that can run alongside MapReduce applications on the same cluster.

We recently talked about how existing MapReduce, Pig, Hive, and Oozie applications can work on top of YARN. In line with that, and with the many other efforts helping users move to Hadoop 2, the Apache Hadoop YARN community recently worked hard to stabilize the YARN APIs. We addressed every API issue that we wanted to resolve before the APIs could confidently be deemed stable.

We engaged various users of our alpha releases and discussed their pain points. We also took feedback from users and application developers during the Hadoop YARN meetups (Meetup I and Meetup II). Completing this stabilization effort enables us to support stable, apt APIs for a long time and avoids the pain of supporting bad APIs going forward into the beta and stable releases. YARN-386 is the umbrella JIRA issue that tracked this herculean effort.

YARN API changes: Guide for Hadoop 2 alpha users to port apps to Hadoop 2 beta and beyond

We appreciate the efforts of the early adopters (0.23.x and Hadoop 2 alpha users) who tried our software and helped iron out various kinks! To smooth the upgrade for users of our alpha releases, this document describes the incompatible YARN API changes introduced when moving from the Hadoop 2.0.*-alpha releases to the Hadoop 2.1-beta release. We’ve categorized the changes into three types: (1) simple renames/replacements, (2) unstable APIs that were completely removed or moved, and (3) miscellaneous changes of note. Detailed changes in each category follow:

1. Below is a list of API methods or classes that have been renamed or replaced.

  • YarnException → YarnRuntimeException: Renamed. This is a private exception used inside YARN and MapReduce; users are not supposed to use it.
  • YarnRemoteException → YarnException: Renamed. YarnException indicates exceptions from YARN servers, while IOException indicates exceptions from the RPC layer on either the server or the client side.
  • AllocateResponse#(get,set)reboot → AllocateResponse#(get,set)AMCommand: Renamed, and changed from a boolean to an enum. AMCommand is an enum with the values AM_RESYNC and AM_SHUTDOWN; it is sent by the ResourceManager to the ApplicationMaster as part of the AllocateResponse record.
  • ContainerLaunchContext#(get,set)ContainerTokens → ContainerLaunchContext#(get,set)Tokens: Renamed. The tokens may include file-system tokens, ApplicationMaster-related tokens, or framework-level tokens needed by the container to communicate with various services in a secure manner.
  • ResourceRequest#(get,set)HostName → ResourceRequest#(get,set)ResourceName: Renamed. The resource name on which the allocation is desired; it can be a host, a rack, or *.
  • FinishApplicationMasterRequest#setFinishApplicationStatus → FinishApplicationMasterRequest#setFinalApplicationStatus: Renamed. The final application status reported by the ApplicationMaster to the ResourceManager.
  • AMRMClient(Async)#getClusterAvailableResources → AMRMClient(Async)#getAvailableResources: Renamed. ApplicationMasters can use this method to obtain the resources currently available in the cluster.
  • YarnClient#getNewApplication → YarnClient#createApplication: Renamed, and the return type has changed. It now provides a directly usable ApplicationSubmissionContext that clients can use to submit an application.
  • ApplicationTokenSelector and ApplicationTokenIdentifier → AMRMTokenSelector and AMRMTokenIdentifier: Renamed. The token selector and identifier used by the ApplicationMaster to authenticate with the ResourceManager.
  • ClientTokenSelector and ClientTokenIdentifier → ClientToAMTokenSelector and ClientToAMTokenIdentifier: Renamed. The token selector and identifier used by clients to authenticate with the ApplicationMaster.
  • RMTokenSelector → RMDelegationTokenSelector: Renamed. The selector for RMDelegationToken.
  • RMAdmin → RMAdminCLI: Renamed and moved to a different package. The command-line interface for ResourceManager administrative commands. This is a private class that isn’t intended to be used by users directly.
  • ClientRMProtocol → ApplicationClientProtocol: Renamed. The protocol between a client and the ResourceManager to submit/abort jobs and to get information on applications, nodes, queues, and ACLs.
  • AMRMProtocol → ApplicationMasterProtocol: Renamed. The protocol between a live ApplicationMaster and the ResourceManager to register/unregister the ApplicationMaster and to request resources in the cluster from the ResourceManager.
  • ContainerManager → ContainerManagementProtocol: Renamed. The protocol between an ApplicationMaster and a NodeManager to start/stop containers and to get the status of running containers.
  • RMAdminProtocol → ResourceManagerAdministrationProtocol: Renamed, and the package changed from org.apache.hadoop.yarn.api to org.apache.hadoop.yarn.server.api.
  • yarn.app.mapreduce.container.log.dir → yarn.app.container.log.dir: Configuration property moved from MapReduce into YARN and renamed. Represents the log directory for containers when the ApplicationMaster uses the generic container-log4j.properties.
  • yarn.app.mapreduce.container.log.filesize → yarn.app.container.log.filesize: Configuration property moved from MapReduce into YARN and renamed.
  • ApplicationClientProtocol#getAllApplications → ApplicationClientProtocol#getApplications: Renamed, and changed to accept a list of ApplicationTypes as a parameter with which to filter the applications.
  • ApplicationClientProtocol#getClusterNodes: Changed to accept a list of node states with which to filter the cluster nodes, and to be consistent with the node-related web services.
  • ContainerManagementProtocol: All APIs are changed to take in requests for multiple containers.
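
To make the client-facing renames concrete, here is a rough porting sketch of what the new YarnClient calls look like. The class name, the application name, and the "MAPREDUCE" type filter are made up for illustration, and error handling is omitted.

import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PortingSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Old (2.0-alpha): getNewApplication() returned only a GetNewApplicationResponse.
    // New (2.1-beta): createApplication() also returns a directly usable submission context.
    YarnClientApplication app = yarnClient.createApplication();
    GetNewApplicationResponse newAppResponse = app.getNewApplicationResponse();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-yarn-app");
    System.out.println("Got application id " + newAppResponse.getApplicationId());
    // ... fill in the AM container launch context and resource capability, then:
    // yarnClient.submitApplication(appContext);

    // Old (2.0-alpha): getAllApplications().
    // New (2.1-beta): getApplications(), optionally filtered by application type.
    List<ApplicationReport> mapReduceApps =
        yarnClient.getApplications(Collections.singleton("MAPREDUCE"));
    System.out.println("MapReduce applications known to the RM: " + mapReduceApps.size());

    yarnClient.stop();
  }
}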

2. API methods or classes that are either removed or moved out.

  • BuilderUtils: Moved and made YARN-private. Users should instead use the record-specific static factory methods (newInstance) to construct new records.
  • AMResponse: Merged into AllocateResponse. Use AllocateResponse to retrieve all responses sent by the ResourceManager to the ApplicationMaster during resource negotiation.
  • ClientToken, DelegationToken, ContainerToken: Removed. Instead, use org.apache.hadoop.yarn.api.records.Token as the common type for ClientToAMToken, DelegationToken, and ContainerToken.
  • Container#(get,set)ContainerState and Container#(get,set)ContainerStatus: Removed from Container, as they were never actually usable inside Container.
  • ContainerExitStatus: Removed from YarnConfiguration and made a separate API record.
  • ContainerLaunchContext#(get,set)User: Removed. The user name is already available in the ContainerTokenIdentifier inside the ContainerToken, which is passed as part of StartContainerRequest.
  • (get,set)MinimumResourceCapability: Removed from RegisterApplicationMasterResponse and GetNewApplicationResponse. These two methods are supposed to be internal to the scheduler.
  • APPLICATION_CLIENT_SECRET_ENV_NAME: Removed from ApplicationConstants. The client secret is now sent by the ResourceManager in RegisterApplicationMasterResponse to a new ApplicationMaster on registration, instead of being shared by setting it in the containers’ environment.
  • APPLICATION_MASTER_TOKEN_ENV_NAME: Removed from ApplicationConstants. AMRMTokens are now available in the ApplicationMaster’s token file and can be accessed from the current UserGroupInformation.
  • RegisterApplicationMasterRequest#(get,set)ApplicationAttemptId: Removed. It is no longer needed, as the AMRMToken is now used irrespective of secure or non-secure environments.
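
As an example of building records without BuilderUtils, here is a small sketch using the per-record static newInstance() factories; the memory size, priority, and launch command are arbitrary placeholder values.

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class RecordFactorySketch {
  public static void main(String[] args) {
    // Old (2.0-alpha): BuilderUtils.newResource(...), BuilderUtils.newContainerLaunchContext(...).
    // New (2.1-beta): each record exposes its own newInstance() factory.
    Resource capability = Resource.newInstance(1024 /* MB */, 1 /* vcores */);
    Priority priority = Priority.newInstance(0);

    // ResourceRequest.ANY ("*") asks for containers anywhere in the cluster
    // (the resource name, formerly called the host name).
    ResourceRequest request =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 1);

    // ContainerLaunchContext no longer carries the user name or the container token.
    Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
    Map<String, String> environment = new HashMap<String, String>();
    List<String> commands = Collections.singletonList("sleep 60");
    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        localResources, environment, commands, null, null, null);

    System.out.println(request + " -> " + ctx);
  }
}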

3. More changes of note

  • The YARN protocols are sometimes too low level to program against, so we added a set of client libraries in the yarn-client module. These libraries enhance developer productivity by taking care of some of the boilerplate code. Specifically, via YARN-418 we added:
    • YarnClient : For all communications from the client to the ResourceManager.
    • AMRMClient : For ApplicationMasters to communicate with the ResourceManager to request resources, register, etc. There are two types of clients here – AMRMClient for blocking calls and AMRMClientAsync for non-blocking calls. We strongly encourage users to take advantage of the async APIs.
    • NMClient : For ApplicationMasters to talk to NodeManagers for launching, monitoring, and stopping containers. Here too there are two types of clients – NMClient for blocking calls to a single NodeManager and NMClientAsync for non-blocking calls to any NodeManager in the cluster. NMClientAsync is the suggested library, as it also packs other useful features like thread management for connections to any or all nodes in the cluster. A minimal sketch that uses these libraries follows this list.
  • ContainerManagementProtocol#startContainers is changed to accept the ContainerToken for each container as a parameter, so that ContainerLaunchContext is completely user land. ContainerLaunchContext now only carries information that has to be set by the client/ApplicationMaster; everything else, like the ContainerToken, is taken care of transparently via the Container record.
  • Concept of a separate NMToken and ContainerToken : NMTokens are now used for authenticating all communication with a NodeManager. An NMToken is issued by the ResourceManager when the ApplicationMaster negotiates resources with the ResourceManager, and it is validated on the NodeManager side.
    • ContainerToken is now used only for authorization, and only while starting a container, to make sure that the user (identified by the application submitter) is valid and that the token has not expired.
  • NMTokens are shared between AMRMClient and NMClient using NMTokenCache (an API-based approach) instead of a memory-based approach. NMTokenCache is a static token cache, created once per ApplicationMaster. AMRMClient puts newly received NMTokens into it, and NMClient picks NMTokens up from there to create authenticated connections with NodeManagers.
  • Every container launched by the NodeManager now has key information in its environment, such as the container ID, the container log directory, and the NodeManager hostname and port. See ApplicationConstants.java for the full list; a small sketch of reading these values follows this list.
  • All protocol APIs (ApplicationClientProtocol, etc.) are changed to throw two exceptions: (1) YarnException, which indicates exceptions from YARN servers, and (2) IOException, which indicates exceptions from the RPC layer.
  • All IDs (ApplicationId, ContainerId, etc.) are now immutable. Users cannot modify an ID after it has been constructed.
  • AuxiliaryService, which allows per-node custom services, has become part of the YARN user-facing API. This is a generic service started by the NodeManager to extend its functionality; administrators have to configure it on each node by setting YarnConfiguration#NM_AUX_SERVICES.
  • AMRMToken is used irrespective of secure or non-secure environment.
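
To tie the pieces above together, here is a rough sketch of a bare-bones ApplicationMaster written against the new client libraries. It uses the blocking AMRMClient and NMClient for brevity (the async variants are the recommended ones), and the class name, resource sizes, and launched command are placeholder values.

import java.util.Collections;
import java.util.HashMap;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // Client for the ResourceManager (ApplicationMasterProtocol under the hood).
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    // Ask for one 512 MB / 1 vcore container anywhere in the cluster.
    Resource capability = Resource.newInstance(512, 1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Client for the NodeManagers; NMTokens received during allocate() are shared
    // with NMClient through the NMTokenCache automatically.
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    boolean launched = false;
    while (!launched) {
      for (Container container : rmClient.allocate(0.1f).getAllocatedContainers()) {
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            new HashMap<String, LocalResource>(), new HashMap<String, String>(),
            Collections.singletonList("sleep 30"), null, null, null);
        // The ContainerToken is carried inside the Container record and is handled
        // transparently; ContainerLaunchContext stays purely user-facing.
        nmClient.startContainer(container, ctx);
        launched = true;
      }
      Thread.sleep(1000);
    }

    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    nmClient.stop();
    rmClient.stop();
  }
}

Note that neither tokens nor the user name appear anywhere in this sketch: NMTokens and ContainerTokens flow through the client libraries and the Container record.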
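
Likewise, here is a small sketch of reading the per-container information that the NodeManager now injects into every container's environment; the full set of variables is defined in ApplicationConstants.Environment.

import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class ContainerEnvSketch {
  public static void main(String[] args) {
    // The container's own id, the NodeManager it runs on, and its log directories.
    ContainerId containerId =
        ConverterUtils.toContainerId(System.getenv(Environment.CONTAINER_ID.name()));
    String nmHost = System.getenv(Environment.NM_HOST.name());
    String nmPort = System.getenv(Environment.NM_PORT.name());
    String logDirs = System.getenv(Environment.LOG_DIRS.name());

    System.out.println("Running " + containerId + " on " + nmHost + ":" + nmPort
        + ", logs in " + logDirs);
  }
}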

Acknowledgements

These API changes, made by the community, enable us to confidently support direct users of the YARN APIs, essentially framework developers, for a long time. We’d like to call out all the contributors who helped us make this gigantic leap towards stability. Zhijie Shen, Omkar Vinit Joshi, Xuan Gong, Sandy Ryza, Hitesh Shah, Siddharth Seth, Bikas Saha, Arun C Murthy, and Alejandro Abdelnur, among many others, contributed to this huge effort. Thanks everyone, and happy porting!
