There are a number of special “users” with roles to play in the Apache Hadoop environment. For your reference, we have summarized them below as of CDH 4.4. Kerberos principals (used for authentication in a secure cluster) are not covered here.
The specific user IDs listed are the ones created by default on installation, but they are configurable unless otherwise indicated.
|Component||Unix User ID||Groups||Notes|
|HDFS||hdfs||hdfs||The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. Superusers are defined by the dfs.permissions.superusergroup property in hdfs-site.xml, which names the UNIX group whose members HDFS treats as superusers. The default is supergroup if installing with Cloudera Manager (it can be changed in the Cloudera Manager UI), and hadoop otherwise. The hdfs, yarn, and mapred users belong to the hadoop group. To give users superuser privileges in HDFS, create a UNIX group with the same name as this property's value (or change the value to match an existing UNIX group) and add the users to that group. The impala user also belongs to the hdfs group.|
|httpfs||httpfs||httpfs||The httpfs service runs as this user.|
|MapReduce||mapred||mapred||Without Kerberos, the JobTracker and tasks run as this user. With Kerberos, the LinuxTaskController binary is owned by this user. Using a different user ID would be complicated.|
|YARN||yarn||mapred, yarn||Without Kerberos, all YARN services and applications run as this user. With Kerberos, the LinuxContainerExecutor binary is owned by this user. Using a different user ID would be complicated.|
|HBase||hbase||hbase||The Master and the RegionServer processes run as this user.|
|Hive||hive||hive||The HiveServer2 process and the Hive Metastore processes run as this user. A separate user must be defined for Hive to access its Metastore database (e.g., MySQL or PostgreSQL), but it can be any identifier and does not correspond to a UNIX user ID. It is set with javax.jdo.option.ConnectionUserName in hive-site.xml.|
|HCatalog||hive||hive||The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable.|
|Sentry||No special users|
|Pig||No special users|
|Oozie||oozie||oozie||The Oozie service runs as this user.|
|Flume||flume||flume||The sink that writes to HDFS does so as this user, so this user must have write privileges.|
|Sqoop1||sqoop||sqoop||This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.|
|Sqoop2||sqoop2||sqoop||The Sqoop2 service runs as this user.|
|Hue||hue||hue||Hue runs as this user. It is not configurable.|
|ZooKeeper||zookeeper||zookeeper||The ZooKeeper process runs as this user. It is not configurable.|
|Search||solr||solr||The Solr process runs as this user. It is not configurable.|
|Impala||impala||impala||The impala user also belongs to the hive and hdfs groups.|
|Whirr||No special users|
|Mahout||No special users|
|Cloudera Manager||cloudera-scm||cloudera-scm||Cloudera Manager processes such as the CM Server and the monitoring daemons run as this user. It is not configurable.|
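
Two of the rows above refer to properties in Hadoop-style XML configuration files: dfs.permissions.superusergroup in hdfs-site.xml and javax.jdo.option.ConnectionUserName in hive-site.xml. As a rough illustration of how those values could be read, here is a minimal Python sketch. The helper name `get_property`, the inline sample files, and the values `hdfsadmins` and `hiveuser` are hypothetical and chosen only for the example; on a real cluster you would read the actual files from wherever your installation keeps them.

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name, default=None):
    """Return the value of a Hadoop-style <property> by name, or default.

    Hypothetical helper for illustration; it is not part of Hadoop or CDH.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Sample hdfs-site.xml content; "hdfsadmins" is a made-up group name.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hdfsadmins</value>
  </property>
</configuration>"""

# Sample hive-site.xml content; "hiveuser" is a made-up database login.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>"""

# Fall back to "hadoop", the default named above for installs that do not
# use Cloudera Manager (with Cloudera Manager it would be "supergroup").
superuser_group = get_property(
    HDFS_SITE, "dfs.permissions.superusergroup", default="hadoop"
)

# The Metastore database login; it need not match any UNIX account.
metastore_db_user = get_property(HIVE_SITE, "javax.jdo.option.ConnectionUserName")

print("HDFS superuser group:", superuser_group)
print("Hive Metastore DB user:", metastore_db_user)
```

Once the superuser group name is known, granting a user HDFS superuser privileges comes down to making sure a UNIX group of that name exists and adding the user to it, as described in the HDFS row.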