Guide to Special Users in the Hadoop Environment

There are a number of special “users” with roles to play in the Apache Hadoop environment. For your reference, we have summarized them below as of CDH 4.4. Kerberos principals (used for authentication in a secure cluster) are not covered here.

The specific user IDs listed are the ones created by default on installation, but they are configurable unless otherwise indicated.

| Project | User | Group | Notes |
|---------|------|-------|-------|
| HDFS | hdfs | hdfs | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. Superusers are defined by the dfs.permissions.superusergroup property in hdfs-site.xml, which names the UNIX group whose members HDFS treats as superusers. The default is supergroup if installing with Cloudera Manager (it can be changed in the Cloudera Manager UI) and hadoop otherwise. The hdfs, yarn, and mapred users belong to the hadoop group. To give users superuser privileges in HDFS, create a UNIX group with the same name as this property's value (or change the value to name an existing UNIX group) and add the users to that group. The impala user also belongs to the hdfs group. |
| httpfs | httpfs | httpfs | The httpfs service runs as this user. |
| MapReduce | mapred | mapred | Without Kerberos, the JobTracker and tasks run as this user. With Kerberos, the LinuxTaskController binary is owned by this user. Using a different user ID would be complicated. |
| YARN | yarn | mapred, yarn | Without Kerberos, all YARN services and applications run as this user. With Kerberos, the LinuxContainerExecutor binary is owned by this user. Using a different user ID would be complicated. |
| HBase | hbase | hbase | The Master and the RegionServer processes run as this user. |
| Hive | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A separate user must be defined for Hive's access to its Metastore DB (e.g. MySQL or Postgres), but it can be any identifier and does not correspond to a UNIX uid. It is set by javax.jdo.option.ConnectionUserName in hive-site.xml. |
| HCatalog | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable. |
| Sentry | | | No special users. |
| Pig | | | No special users. |
| Oozie | oozie | oozie | The Oozie service runs as this user. |
| Flume | flume | flume | The HDFS sink writes to HDFS as this user, so the user must have write privileges there. |
| Sqoop1 | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| Sqoop2 | sqoop2 | sqoop | The Sqoop2 service runs as this user. |
| Hue | hue | hue | Hue runs as this user. It is not configurable. |
| ZooKeeper | zookeeper | zookeeper | The zookeeper process runs as this user. It is not configurable. |
| Search | solr | solr | The solr process runs as this user. It is not configurable. |
| Impala | impala | impala | The impala user also belongs to the hive and hdfs groups. |
| Whirr | | | No special users. |
| Mahout | | | No special users. |
| Cloudera Manager | cloudera-scm | cloudera-scm | Cloudera Manager processes such as the CM Server and the monitoring daemons run as this user. It is not configurable. |
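
Two of the settings mentioned above live in Hadoop-style XML configuration files: dfs.permissions.superusergroup in hdfs-site.xml and javax.jdo.option.ConnectionUserName in hive-site.xml. As a rough illustration of how to check what a given cluster is actually using, the Python sketch below reads a named property from such a file. The helper name get_property and the sample file contents (including the value hive_db_user) are hypothetical, made up for this example; a real configuration file will contain many more properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample contents, for illustration only.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
</configuration>
"""

SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_db_user</value>
  </property>
</configuration>
"""


def get_property(xml_text, name, default=None):
    """Return the value of the named property, or `default` if it is absent.

    Expects the usual Hadoop layout: <configuration> containing
    <property> elements, each with a <name> and a <value> child.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


if __name__ == "__main__":
    # Group whose members HDFS treats as superusers.
    print(get_property(SAMPLE_HDFS_SITE, "dfs.permissions.superusergroup"))
    # Database login used by Hive for its Metastore DB (not a UNIX uid).
    print(get_property(SAMPLE_HIVE_SITE, "javax.jdo.option.ConnectionUserName"))
```

To use this against a live installation, read the file contents from wherever your distribution keeps hdfs-site.xml or hive-site.xml and pass the text to get_property. If the property is not set in the file, the effective value is whatever default your installation method applies, as described in the HDFS row.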
 
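
The group memberships in the table can also be checked on a host. The following Python sketch compares a few of the memberships listed above against the local UNIX account databases, using the standard pwd and grp modules. The function names and the EXPECTED_GROUPS mapping are assumptions made for this example, and the mapping covers only some of the users in the table; adjust it if your installation uses different IDs.

```python
import grp
import pwd

# A subset of the memberships described in the table (CDH 4.4 defaults).
EXPECTED_GROUPS = {
    "hdfs": {"hdfs", "hadoop"},
    "mapred": {"mapred", "hadoop"},
    "yarn": {"yarn", "mapred", "hadoop"},
    "impala": {"impala", "hive", "hdfs"},
}


def groups_of(user):
    """Return the set of UNIX group names for `user`, or None if no such user."""
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:
        return None
    # Supplementary groups list the user explicitly as a member.
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    # The primary group is recorded in the passwd entry instead.
    try:
        names.add(grp.getgrgid(primary_gid).gr_name)
    except KeyError:
        pass
    return names


def missing_groups(expected, lookup=groups_of):
    """Map each user to the expected groups it lacks (None if the user is absent).

    Users that have all of their expected groups are left out of the result.
    """
    report = {}
    for user, wanted in expected.items():
        actual = lookup(user)
        if actual is None:
            report[user] = None
        elif wanted - actual:
            report[user] = wanted - actual
    return report


if __name__ == "__main__":
    for user, missing in sorted(missing_groups(EXPECTED_GROUPS).items()):
        if missing is None:
            print(f"{user}: account not found on this host")
        else:
            print(f"{user}: not in {', '.join(sorted(missing))}")
```

The lookup parameter exists only so the comparison can be exercised without real accounts. On a host where a given project is not installed, its user will simply be reported as not found.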
Rob Weltman is Director of Engineering at Cloudera.
