How-to: Implement Role-based Security in Impala using Apache Sentry

This quick demo illustrates how easy it is to implement role-based access control in Impala using Sentry.

Apache Sentry (incubating) is the Apache Hadoop ecosystem tool for role-based access control (RBAC). In this how-to, I will demonstrate how to implement Sentry for RBAC in Impala. I feel this introduction is best motivated by a use case.

Data warehouse optimization is one of the most common Hadoop use cases. After migrating data transformation workloads to Hadoop, customers typically want to provide self-service business intelligence access on Hadoop. Self-service BI results in many distinct users logging in and executing queries, each under their own user ID. Once end users start using the cluster, fine-grained authorization becomes a requirement to satisfy internal controls and governmental regulations. Sentry was originally created for this use case.

I won’t go into detail here about why fine-grained authorization is useful; my colleague Shreepadma Venugopalan covered this topic in her post “With Sentry, Cloudera Fills Hadoop’s Enterprise Security Gap.” Furthermore, Sravya Tirukkovalur wrote a post about using Sentry with Apache Hive (“How-to: Get Started with Sentry in Hive”).

Sentry and Impala work together in much the same way as Sentry and Hive. In fact, since the policy file syntax is identical, users who use both Hive and Impala are encouraged to share the same policy file.

The two systems have different architectures, which leads to some divergence in how they interact with Sentry. For example, Hive is typically configured with one or a small number of HiveServer2 instances. Impala works differently: each Impala daemon accepts queries, one of the many design features that help Impala scale to a large number of concurrent queries.

In the Hive case, a small number of HiveServer2 instances will read the policy file from HDFS, whereas in the Impala case, each daemon will. (Since many Impala daemons will be reading the file from HDFS and the file is small, setting the replication count equal to the number of slave nodes is reasonable.) One additional difference is that while Hive reads and parses the policy file for each query, Impala checks to see if the policy file has been updated every five minutes.
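
The difference between re-reading the policy on every query and polling for changes on a fixed interval can be sketched in a few lines of Python. This is only an illustration of the polling idea, not Impala's actual implementation: the class name PolledPolicy and its load, modified, and clock parameters are hypothetical, and the 300-second default simply mirrors the five-minute interval mentioned above.

```python
import time


class PolledPolicy:
    """Hypothetical sketch: cache a policy and re-check it on a fixed interval.

    load()     -> returns the parsed policy (for example, read from HDFS)
    modified() -> returns a marker that changes when the file changes
    """

    def __init__(self, load, modified, interval=300.0, clock=time.monotonic):
        self._load = load
        self._modified = modified
        self._interval = interval
        self._clock = clock
        # Read the policy once up front and remember when we last checked.
        self._stamp = modified()
        self._policy = load()
        self._checked_at = clock()

    def get(self):
        """Return the cached policy, reloading only if the interval has
        elapsed and the modification marker has changed."""
        now = self._clock()
        if now - self._checked_at >= self._interval:
            self._checked_at = now
            stamp = self._modified()
            if stamp != self._stamp:
                self._policy = self._load()
                self._stamp = stamp
        return self._policy
```

Under this model, a policy change becomes visible at most one interval after it is written, whereas the per-query approach picks it up immediately at the cost of parsing the file on every query.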

If you’d like to learn more about configuring Sentry, watch the video below or go straight to our documentation on Configuring Sentry and Impala Security.

In the video below, we use the policy file shown here, which in addition to an admin role defines the hierarchical roles manager_role, analyst_role, and junior_analyst_role. In the file as shown, manager_role has ALL on the default database, analyst_role has SELECT on analyst1_table, and junior_analyst_role has SELECT on jranalyst1_table. The analyst group is assigned both analyst_role and junior_analyst_role, so analysts also receive everything granted to junior analysts.

[groups]
management = manager_role
analyst = analyst_role, junior_analyst_role
jranalyst = junior_analyst_role
admin = admin_role 

[roles]
manager_role = server=server1->db=default
analyst_role = server=server1->db=default->table=analyst1_table->action=select
junior_analyst_role = server=server1->db=default->table=jranalyst1_table->action=select

# Implies everything on server1.
admin_role = server=server1
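
To make the group-to-role mapping concrete, here is a minimal Python sketch that parses the policy file above and lists the privileges each group ends up with. It is a simplified reading of the file format for illustration only, not Sentry's own parser; the function names parse_policy and privileges_for_group are hypothetical.

```python
POLICY = """\
[groups]
management = manager_role
analyst = analyst_role, junior_analyst_role
jranalyst = junior_analyst_role
admin = admin_role

[roles]
manager_role = server=server1->db=default
analyst_role = server=server1->db=default->table=analyst1_table->action=select
junior_analyst_role = server=server1->db=default->table=jranalyst1_table->action=select

# Implies everything on server1.
admin_role = server=server1
"""


def parse_policy(text):
    """Return {section: {key: [comma-separated values]}} for the policy text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError("entry outside of a section: " + line)
        # Split on the first '=' only; privileges contain '=' themselves.
        key, _, value = line.partition("=")
        current[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


def privileges_for_group(policy, group):
    """Map each role assigned to the group to that role's privilege strings."""
    roles = policy.get("groups", {}).get(group, [])
    return {role: policy.get("roles", {}).get(role, []) for role in roles}


if __name__ == "__main__":
    policy = parse_policy(POLICY)
    for group in policy["groups"]:
        print(group, privileges_for_group(policy, group))
```

Running the sketch shows, for example, that the analyst group resolves to two roles, which is how the hierarchy between analysts and junior analysts is expressed in the file.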

 

In the demo below, I will first enable Sentry with Impala and then create and share a view of manager1_table for junior analysts that restricts their access to rows as well as columns.

Conclusion

You should now understand the relatively straightforward procedure of implementing RBAC in Impala using Sentry!

Brock Noland is a Software Engineer at Cloudera and an Apache committer on the Crunch and Hive projects.

2 Responses
  • mark / March 20, 2014 / 10:36 AM

    Can’t hear the sound. Screen resolution is quite poor and I can’t follow it. Protecting columns is critical but so is protecting individual rows. Can Sentry help in this regard?

  • mark / March 20, 2014 / 11:31 AM

    What is the likelihood of adding cell-level security similar to Accumulo? My biggest customers need row-level and column-level security at a minimum, but cell-level would please the IT managers (albeit with a nightmare to administer the ACL).
