The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.
As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:
- It uses the same pool of data and system resources as other workloads, so you avoid the time and expense of transferring data to an external search service.
- It provides a familiar and trusted security framework for organizations with strict security requirements.
- It is well integrated with our existing management platform (Cloudera Manager) in order to ease adoption and simplify operations.
In this post, we’ll focus on the security features of Cloudera Search. In particular, you’ll learn how Cloudera Search solves authentication, or verifying a user’s identity; and authorization, or controlling access to resources. We’ll also discuss secure impersonation and how it is used with the Hue Search App.
Cloudera Search, via Solr and Apache Lucene, provides an HTTP interface for querying, updating, and managing full-text search indices. Like the other HTTP-level services in an enterprise data hub (such as HttpFS and Apache Oozie), Cloudera Search uses the following frameworks for authentication over HTTP:
- Kerberos: a mutual authentication protocol that works on the basis of “tickets”
- SPNego: a negotiation mechanism for selecting an underlying authentication protocol
Cloudera Search uses SPNego HTTP authentication to select Kerberos as the underlying authentication protocol. Using Kerberos and SPNego in this manner is advantageous for users because many tools for accessing HTTP resources have built-in support for the protocol. For example, you can use curl with the --negotiate option, and many popular browsers, including Firefox and Chrome, can be configured to access Kerberos/SPNego protected resources.
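As a rough illustration of the curl usage mentioned above, the sketch below assembles the command line rather than running it. The host, port, and collection name are assumptions for illustration, and the command only succeeds if a Kerberos ticket has already been obtained (for example with kinit).

```python
import shlex


def build_spnego_curl_command(url):
    """Build a curl argument list that authenticates via SPNego/Kerberos."""
    # --negotiate turns on SPNego; "-u :" supplies an empty user name so that
    # curl takes the identity from the Kerberos ticket cache.
    return ["curl", "--negotiate", "-u", ":", url]


if __name__ == "__main__":
    # Hypothetical Solr endpoint, not taken from a real deployment.
    query_url = "http://localhost:8983/solr/collection1/select?q=*:*"
    print(shlex.join(build_spnego_curl_command(query_url)))
```

Returning a list instead of a single string keeps the URL from being reinterpreted by a shell if the command is later passed to subprocess.run.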
Furthermore, although Kerberos is an authentication protocol, not an authorization protocol, you can use it to provide cluster-level access control by granting Kerberos credentials only to those users who should have access to the cluster. If you need finer-grained control than the cluster level, see the section on authorization below.
For information on configuring Cloudera Search to use authentication, see the documentation.
Solr itself does not provide access control support, but rather provides “hooks” to allow other systems to build access control on top of it. We have used these hooks to develop index-level access control using Apache Sentry (incubating). Sentry supports role-based granting of privileges in Solr; each role can be granted query, update, and/or admin privileges on any Solr index (called a “collection” in Solr terminology).
Let’s look at a specification of these privileges, called a policy file (typically stored in HDFS):
[groups]
# Assigns each Hadoop group to its set of roles
dev_ops = engineer_role, ops_role
[roles]
# Assigns each role to its set of privileges
engineer_role = collection = source_code->action=Query, \
collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query
The policy file comprises two main sections:
- [groups]: maps a Hadoop group to its set of Sentry roles
- [roles]: maps a Sentry role to its set of privileges
One privilege in Solr is the ability to query, update, or perform administrative actions on a given collection. So, for example, the privilege specification collection = hbase_logs->action=Query grants the role the ability to query the hbase_logs collection in Solr.
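To make the group-to-role-to-privilege mapping concrete, here is a minimal sketch that resolves the privileges a Hadoop group ends up with. It is a toy parser written for this example, not Sentry's own policy engine, and the function names are hypothetical.

```python
from collections import defaultdict

POLICY = """\
[groups]
dev_ops = engineer_role, ops_role
[roles]
engineer_role = collection = source_code->action=Query, \\
  collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query
"""


def parse_policy(text):
    """Return {section: {name: [values]}} for a Sentry-style policy file."""
    joined = text.replace("\\\n", " ")  # fold backslash line continuations
    sections = defaultdict(dict)
    current = None
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        name, _, value = line.partition("=")
        sections[current][name.strip()] = [
            item.strip() for item in value.split(",") if item.strip()
        ]
    return sections


def parse_privilege(spec):
    """Turn 'collection = X->action=Y' into ('X', 'y')."""
    left, _, right = spec.partition("->")
    collection = left.split("=", 1)[1].strip()
    action = right.split("=", 1)[1].strip().lower()
    return collection, action


def privileges_for_group(policy, group):
    """Collect every (collection, action) pair granted to a group's roles."""
    granted = set()
    for role in policy["groups"].get(group, []):
        for spec in policy["roles"].get(role, []):
            granted.add(parse_privilege(spec))
    return granted


if __name__ == "__main__":
    print(sorted(privileges_for_group(parse_policy(POLICY), "dev_ops")))
```

With the sample policy, the dev_ops group can query and update source_code and can query hbase_logs, which matches the two roles it is assigned.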
Now that we’ve seen how to specify policies in Sentry, let’s look at how you would integrate Sentry and Solr. To understand this, let’s first look at how Solr processes an incoming request:
Processing of incoming Solr HTTP request
First, the HTTP request comes into Solr and is sent to the SolrDispatchFilter. The SolrDispatchFilter is responsible for sending the request to the correct RequestHandler for the collection. If the request is to query data from the collection, it will be sent to the Select RequestHandler; if the request is to update the collection, it will be sent to the Update RequestHandler. The request handlers themselves are specified in the collection-specific configuration file called solrconfig.xml. For example, specifying the Select RequestHandler may look like this:
<requestHandler name="/select" class="solr.SearchHandler">
</requestHandler>
Let’s assume this is the configuration for a Solr collection called “collection1”. This request handler specification tells Solr that a request to the path http://localhost:8983/solr/collection1/select should be dispatched to an instance of solr.SearchHandler.
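The routing described above can be modelled in a few lines. This is a simplified sketch of the idea, not Solr's actual dispatch code: the HANDLERS table stands in for the entries in solrconfig.xml, and the /update entry is an assumed example.

```python
from urllib.parse import urlparse

# Stand-in for the request handlers declared in a collection's solrconfig.xml.
HANDLERS = {
    "/select": "solr.SearchHandler",
    "/update": "solr.UpdateRequestHandler",
}


def dispatch(url):
    """Map a request URL to (collection, handler class name)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "solr":
        raise ValueError(f"not a collection request: {url}")
    collection = parts[1]
    handler_path = "/" + "/".join(parts[2:])
    if handler_path not in HANDLERS:
        raise LookupError(f"no request handler registered for {handler_path}")
    return collection, HANDLERS[handler_path]


if __name__ == "__main__":
    print(dispatch("http://localhost:8983/solr/collection1/select?q=*:*"))
```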
In addition to the standard solrconfig.xml, Cloudera Search ships with a modified version (solrconfig.xml.secure) that has request handlers integrated with Sentry. For example, with the select handler above, Sentry uses a Solr SearchComponent to check permissions before the query request is processed:
Solr RequestHandler with Sentry Component
The secure versions of the other standard collection request handlers are implemented in a similar fashion.
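Conceptually, the secure select handler runs a permission check as the first step of its component chain. The sketch below illustrates that ordering only; the class names and the shape of the request context are hypothetical and do not reflect the real Solr or Sentry classes.

```python
class SentryCheckComponent:
    """Hypothetical stand-in for a permission-checking search component."""

    def __init__(self, required_action):
        self.required_action = required_action

    def process(self, ctx):
        needed = (ctx["collection"], self.required_action)
        if needed not in ctx["privileges"]:
            raise PermissionError(
                f"{self.required_action} denied on {ctx['collection']}"
            )


class QueryComponent:
    """Stand-in for the component that actually runs the search."""

    def process(self, ctx):
        ctx["response"] = f"results for {ctx['query']} from {ctx['collection']}"


def handle(components, ctx):
    # Components run in order, so the check fires before any query work.
    for component in components:
        component.process(ctx)
    return ctx["response"]


# The "secure" handler is the ordinary chain with the check placed first.
secure_select = [SentryCheckComponent("query"), QueryComponent()]
```

Because the check raises before the query component runs, an unauthorized request never reaches the index.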
The section above covered requests on specific Solr collections, but what about cluster-level administrative actions? In Solr, administrative requests are sent to the /admin path. For example, a request to create a collection looks like:
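As a hedged illustration, assuming Solr's Collections API with its action=CREATE command, such a URL could be assembled as follows. The host, port, collection name, and shard count are assumptions made for this example.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards):
    """Build a Collections API URL that asks Solr to create a collection."""
    params = urlencode({"action": "CREATE", "name": name, "numShards": num_shards})
    return f"{base_url}/admin/collections?{params}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(create_collection_url("http://localhost:8983/solr", "collection1", 1))
```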
If you compare this URL to the collection-specific URL above, you’ll see that “admin” just looks like any other collection but with a different set of request handlers. Sentry mirrors this structure for privilege-granting purposes: instead of granting “admin” access to a role, query or update access is granted to the “admin” collection. Query access grants privileges for read-only administrative commands (for example, dump the state of all the threads running in a Solr server), while update grants privileges for write-only administrative commands (such as changing the level of logging output for a Solr server).
For example, to grant a Sentry role read-only administrative command privileges and the ability to update a collection called “collection1”, add this to the Sentry policy file:
sample_role = collection = admin->action=Query, \
collection = collection1->action=Update
Solr ships with a wide variety of collection-specific and administrative-level request handlers. For a complete list of the Sentry privileges required for the built-in Solr request handlers, see the documentation.
Secure Impersonation and Hue
Like Hadoop and Oozie, Cloudera Search has support for secure impersonation: the ability of a “super-user” to submit requests on behalf of another user, conceptually similar to sudo functionality on Unix. For security reasons, this functionality is limited to only the groups and hosts that are explicitly configured. (See the documentation for more information.)
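The restriction to configured groups and hosts can be pictured as a simple check. This is a conceptual sketch only: the configuration layout, the "*" wildcard, and the function name are assumptions for illustration, and the real settings are described in the documentation.

```python
def may_impersonate(proxy_config, superuser, request_host, target_user_groups):
    """Decide whether `superuser` may act on behalf of a user in these groups."""
    rules = proxy_config.get(superuser)
    if rules is None:
        return False  # not configured as a proxy user at all
    host_ok = "*" in rules["hosts"] or request_host in rules["hosts"]
    group_ok = "*" in rules["groups"] or bool(
        set(rules["groups"]) & set(target_user_groups)
    )
    return host_ok and group_ok


# Hypothetical configuration: "hue" may impersonate members of "analysts",
# but only for requests arriving from one host.
PROXY_CONFIG = {
    "hue": {"hosts": ["hue01.example.com"], "groups": ["analysts"]},
}
```

Requiring both the host and the group to match means that a compromised super-user account on an unlisted machine still cannot impersonate anyone.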
The excellent Hue Search App makes use of this functionality in order to integrate with its own security mechanisms. Without this impersonation support, Hue would need access to Kerberos credentials for every user of the Hue App who wants to access Solr, which is an unacceptable requirement for many organizations. Instead, Hue can integrate with LDAP (and other authentication systems) and use secure impersonation to make requests on behalf of the LDAP-authenticated user, integrating seamlessly with Solr and Sentry.
We believe the integration of Solr and Sentry in Cloudera Search is an exciting development that opens up new workloads in CDH for organizations with strict security requirements, all in an easily consumed application provided by Hue.
Gregory Chanan is a Software Engineer at Cloudera and an Apache HBase Committer.