Security architecture is complex, but these testing strategies help Cloudera customers rely on production-ready results.
Among other things, good security requires user authentication, and it requires that authenticated users and services be granted access to those things (and only those things) that they’re authorized to use. Across Apache Hadoop and Apache Solr (which ships in CDH and powers Cloudera Search), authentication is accomplished using Kerberos and SPNego over HTTP, and authorization is accomplished using Apache Sentry, the emerging standard for role-based, fine-grained access control, currently incubating at the ASF.
The interactions among Kerberos, Sentry, and different system configs, OSs, and environments are complicated, and for production-ready applications, they require a variety of tests to ensure that security works as expected. In this post, you’ll learn how Cloudera uses a range of testing methodologies to help ensure that security operations work as expected for Cloudera Search customers—many of which are in highly regulated industries.
Native Security in Solr
This diagram illustrates the execution cycle of a sample Solr request in a secure environment. Incoming HTTP requests must first complete Kerberos authentication. If authentication fails, the web server returns an HTTP “401 Unauthorized” error to the user, thereby restricting Solr access. If authentication succeeds, Solr forwards the request to Sentry for authorization. Sentry grants or denies access based on which user is requesting access, the request type, and the existing permissions defined for the user.
This feature, called index-level security, provides control over collections using QUERY, UPDATE, and ALL permissions. If authorization fails, Solr returns an HTTP “401 Unauthorized” error. Otherwise, the request is processed by Solr.
In addition to index-level security, Solr supports document-level security via Sentry, which enforces fine-grained control over which Solr documents can be viewed by which users.
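As a rough illustration of how document-level security is typically modeled (the field name and role tokens below are assumptions based on common default configuration, not taken from this post), each indexed document carries a field listing the roles allowed to see it:

```
{
  "id": "doc1",
  "text": "quarterly report",
  "sentry_auth": ["finance_role", "admin_role"]
}
```

At query time, a filter is appended so that only documents whose token field contains one of the querying user’s roles are returned.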
There are two primary sets of test cases for validating the Sentry-Solr integration: unit tests and integration tests. Integration tests are further divided into mini-cluster integration tests and real-hardware integration tests.
Unit Testing Sentry and Solr
Unit tests run in a virtualized cluster spun up inside the test JVM and validate the functionality of a specific method or class. These tests isolate failures to a specific module before an error can propagate across multiple layers (modules), where it becomes much harder to debug. Unit tests also provide good code coverage and help catch bugs earlier in the development cycle. Cloudera runs the entire suite of unit tests before deploying every new project build, which helps ensure that there are no regressions from recent check-ins and helps verify code stability.
Integration Testing for Sentry + Solr
Integration tests evaluate the end-to-end functionality of the product, mainly by mimicking end-user scenarios. These tests verify the overall behavior of the system, from the integration of multiple sub-modules to inter-component interaction (Sentry <-> Solr), exercising all of these pieces in one place.
Integration tests run in two places: in a mini-cluster spun up inside a single JVM, and on real distributed clusters. Both methods have advantages: issues found using tests in a single JVM can be easier to debug, whereas end-to-end user scenarios run on real distributed clusters may identify issues that would not manifest in a single JVM.
Integration Testing in a Single JVM
Sentry, being a critical piece of the security story, requires exhaustive test coverage across all possible user scenarios to avoid corner cases that bypass security. That requires running SolrCloud (multiple shards and replicas) and Sentry in a single environment. Solr has an existing test framework for testing SolrCloud features (see AbstractFullDistribZkTestBase), but using it requires pulling in the entire Solr test hierarchy, much of which may not be needed. The other downside of this approach is that Solr dependencies may conflict with those of other projects. To avoid these issues and to make testing more pluggable, Cloudera developed the MiniSolrCloudCluster (SOLR-5865) test framework, which separates out the SolrCloud functionality and thus makes it easier for new projects to use and test SolrCloud in a single JVM.
The tests developed on the MiniSolrCloudCluster framework cover extensive use cases by auto-generating users with all possible permissions and sending out requests as a particular user to make sure the Solr responses match the expected output. For example, for the three existing access specifiers (QUERY, UPDATE, ALL), the test framework generates eight possible users: each user holds one of the eight possible permission combinations, and each user issues both QUERY and UPDATE requests to Solr. The tests then verify that Sentry’s response (access granted or an unauthorized error) matches the expected one. This framework provides a good understanding and baseline of Sentry behavior in a realistic Solr cluster scenario.
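The combinatorial scheme above can be sketched in a few lines. This is a minimal, illustrative oracle for such a test matrix (the function names and return values are invented for this sketch; only the QUERY/UPDATE/ALL specifiers come from the post):

```python
from itertools import combinations

# Access specifiers named in the post; everything else here is illustrative.
PERMS = ("QUERY", "UPDATE", "ALL")

def all_permission_sets(perms=PERMS):
    """Enumerate the power set of the access specifiers: 2^3 = 8 users."""
    return [frozenset(combo)
            for r in range(len(perms) + 1)
            for combo in combinations(perms, r)]

def expected_response(granted, request):
    """Oracle: a request succeeds if the user holds that permission or ALL."""
    return "ok" if request in granted or "ALL" in granted else "unauthorized"

# Every generated user issues both QUERY and UPDATE requests; a real test
# would compare this oracle's verdict against Solr's actual response.
matrix = {(perms, req): expected_response(perms, req)
          for perms in all_permission_sets()
          for req in ("QUERY", "UPDATE")}
```

With three specifiers this yields eight users and sixteen request/user pairs, which is why the framework can exhaustively cover every authorization outcome.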
Integration Testing in a Distributed Cluster
One caveat to the above approach is that, at the time of this writing, neither MiniSolrCloudCluster nor the existing Solr test framework supports Kerberos authentication. To address this issue, Cloudera developed a non-Kerberos authentication layer that modifies the request to look as if it had successfully passed Kerberos authentication. Although this approach lets us test Sentry extensively with Solr, it also bypasses testing Kerberos authentication itself, a critical element of our security solution. In a production deployment, the end user must still use Kerberos to log into the system and have requests authorized by Sentry. This critical piece is covered by our integration testing on real distributed clusters.
The other main difference between real clusters and MiniSolrCloudCluster is that with the latter, all the required processes (Sentry, Solr, and Apache ZooKeeper) run in a single JVM. Although this approach is good for testing locally, we still want a clear picture of how the system behaves when each process has its own JVM and the processes are completely distributed across multiple nodes in the cluster.
The next and final line of defense is to run the integration tests on real clusters of varied sizes, configurations, and OSs. Running the suite of integration tests on real clusters has many clear advantages, like the ability to:
- Catch packaging issues (such as errors with RPMs and DEBs) that fall out when building the product
- Mimic the end-user deployment scenario (running on a wide range of OSs)
- Cover use cases not addressed by previous rounds of testing (such as running the tests in a fully secure Kerberos environment, which MiniSolrCloudCluster does not cover)
- Scale out the cluster to an arbitrary size and check the performance implications
- Run the system under different configurations, such as NameNode HA or vanilla (simple secure) systems
- Induce failure scenarios and monitor how the system recovers
- Run longer-running tests that ingest lots of data (as memory is a constraint when running MiniSolrCloudCluster in a single JVM)
Testing on real clusters starts with creating a large number of Linux users, groups, Kerberos principals, and keytabs for those users, a step that is completely missing from MiniSolrCloudCluster. We then define a Sentry policy file that maps groups to roles and roles to privileges.
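A minimal policy file of this shape might look like the following. The group, role, and collection names are invented for illustration; the syntax is a sketch of Sentry's file-based policy format, so consult the Sentry documentation for the exact form in your release:

```
[groups]
# OS/LDAP group -> Sentry role
analysts = analyst_role
ops = ops_role

[roles]
# Sentry role -> privilege on a collection
analyst_role = collection=logs->action=query
ops_role = collection=logs->action=*
```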
We run a subset of the most common end-to-end scenarios, as MiniSolrCloudCluster already runs the exhaustive suite of tests. These tests are run against the real cluster, which includes running kinit as a user and sending a QUERY/UPDATE request to Solr. In this cycle, the user first authenticates with Solr using Kerberos, and then Sentry scrutinizes the incoming request based on the authenticated user.
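In shell terms, one iteration of that cycle looks roughly like this (the principal, host, and collection names are placeholders, and the commands assume a cluster with Kerberos/SPNego already configured):

```shell
# Obtain a Kerberos ticket as the test user
kinit alice@EXAMPLE.COM

# Issue a QUERY request; curl performs SPNego using the ticket cache
curl --negotiate -u : \
  "http://solr-host.example.com:8983/solr/collection1/select?q=*:*"
```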
Because MiniSolrCloudCluster runs in a JVM, it has to be created at the beginning of every test cycle, which results in a loss of state. In real clusters, however, state is preserved because the clusters are longer-running. This approach gives us an environment for running the same set of tests many times and observing problems that only emerge over time (such as memory leaks, longer GC pauses, and the average turnaround time of a single request).
You should now understand the different levels of validation done by Cloudera for Search and Sentry integration. We welcome any other suggestions or contributions to the existing Sentry test suite. The source code can be found here.
Cloudera Search and Sentry are available for download as part of CDH and come with extensive documentation. If you have any questions, please contact us at the Cloudera Search Forum or the Search mailing list.
Vamsee Yarlagadda is a Software Engineer at Cloudera and an Apache Sentry (incubating) committer.