Cloudera Engineering Blog · Security Posts
Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.
In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:
The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.
As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:
This quick demo illustrates how easy it is to implement role-based access and control in Impala using Sentry.
Apache Sentry (incubating) is the Apache Hadoop ecosystem tool for role-based access control (RBAC). In this how-to, I will demonstrate how to implement Sentry for RBAC in Impala. I feel this introduction is best motivated by a use case.
Integrating Hue with LDAP can help make your secure Hadoop apps as widely consumed as possible.
Hue, the open source Web UI that makes Apache Hadoop easier to use, easily integrates with your corporation’s existing identity management systems and provides authentication mechanisms for SSO providers. So, by changing a few configuration parameters, your employees can start analyzing Big Data in their own browsers under an existing security policy.
A quick on-ramp (and demo) for using the new Sentry module for RBAC in conjunction with Hive
One attribute of the Enterprise Data Hub is fine-grained access to data by users and apps. This post about supporting infrastructure for that goal was originally published at blogs.apache.org. We republish it here for your convenience.
There’s good news for users of Hue, the open source web UI that makes Apache Hadoop easier to use: A new SAML 2.0-compliant backend, which is scheduled to ship in the next release of the Cloudera platform, will provide a better authentication experience for users as well as IT.
With this new feature, single sign-on (SSO) authentication can be achieved instead of using Hue credentials – thus, user credentials can be managed centrally (a big benefit for IT), and users needn’t log in to Hue if they have already logged in to another Web application sharing the SSO (a big benefit for users).
Every day, more data, users, and applications are accessing ever-larger Apache Hadoop clusters. Although this is good news for data driven organizations overall, for security administrators and compliance officers, there are still lingering questions about how to enable end-users under existing Hadoop infrastructure without compromising security or compliance requirements.
While Hadoop has strong security at the filesystem level, it lacks the granular support needed to adequately secure access to data by users and BI applications. Today, this problem forces organizations in industries for which security is paramount (such as financial services, healthcare, and government) to make a choice: either leave data unprotected or lock out users entirely. Most of the time, the preferred choice is the latter, severely inhibiting access to data in Hadoop.
Apache Hive was one of the first projects to bring higher-level languages to Apache Hadoop. Specifically, Hive enables the legions of trained SQL users to use industry-standard SQL to process their Hadoop data.
However, as you probably have gathered from all the recent community activity in the SQL-over-Hadoop area, Hive has a few limitations for users in the enterprise space. Until recently, two in particular – concurrency and security – were largely unaddressed.
Thanks to Steven Noels, SVP of Products for NGDATA, for the guest post below.
NGDATA builds and sells Lily, the next-generation Customer Intelligence Platform that helps enterprise marketing teams collect and store customer interaction data in order to profile, segment, and present better offers. We designed Lily from the ground up to run on Apache HBase and Apache Solr. Combining these technologies with our deep marketing segmentation expertise and unique machine learning techniques we’re able to deliver interactive data management, real-time statistical calculations, faceted search views of customers, offers, interactions and the permutations they each inspire.
The following guest post comes from Alejandro Caceres, president and CTO of Hyperion Gray LLC – a small research and development shop focusing on open-source software for cyber security.
Imagine this: You’re an informed citizen, active in local politics, and you decide you want to support your favorite local political candidate. You go to his or her new website and make a donation, providing your bank account information, name, address, and telephone number. Later, you find out that the website was hacked and your bank account and personal information stolen. You’re angry that your information wasn’t better protected — but at whom should your anger be directed?