Cloudera Engineering Blog

Big Data best practices, how-to's, and internals from Cloudera Engineering and the community


New in CDH 5.2: Apache Sentry Delegated GRANT and REVOKE

This new feature, jointly developed by Cloudera and Intel engineers, makes management of role-based security much easier in Apache Hive, Impala, and Hue.

Apache Sentry (incubating) provides centralized authorization for services and applications in the Apache Hadoop ecosystem, allowing administrators to set up granular, role-based protection on resources, and to review them in one place. Previously, Sentry only designated administrators to GRANT and REVOKE privileges on an authorizable object. In Apache Sentry 1.5.0 (shipping inside CDH 5.2), we have implemented a new feature (SENTRY-327) that allows admin users to delegate the GRANT privilege to other users using WITH GRANT OPTION. If a user has the GRANT OPTION privilege on a specific resource, the user can now grant the GRANT privilege to other users on the same resource. Apache Hive, Impala, and Hue have all been updated to take advantage of this new Sentry functionality.

New in CDH 5.2: More SQL Functionality and Compatibility for Impala 2.0

Impala 2.0 is the most SQL-complete/SQL-compatible release yet.

As we reported in the most recent roadmap update (“What’s Next for Impala: Focus on Advanced SQL Functionality”), more complete SQL functionality (and better SQL compatibility with other vendor extensions) is a major theme in Impala 2.0.

Introducing Cloudera Labs: An Open Look into Cloudera Engineering R&D

Cloudera Labs contains ecosystem innovations that one day may bring developers more functionality or productivity in CDH.

Since its inception, one of the defining characteristics of Apache Hadoop has been its ability to evolve/reinvent and thrive at the same time. For example, two years ago, nobody could have predicted that the formative MapReduce engine, one of the cornerstones of “original” Hadoop, would be marginalized or even replaced. Yet today, that appears to be happening via Apache Spark, with Hadoop becoming the stronger for it. Similarly, we’ve seen other relatively new components, like Impala, Apache Parquet (incubating), and Apache Sentry (also incubating), become widely adopted in relatively short order.

Cloudera Enterprise 5.2 is Released

Cloudera Enterprise 5.2 contains new functionality for security, cloud deployments, and real-time architectures, and support for the latest open source component releases and partner technologies.

We’re pleased to announce the release of Cloudera Enterprise 5.2 (comprising CDH 5.2, Cloudera Manager 5.2, Cloudera Director 1.0, and Cloudera Navigator 2.1).

How SQOOP-1272 Can Help You Move Big Data from Mainframe to Apache Hadoop

Thanks to M. Asokan, Chief Architect at Syncsort, for the guest post below.

Apache Sqoop provides a framework to move data between HDFS and relational databases in a parallel fashion using Hadoop’s MR framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe datasets to Hadoop. Following are possible reasons for this:

Using Impala, Amazon EMR, and Tableau to Analyze and Visualize Data

Our thanks to AWS Solutions Architect Rahul Bhartia for allowing us to republish his post below.

Apache Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Apache Pig, and Apache Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. This post shows you how to use Amazon Elastic MapReduce (Amazon EMR) to analyze a data set available on Amazon Simple Storage Service (Amazon S3) and then use Tableau with Impala to visualize the data.

The Definitive "Getting Started" Tutorial for Apache Hadoop + Your Own Demo Cluster

Using this new tutorial alongside Cloudera Live is now the fastest, easiest, and most hands-on way to get started with Hadoop.

At Cloudera, developer enablement is one of our most important objectives. One only has to look at examples from history (Java or SQL, for example) to know that knowledge fuels the ecosystem. That objective is what drives initiatives such as our community forums, the Cloudera QuickStart VM, and this blog itself.

This Month in the Ecosystem (September 2014)

Welcome to our 13th edition of “This Month in the Ecosystem,” a digest of highlights from September 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

Here’s Your Getting Started with Impala Book

Getting Started with Impala (now in early release)—another book in the Hadoop ecosystem books canon—is indispensable for people who want to get familiar with Impala, the open source MPP query engine for Apache Hadoop. We spoke with its author, Impala docs writer John Russell, about the book’s origin and mission.

Why did you decide to write this book?

Secrets of Cloudera Support: Using OpenStack to Shorten Time-to-Resolution

Automating the creation of short-lived clusters for testing purposes frees our support engineers to spend more time on customer issues.

The first step for any support engineer is often to replicate the customer’s environment in order to identify the problem or issue. Given the complexity of Cloudera customer environments, reproducing a specific issue is often quite difficult, as a customer’s problem might only surface in an environment with specific versions of Cloudera Enterprise (CDH + Cloudera Manager), configuration settings, certain number of nodes, or the structure of the dataset itself. Even with Cloudera Manager’s awesome setup wizards, setting up Apache Hadoop can be quite time consuming, as the software was never designed with ephemeral clusters in mind.

Older Posts