Tag Archives: mrunit

New in CDH 5.3: Apache Sentry Integration with HDFS

Categories: Data Ingestion Platform Security & Cybersecurity Sentry Sqoop

Starting in CDH 5.3, Apache Sentry integration with HDFS saves admins a lot of work by centralizing access control permissions across components that utilize HDFS.

It’s been more than a year and a half since a couple of my colleagues here at Cloudera shipped the first version of Sentry (now Apache Sentry (incubating)). This project filled a huge security gap in the Apache Hadoop ecosystem by bringing truly secure and dependable fine grained authorization to the Hadoop ecosystem and provided out-of-the-box integration for Apache Hive.

Read more

How-to: Test HBase Applications Using Popular Tools

Categories: HBase Testing

While Apache HBase adoption for building end-user applications has skyrocketed, many of those applications (and many apps generally) have not been well-tested. In this post, you’ll learn some of the ways this testing can easily be done.

We will start with unit testing via JUnit, then move on to using Mockito and Apache MRUnit, and then to using an HBase mini-cluster for integration testing. (The HBase codebase itself is tested via a mini-cluster,

Read more

With Sentry, Cloudera Fills Hadoop’s Enterprise Security Gap

Categories: Hadoop Hive Impala Platform Security & Cybersecurity

Every day, more data, users, and applications are accessing ever-larger Apache Hadoop clusters. Although this is good news for data driven organizations overall, for security administrators and compliance officers, there are still lingering questions about how to enable end-users under existing Hadoop infrastructure without compromising security or compliance requirements.

While Hadoop has strong security at the filesystem level, it lacks the granular support needed to adequately secure access to data by users and BI applications.

Read more

Cloudera Development Kit (CDK): Hadoop Application Development Made Easier

Categories: CDH General Hadoop Kite SDK

Editor’s Note (Dec. 11, 2013): As of Dec. 2013, the Cloudera Development Kit is now known as the Kite SDK. Links below are updated accordingly.

At Cloudera, we have the privilege of helping thousands of developers learn Apache Hadoop, as well as build and deploy systems and applications on top of Hadoop. While we (and many of you) believe that platform is fast becoming a staple system in the data center,

Read more

Cloudera ML: New Open Source Libraries and Tools for Data Scientists

Categories: Community Data Science General Mahout MapReduce Tools

Editor’s note (12/19/2013): Cloudera ML has been merged into the Oryx project. The information below is still valid though.

Last month, Apache Crunch became the fifth project (along with Sqoop, Flume, Bigtop, and MRUnit) to go from Cloudera’s github repository through the Apache Incubator and on to graduate as a top-level project within the Apache Software Foundation. As the founder of the project and a newly minted Apache VP,

Read more