Category Archives: Testing

How Testing Supports Production-Ready Security in Cloudera Search

Categories: Search Security Sentry Testing

Security architecture is complex, but these testing strategies help Cloudera customers rely on production-ready results.

Among other things, good security requires user authentication and that authenticated users and services be granted access to those things (and only those things) that they’re authorized to use. Across Apache Hadoop and Apache Solr (which ships in CDH and powers Cloudera Search), authentication is accomplished using Kerberos and SPNego over HTTP and authorization is accomplished using Apache Sentry (the emerging standard for role-based fine grain access control,

Read More

How-to: Test HBase Applications Using Popular Tools

Categories: HBase Testing

While Apache HBase adoption for building end-user applications has skyrocketed, many of those applications (and many apps generally) have not been well-tested. In this post, you’ll learn some of the ways this testing can easily be done.

We will start with unit testing via JUnit, then move on to using Mockito and Apache MRUnit, and then to using an HBase mini-cluster for integration testing. (The HBase codebase itself is tested via a mini-cluster,

Read More

How Cloudera Ensures HBase Client API Compatibility in CDH

Categories: CDH HBase Testing

Apache HBase supports three primary client APIs that developers can use to bind applications with HBase: the Java API, the REST API, and the Thrift API. Therefore, as developers build apps against HBase, it’s very important for them to be aware of the compatibility guidelines with respect to CDH.

This blog post will describe the efforts that go into protecting the experience of a developer using the Java API. Through its testing work,

Read More

What Do Real-Life Apache Hadoop Workloads Look Like?

Categories: CDH Hadoop HBase HDFS Hive MapReduce Oozie Ops and DevOps Pig Testing Use Case

Organizations in diverse industries have adopted Apache Hadoop-based systems for large-scale data processing. As a leading force in Hadoop development with customers in half of the Fortune 50 companies, Cloudera is in a unique position to characterize and compare real-life Hadoop workloads. Such insights are essential as developers, data scientists, and decision makers reflect on current use cases to anticipate technology trends.

Recently we collaborated with researchers at UC Berkeley to collect and analyze a set of Hadoop traces.

Read More

Watching the Clock: Cloudera’s Response to Leap Second Troubles

Categories: CDH Cloudera Manager Community General Hadoop Support Testing

At 5 pm PDT on June 30, a leap second was added to the Universal Coordinated Time (UTC). Within an hour, Cloudera Support started receiving reports of systems running at 100% CPU utilization. The Support Team worked quickly to understand and diagnose the problem and soon published a solution. Bugs due to the leap second coupled with the Amazon Web Services outage would make this Cloudera’s busiest support weekend to date.

Since Hadoop is written in Java and closely interoperates with the underlying OS,

Read More