Category Archives: Tools

Quality Assurance at Cloudera: Distributed Unit Testing

Categories: Kudu Testing Tools

Cloudera Engineering has developed (and recently open sourced) a distributed unit testing framework that cuts testing time from multiple hours to just 10 minutes.

Upstream unit tests are Cloudera’s first line of defense for finding and fixing software bugs, as part of a multidimensional process that also includes static/dynamic code analysis, fault injection, integration/scale/endurance testing, and validation on real workloads. However, running a full unit test suite for Apache Hadoop ecosystem components can take hours,

Read more

How-to: Create and Use a Custom Formatter in the Apache HBase Shell

Categories: Avro HBase How-to Tools

Learn how improve Apache HBase usability by creating a custom formatter for viewing binary data types in the HBase shell.

Cloudera customers are looking to store complex data types in Apache HBase to provide fast retrieval of complex information such as banking transactions, web analytics records, and related metadata associated with those records. Serialization formats such as Apache Avro, Thrift, and Protocol Buffers greatly assist in meeting this goal,

Read more

DistCp Performance Improvements in Apache Hadoop

Categories: CDH Hadoop HDFS Performance Tools

Recent improvements to Apache Hadoop’s native backup utility, which are now shipping in CDH, make that process much faster.

DistCp is a popular tool in Apache Hadoop for periodically backing up data across and within clusters. (Each run of DistCp in the backup process is referred to as a backup cycle.) Its popularity has grown in popularity despite relatively slow performance.

In this post, we’ll provide a quick introduction to DistCp.

Read more

How-to: Install and Use Cask Data Application Platform Alongside Impala

Categories: How-to Impala Tools

Cloudera customers can now install, launch, and monitor CDAP directly from Cloudera Manager. This post from Nitin Motgi, Cask CTO, explains how.

Today, Cloudera and Cask are very happy to introduce the integration of Cloudera’s enterprise data hub (EDH) with the Cask Data Application Platform (CDAP). CDAP is an integrated platform for developers and organizations to build, deploy, and manage data applications on Apache Hadoop. This initial integration will enable CDAP to be installed,

Read more

How-to: Create an IntelliJ IDEA Project for Apache Hadoop

Categories: Hadoop How-to Tools

Prefer IntelliJ IDEA over Eclipse? We’ve got you covered: learn how to get ready to contribute to Apache Hadoop via an IntelliJ project.

It’s generally useful to have an IDE at your disposal when you’re developing and debugging code. When I first started working on HDFS, I used Eclipse, but I’ve recently switched to JetBrains’ IntelliJ IDEA (specifically, version 13.1 Community Edition).

My main motivation was the ease of project setup in the face of Maven and Google Protocol Buffers (used in HDFS).

Read more