Category Archives: Tools

How-to: Use Cascading Pattern with R and CDH

Categories: CDH, Data Science, Guest, Tools

Our thanks to Concurrent Inc. for the how-to below about using Cascading Pattern with CDH. Cloudera recently tested CDH 4.4 with the Cascading Compatibility Test Suite, verifying compatibility with Cascading 2.2.

Cascading Pattern is a machine-learning project within the Cascading development framework used to build enterprise data workflows. Cascading provides an abstraction layer on top of Apache Hadoop and other computing topologies that allows enterprises to leverage existing skills and resources to build data-processing applications on Hadoop…
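Cascading Pattern scores predictive models exported as PMML (for example, from R via the pmml package). As a rough illustration of what scoring a PMML model involves, the Python sketch below parses a small, hand-written PMML RegressionModel and applies it to a few records. It deliberately does not use Cascading or Pattern's Java API; the model, its fields (`x1`, `x2`), and the helpers `load_linear_model` and `score` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 linear regression, of the kind R's pmml package
# can export: y = 1.5 + 2.0*x1 - 0.5*x2
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <DataDictionary numberOfFields="3">
    <DataField name="y" optype="continuous" dataType="double"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="predicted"/>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Return (intercept, {field: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept", "0"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefs


def score(record, intercept, coefs):
    """Score one record (a dict of field -> value) with the linear model."""
    return intercept + sum(c * record[name] for name, c in coefs.items())


if __name__ == "__main__":
    intercept, coefs = load_linear_model(PMML_DOC)
    for row in [{"x1": 1.0, "x2": 2.0}, {"x1": 0.0, "x2": 4.0}]:
        print(row, "->", score(row, intercept, coefs))
```

In a Pattern workflow, the same per-record scoring would run inside a Cascading flow over data on Hadoop rather than in a local loop.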


How-to: Configure Eclipse for Hadoop Contributions

Categories: Community, Hadoop, How-to, Tools

Contributing to Apache Hadoop or writing custom pluggable modules requires modifying Hadoop’s source code. While it is perfectly fine to use a text editor to modify Java source, modern IDEs significantly simplify navigating and debugging large Java projects like Hadoop. Eclipse is a popular choice thanks to its broad user base and multitude of available plugins.

This post covers configuring Eclipse to modify Hadoop’s source. (Developing applications against CDH using Eclipse is covered in a different post.) Hadoop has changed a great deal since our previous post on configuring Eclipse for Hadoop development…


Cloudera ML: New Open Source Libraries and Tools for Data Scientists

Categories: Community, Data Science, General, Mahout, MapReduce, Tools

Editor’s note (12/19/2013): Cloudera ML has been merged into the Oryx project. The information below is still valid, though.

Last month, Apache Crunch became the fifth project (along with Sqoop, Flume, Bigtop, and MRUnit) to go from Cloudera’s GitHub repository through the Apache Incubator and on to graduate as a top-level project within the Apache Software Foundation. As the founder of the project and a newly minted Apache VP…


How-to: Automate Your Cluster with Cloudera Manager API

Categories: Cloudera Manager, Hadoop, How-to, MapReduce, Ops and DevOps, Tools

API access was a new feature introduced in Cloudera Manager 4.0 (a free edition is available for download). Although not visible in the UI, this feature is very powerful, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics). This article walks through an example of setting up a 4-node HDFS and MapReduce cluster via the Cloudera Manager (CM) API.

Cloudera Manager API Basics

The CM API is an HTTP REST API…
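Because the CM API is plain HTTP, any HTTP client can talk to it. The Python sketch below only builds a request using the standard library; it assumes the Cloudera Manager server listens on its usual port 7180, that API v1 is served under `/api/v1/`, and that HTTP basic authentication is used. The host name, credentials, and the helper `cm_api_request` are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def cm_api_request(host, path, user, password, port=7180, version="v1",
                   method="GET", body=None):
    """Build (but do not send) a request against the Cloudera Manager REST API.

    Assumes the URL layout http://<host>:<port>/api/<version>/<path> and
    HTTP basic authentication; JSON bodies are used for writes.
    """
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    # HTTP basic auth: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    # Hypothetical host and credentials; nothing is sent over the network here.
    req = cm_api_request("cm-host.example.com", "clusters", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually call the API against a live server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the exact URL and headers before pointing the script at a real cluster.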


Cloudera Manager 4.0: Customer Feedback and Adoption

Categories: CDH, Cloudera Manager, General, Ops and DevOps, Tools

It’s been roughly three months since we announced GA of Cloudera Manager 4.0 (CM4) and I wanted to provide an update on its adoption and feedback from customers.

For those new to it, Cloudera Manager is the first and market-leading management platform for CDH (Cloudera’s Distribution Including Apache Hadoop). Enterprise customers are coming to expect an end-to-end tool that manages the entire lifecycle of their Hadoop operations. In fact…
