Cloudera Developer Blog

Big Data best practices, how-to's, and internals from Cloudera Engineering and the community


How-to: Use the ShareLib in Apache Oozie (CDH 5)

The internals of Oozie’s ShareLib have changed recently (reflected in CDH 5.0.0). Here’s what you need to know.

In a previous blog post about one year ago, I explained how to use the Apache Oozie ShareLib in CDH 4. Since that time, things have changed about the ShareLib in CDH 5 (particularly directory structure), so some of the previous information is now obsolete. (These changes went upstream under OOZIE-1619.) 

This Month in the Ecosystem (April 2014)

Welcome to our eighth edition of “This Month in the Ecosystem,” a digest of highlights from April 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

More good news!

How Apache Hadoop YARN HA Works

Thanks to recent work upstream, YARN is now a highly available service. This post explains its architecture and configuration details.

YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. With planned (upgrades) or unplanned (node crashes) events, this central service, and YARN itself, could become unavailable.

HBaseCon 2014 is a Wrap!

HBaseCon 2014 is in the books. Thanks to all attendees, speakers, and sponsors!

HBaseCon 2014, much like a butterfly, lived for a short number of hours on Monday — but it certainly was beautiful while it lasted! (See photos here.)

A New Python Client for Impala

The new Python client for Impala will bring smiles to Pythonistas!

As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of my Python workflow are generally considered hassles; so it’s annoying that my main options for working with Impala are to write shell scripts, use the Impala shell, and/or transfer query results by reading/writing local files to disk.

How-to: Extend Cloudera Manager with Custom Service Descriptors

Thanks to Jonathan Natkins of WibiData for the post below about how his company extended Cloudera Manager to manage Kiji. Learn more about Kiji and the organizations using it to build real-time HBase applications at Kiji Sessions, happening on May 6, 2014, the day after HBaseCon.

As a partner of Cloudera, WibiData sees Cloudera Manager’s new extensibility framework as one of the most exciting parts of Cloudera Enterprise 5. Cloudera Manager 5.0.0 provides the single-pane view that Apache Hadoop administrators and operators want to effectively manage a cluster of machines. Additionally, Cloudera Manager now offers tight integration for partners to plug into the CDH ecosystem, which benefits Cloudera as well as WibiData.

Bringing the Best of Apache Hive 0.13 to CDH Users

More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.

Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!

Using Apache Hadoop and Impala with MySQL for Data Analysis

Thanks to Alexander Rubin of Percona for allowing us to re-publish the post below!

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format), and run reporting on top of that. For the examples below, I will use the “ontime flight performance” data from my previous post.

Meet the Engineer: Andrei Savu

In this installment of “Meet the Engineer”, our subject is Andrei Savu!

What do you do at Cloudera?

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

Newer Posts Older Posts