Category Archives: Ops and DevOps

Inside Apache Oozie HA

Categories: Oozie Ops and DevOps

Oozie’s new HA qualities help cluster operators sleep well at night. Here’s how HA works.

One of the big new features in CDH 5 for Apache Oozie is High Availability (HA). In designing this feature, the Oozie team at Cloudera had two main goals: 1) don’t change the API or usage patterns, and 2) the user shouldn’t even have to know that HA is enabled. In other words, we wanted Oozie HA to be as easy and transparent as possible.

Read More

A Guide to Checkpointing in Hadoop

Categories: Hadoop HDFS Ops and DevOps

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one.

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.

In this post, I’ll explain the purpose of checkpointing in HDFS,

Read More

Secrets of Cloudera Support: Inside Our Own Enterprise Data Hub

Categories: HBase Impala Ops and DevOps Search Support Use Case

Cloudera’s own enterprise data hub is yielding great results for providing world-class customer support.

Here at Cloudera, we are constantly pushing the envelope to give our customers world-class support. One of the cornerstones of this effort is the Cloudera Support Interface (CSI), which we’ve described in prior blog posts (here and here). Through CSI, our support team is able to quickly reason about a customer’s environment,

Read More

Best Practices for Deploying Cloudera Enterprise on Amazon Web Services

Categories: CDH Cloud Ops and DevOps

This FAQ contains answers to the most frequently asked questions about the architecture and configuration choices involved.

In December 2013, Cloudera and Amazon Web Services (AWS) announced a partnership to support Cloudera Enterprise on AWS infrastructure. Along with this announcement, we released a Deployment Reference Architecture Whitepaper. In this post, you’ll get answers to the most frequently asked questions about the architecture and configuration choices highlighted in that whitepaper.

Read More

Doing DevOps with Cloudera Manager

Categories: Cloudera Manager General Ops and DevOps

More and more customers are using automation/configuration management frameworks alongside Cloudera Manager.

As Apache Hadoop clusters continue to grow in size, complexity, and business importance as the foundational infrastructure for an Enterprise Data Hub, the use cases for a robust and mature management console expand. 

As those clusters become larger and more complex, many operators are looking to use configuration management/automation frameworks like Ansible,

Read More