Category Archives: Ops and DevOps

Big Data Benchmarks: Toward Real-Life Use Cases

Categories: Guest, Hadoop, Ops and DevOps, Performance

The Transaction Processing Performance Council (TPC), working with Cloudera, recently announced the new TPCx-HS benchmark, a good first step toward providing a Big Data benchmark.

In this interview, conducted by Roberto Zicari, Francois Raab (the original author of the TPC-C benchmark) and Yanpei Chen (a performance engineer at Cloudera) share their thoughts on the next step toward benchmarks that reflect real-world use cases.

This interview was originally published at ODBMS.org;

Read More

How-to: Easily Do Rolling Upgrades with Cloudera Manager

Categories: Cloudera Manager, How-to, Ops and DevOps

Cloudera Manager is unique among the available options in making an otherwise disruptive operation easy for operators and users.

For the increasing number of customers who rely on enterprise data hubs (EDHs) for business-critical applications, it is imperative to minimize or eliminate downtime. Cloudera has therefore focused intently on making software upgrades a routine, non-disruptive operation for EDH administrators and users.

With Cloudera Manager 4.6 and later,

Read More

Inside Apache Oozie HA

Categories: Oozie, Ops and DevOps

Oozie’s new HA qualities help cluster operators sleep well at night. Here’s how they work.

One of the big new features in CDH 5 for Apache Oozie is High Availability (HA). In designing this feature, the Oozie team at Cloudera had two main goals: 1) don’t change the API or usage patterns, and 2) the user shouldn’t even have to know that HA is enabled. In other words, we wanted Oozie HA to be as easy and transparent as possible.

Read More

A Guide to Checkpointing in Hadoop

Categories: Hadoop, HDFS, Ops and DevOps

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one.

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.

In this post, I’ll explain the purpose of checkpointing in HDFS,

Read More

Secrets of Cloudera Support: Inside Our Own Enterprise Data Hub

Categories: HBase, Impala, Ops and DevOps, Search, Support, Use Case

Cloudera’s own enterprise data hub is yielding great results in delivering world-class customer support.

Here at Cloudera, we are constantly pushing the envelope to give our customers world-class support. One of the cornerstones of this effort is the Cloudera Support Interface (CSI), which we’ve described in prior blog posts (here and here). Through CSI, our support team is able to quickly reason about a customer’s environment,

Read More