Tag Archives: CDH4

Guidelines for Installing CDH Packages on Unsupported Operating Systems

Categories: CDH Cloudera Manager

Installing CDH on newer unsupported operating systems (such as Ubuntu 13.04 and later) can lead to conflicts. These guidelines will help you avoid them.

Some of the more recently released operating systems that bundle portions of the Apache Hadoop stack in their respective distro repositories can conflict with software from Cloudera repositories. Consequently, when you set up CDH for installation on such an OS, you may end up picking up packages with the same name from the OS’s distribution instead of Cloudera’s distribution.

Read More

How-to: Install CDH on Mac OSX 10.9 Mavericks

Categories: CDH General

This overview will cover the basic tarball setup for your Mac.

If you’re an engineer building applications on CDH and becoming familiar with all the rich features for designing the next big solution, it becomes essential to have a native Mac OSX install. Sure, you may argue that your MBP with its four-core, hyper-threaded i7, SSD, 16GB of DDR3 memory are sufficient for spinning up a VM, and in most instances —

Read More

What’s New in Kite SDK 0.15.0?

Categories: Kite SDK Tools

Kite SDK’s new release contains new improvements that make working with data easier.

Recently, Kite SDK, the open source toolset that helps developers build systems on the Apache Hadoop ecosystem, became a 0.15.0. In this post, you’ll get an overview of several new features and bug fixes.

Working with Datasets by URI

The new Datasets class lets you work with datasets based on individual dataset URIs.

Read More

A Guide to Checkpointing in Hadoop

Categories: Hadoop HDFS Ops and DevOps

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster or a failing one.

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.

In this post, I’ll explain the purpose of checkpointing in HDFS,

Read More

Approaches to Backup and Disaster Recovery in HBase

Categories: Hadoop HBase Ops and DevOps

Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios

With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily backup and restore potentially petabytes of data,

Read More