Tag Archives: CDH4

Guidelines for Installing CDH Packages on Unsupported Operating Systems

Categories: CDH Cloudera Manager

Installing CDH on newer unsupported operating systems (such as Ubuntu 13.04 and later) can lead to conflicts. These guidelines will help you avoid them.

Some of the more recently released operating systems that bundle portions of the Apache Hadoop stack in their respective distro repositories can conflict with software from Cloudera repositories. Consequently, when you set up CDH for installation on such an OS, you may end up picking up packages with the same name from the OS’s distribution instead of Cloudera’s distribution.

Read more

How-to: Install CDH on Mac OSX 10.9 Mavericks

Categories: CDH General

This overview will cover the basic tarball setup for your Mac.

If you’re an engineer building applications on CDH and becoming familiar with all the rich features for designing the next big solution, it becomes essential to have a native Mac OSX install. Sure, you may argue that your MBP with its four-core, hyper-threaded i7, SSD, 16GB of DDR3 memory are sufficient for spinning up a VM, and in most instances —

Read more

A Guide to Checkpointing in Hadoop

Categories: Hadoop HDFS Ops and DevOps

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster or a failing one.

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.

In this post, I’ll explain the purpose of checkpointing in HDFS,

Read more

Approaches to Backup and Disaster Recovery in HBase

Categories: Hadoop HBase Ops and DevOps

Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios

With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily backup and restore potentially petabytes of data,

Read more

Cascading, Spring, and Spark: Development Choices for CDH Users Expand

Categories: CDH Hadoop Kite SDK

In software development, there is no substitute for having choices. Furthermore, freedom of choice – between frameworks, APIs, and languages — is a major fuel source for platform adoption across any successful ecosystem.

In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks:

  • Spark + CDH

    Cloudera has announced direct support for Apache Spark (incubating) with CDH.

Read more