Category Archives: CDH

Quorum-based Journaling in CDH4.1

Categories: CDH General HDFS

A few weeks back, Cloudera announced CDH 4.1, the latest update release to Cloudera’s Distribution including Apache Hadoop. This is the first release to introduce truly standalone High Availability for the HDFS NameNode, with no dependence on special hardware or external software. This post explains the inner workings of this new feature from a developer’s standpoint. If, instead, you are seeking information on configuring and operating this feature, please refer to the CDH4 High Availability Guide.

Cloudera Manager 4.1 Now Available; Supports Impala Beta Release

Categories: CDH Cloudera Manager Impala Ops and DevOps

I am very pleased to announce the availability of Cloudera Manager 4.1. This release adds support for the Cloudera Impala beta release, as well as management and monitoring of key CDH features.

Here are the highlights of Cloudera Manager 4.1:

  • Support for Quorum-based Storage HDFS High Availability
  • Cloudera Impala management and monitoring
  • Flume NG management and monitoring
  • ZooKeeper monitoring
  • Directory disk-space monitoring
  • Host decommissioning
  • Reduced monitoring latency
  • Maintenance mode
  • Several usability,

Cloudera Impala: Real-Time Queries in Apache Hadoop, For Real

Categories: CDH HBase Hive Impala

After a long period of intense engineering effort and user feedback, we are very pleased, and proud, to announce the Cloudera Impala project. This technology is a revolutionary one for Hadoop users, and we do not take that claim lightly.

When Google published its Dremel paper in 2010, we were as inspired as the rest of the community by the technical vision to bring real-time, ad hoc query capability to Apache Hadoop,

Cloudera, The Platform for Big Data

Categories: CDH Hadoop Impala

Today we’re proud to announce a new addition to the Apache Hadoop ecosystem: Cloudera Impala, a parallel SQL engine that runs natively on Hadoop storage. The salient points are:

  • Hive compatible
  • 10x the performance of Hive/MapReduce, on average
  • 100% open source, under the Apache License v2 – just like Hadoop
  • Tested to run on CDH4.1 or higher

A follow-up blog post provides more details about Impala and how it works.

Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System

Categories: CDH Use Case

This is a guest post by Oliver Guinan, VP of Ground Software at Skybox Imaging. Oliver is a 15-year veteran of the internet industry and is responsible for all ground system design, architecture, and implementation at Skybox.

One of the great promises of the big data movement is using networks of ubiquitous sensors to deliver insights about the world around us. Skybox Imaging is attempting to do just that for millions of locations across our planet.
