Tag Archives: realtime

Cloudera Impala: Real-Time Queries in Apache Hadoop, For Real

Categories: CDH HBase Hive Impala

After a long period of intense engineering effort and user feedback, we are very pleased, and proud, to announce the Cloudera Impala project. This technology is a revolutionary one for Hadoop users, and we do not take that claim lightly.

When Google published its Dremel paper in 2010, we were as inspired as the rest of the community by the technical vision to bring real-time, ad hoc query capability to Apache Hadoop,

Read more

Apache HBase I/O – HFile

Categories: HBase

Introduction

Apache HBase is the Hadoop open-source, distributed, versioned storage manager well suited for random, realtime read/write access.

Wait wait? random, realtime read/write access?
How is that possible? Is not Hadoop just a sequential read/write, batch processing system?

Yes, we’re talking about the same thing, and in the next few paragraphs, I’m going to explain to  you how HBase achieves the random I/O, how it stores data and the evolution of the HBase’s HFile format.

Read more

Hadoop Graphing with Cacti

Categories: Data Ingestion Guest Hadoop

An important part of making sure Apache Hadoop works well for all users is developing and maintaining strong relationships with the folks who run Hadoop day in and day out. Edward Capriolo keeps About.com’s Hadoop cluster happy, and we frequently chew the fat with Ed on issues ranging from administrative best practices to monitoring. Ed’s been an invaluable resource as we beta test our distribution and chase down bugs before our official releases. Today’s article looks at some of Ed’s tricks for monitoring Hadoop with Cacti through JMX.

Read more