Tag Archives: hadoop world

This Month (and Year) in the Ecosystem (December 2013)

Categories: Community Hadoop HBase Impala Spark

Welcome to our sixth edition of “This Month in the Ecosystem,” a digest of highlights from December 2013 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

Read more

How-to: Do Statistical Analysis with Impala and R

Categories: Data Science How-to Impala

The new RImpala package brings the speed and interactivity of Impala to queries from R.

Our thanks to Austin Chungath, Sachin Sudarshana, and Vikas Raguttahalli of Mu Sigma, a Decision Sciences and Big Data analytics company, for the guest post below.

As is well known, Apache Hadoop traditionally relies on the MapReduce paradigm for parallel processing, which is an excellent programming model for batch-oriented workloads.

Read more

Approaches to Backup and Disaster Recovery in HBase

Categories: Hadoop HBase Ops and DevOps

Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios.

With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily back up and restore potentially petabytes of data,

Read more

Tips for Debugging Distributed Systems

Categories: Events Hadoop

Among Cloudera’s engineer-presenters at Strata + Hadoop World 2013 this week, Philip Zeyliger (“Tricks for Distributed System Debugging and Diagnosis”) was particularly fortunate to have been interviewed by O’Reilly Media editor Meghan Blanchette on camera.

In the 8-minute interview, Philip offers an overview of common pain points and failures when debugging distributed systems.

And for more detail, you can view his presentation slides here.

Read more

Download the New Impala e-Book from O’Reilly Media

Categories: Events Impala

As a delicious appetizer for the Strata Conference + Hadoop World next week (sold out!), O’Reilly Media has partnered with us to create and publish a new e-book specifically intended for technical end-users of Cloudera Impala, the open source distributed query engine for Apache Hadoop.

Authored by Cloudera’s own John Russell, the e-book provides a 30-page tour of Impala’s internals and architecture, as well as common usage patterns intended for mainstream (SQL) users.

Read more