Category Archives: Events

Apache HBase is Everywhere

Categories: Community Events HBase

For Cloudera, Apache HBase has grown into a stable, scalable, mature, and critical component of the Apache Hadoop stack.  

HBase adds low-latency random reads and writes across your big data. While it is a key piece of the Apache Hadoop ecosystem, HBase has its own ecosystem of projects and products that use it as a storage engine for systems such as time-series databases (OpenTSDB) or SQL-style databases (Apache Phoenix,

Read more

The New Wrangle Conference: Solving the Hardest Data Science Challenges from Startup to Enterprise

Categories: Community Data Science Events

Wrangle, a new conference dedicated to the practice of data science from startup to enterprise, debuts in San Francisco on Oct. 22, 2015.

Even as Cloudera introduces new tools for analytics and machine learning into its platform (such as the recently announced Ibis project), we are mindful of the fact that many of the hardest problems in data science cannot be solved by technology alone. From the smallest startups to the largest enterprises,

Read more

Advanced Analytics with Apache Spark: The Book

Categories: Books Data Science Events Spark

Authored by a substantial portion of Cloudera’s Data Science team (Sean Owen, Sandy Ryza, Uri Laserson, Josh Wills), Advanced Analytics with Spark (currently in Early Release from O’Reilly Media) is the newest addition to the pipeline of ecosystem books by Cloudera engineers. I talked to the authors recently.

Why did you decide to write this book?

We think it’s mostly to fill a gap between what a lot of people need to know to be productive with large-scale analytics on Apache Hadoop in 2015,

Read more

Tips for Debugging Distributed Systems

Categories: Events Hadoop

Among Cloudera’s engineer-presenters at Strata + Hadoop World 2013 this week, Philip Zeyliger (“Tricks for Distributed System Debugging and Diagnosis”) was particularly fortunate to have been interviewed by O’Reilly Media editor Meghan Blanchette on camera.

In the following 8-minute interview, Philip offers an overview of common pain points and failures when debugging distributed systems:

And for more detail, you can view his presentation slides here.

Read more

Download the New Impala e-Book from O’Reilly Media

Categories: Events Impala

As a delicious appetizer for the Strata Conference + Hadoop World next week (sold out!), O’Reilly Media has partnered with us to create and publish a new e-book specifically intended for technical end-users of Cloudera Impala, the open source distributed query engine for Apache Hadoop.

Authored by Cloudera’s own John Russell, the e-book provides a 30-page tour of Impala’s internals and architecture, as well as common usage patterns intended for mainstream (SQL) users.

Read more