Category Archives: Books

"Hadoop: The Definitive Guide" is Now a 4th Edition

Categories: Books Hadoop

Apache Hadoop ecosystem, time to celebrate! The much-anticipated, significantly updated 4th edition of Tom White’s classic O’Reilly Media book, Hadoop: The Definitive Guide, is now available.

The Hadoop ecosystem has changed a lot since the 3rd edition. How are those changes reflected in the new edition?

The core of the book is about the core Apache Hadoop project, and since the 3rd edition,

Advanced Analytics with Apache Spark: The Book

Categories: Books Data Science Events Spark

Authored by a substantial portion of Cloudera’s Data Science team (Sean Owen, Sandy Ryza, Uri Laserson, Josh Wills), Advanced Analytics with Spark (currently in Early Release from O’Reilly Media) is the newest addition to the pipeline of ecosystem books by Cloudera engineers. I talked to the authors recently.

Why did you decide to write this book?

We think it’s mostly to fill a gap between what a lot of people need to know to be productive with large-scale analytics on Apache Hadoop in 2015,

Here’s Your Getting Started with Impala Book

Categories: Books Impala

Getting Started with Impala (now in early release), another book in the Hadoop ecosystem canon, is indispensable for people who want to get familiar with Impala, the open source MPP query engine for Apache Hadoop. We spoke with its author, Impala docs writer John Russell, about the book’s origin and mission.

Why did you decide to write this book?

I wanted to do some long-form tutorials,

The Early Release Books Keep Coming: This Time, Hadoop Security

Categories: Books Project Rhino Security

Hadoop Security is the latest book from Cloudera engineers in the Hadoop ecosystem canon.

We are thrilled to announce the availability of the early release of Hadoop Security, a new book about security in the Apache Hadoop ecosystem published by O’Reilly Media. The early release contains two chapters on System Architecture and Securing Data Ingest and is available in O’Reilly’s catalog and in Safari Books.

The New Apache Flume Book is in Early Release

Categories: Books Flume

Congratulations to Hari Shreedharan, Cloudera software engineer and Apache Flume committer/PMC member, on the early release of his new O’Reilly Media book, Using Flume: Stream Data into HDFS and HBase. It’s the seventh Hadoop ecosystem book so far authored by a current or former Cloudera employee (but who’s counting?).

Why did you decide to write this book?

I have been working on Apache Flume for the past two years,
