Category Archives: Books

Meet the Authors: “Data Analytics with Hadoop” from O’Reilly Media

Categories: Books Data Science General Hadoop

I recently had a chat with Benjamin Bengfort, a data scientist finishing his PhD at the University of Maryland, and Jenny Kim, a software engineer at Cloudera, about their forthcoming O’Reilly Media book (now in Early Access), Data Analytics with Hadoop: An Introduction for Data Scientists.

Why did you decide to write this book?

Ben: The content was originally part of a class that Jenny and I were teaching together.

Read More

"Hadoop: The Definitive Guide" is Now a 4th Edition

Categories: Books Hadoop

Apache Hadoop ecosystem, time to celebrate! The much-anticipated, significantly updated 4th edition of Tom White’s classic O’Reilly Media book, Hadoop: The Definitive Guide, is now available.

The Hadoop ecosystem has changed a lot since the 3rd edition. How are those changes reflected in the new edition?

The core of the book is about the core Apache Hadoop project, and since the 3rd edition,

Read More

Advanced Analytics with Apache Spark: The Book

Categories: Books Data Science Events Spark

Authored by a substantial portion of Cloudera’s Data Science team (Sean Owen, Sandy Ryza, Uri Laserson, Josh Wills), Advanced Analytics with Spark (currently in Early Release from O’Reilly Media) is the newest addition to the pipeline of ecosystem books by Cloudera engineers. I talked to the authors recently.

Why did you decide to write this book?

We think it’s mostly to fill a gap between what a lot of people need to know to be productive with large-scale analytics on Apache Hadoop in 2015,

Read More

Here’s Your Getting Started with Impala Book

Categories: Books Impala

Getting Started with Impala (now in early release)—another book in the Hadoop ecosystem books canon—is indispensable for people who want to get familiar with Impala, the open source MPP query engine for Apache Hadoop. We spoke with its author, Impala docs writer John Russell, about the book’s origin and mission.

Why did you decide to write this book?

I wanted to do some long-form tutorials,

The Early Release Books Keep Coming: This Time, Hadoop Security

Categories: Books Security

Hadoop Security is the latest book from Cloudera engineers in the Hadoop ecosystem books canon.

We are thrilled to announce the availability of the early release of Hadoop Security, a new book about security in the Apache Hadoop ecosystem published by O’Reilly Media. The early release contains two chapters on System Architecture and Securing Data Ingest and is available in O’Reilly’s catalog and in Safari Books.