Author Archives: Mark Grover

Reading data securely from Apache Kafka to Apache Spark

Categories: CDH, Kafka, Platform Security & Cybersecurity, Sentry, Spark

Introduction

With an ever-increasing number of IoT use cases on the CDH platform, securing such workloads is of paramount importance. This blog post describes how to securely consume data from Kafka in Spark, two critical components for IoT use cases.

The Cloudera Distribution of Apache Kafka 2.0.0 (based on Apache Kafka 0.9.0) introduced a new Kafka consumer API that allowed consumers to read data from a secure Kafka cluster.
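As a rough illustration of what reading from a secure cluster involves on the client side, the sketch below assembles the kind of settings the new consumer API accepts. The property names are standard Kafka client configuration keys. Everything else is an assumption made for illustration and is not taken from the post: the choice of Kerberos over TLS (`SASL_SSL`), the hypothetical helper `secure_kafka_params`, the host names, the group name, and the truststore path and password.

```python
def secure_kafka_params(brokers, group_id, truststore_path, truststore_password):
    """Build consumer settings for a Kafka cluster secured with Kerberos and TLS.

    Hypothetical helper: the keys are standard Kafka client properties,
    but the chosen security mechanism is an assumption, not the post's method.
    """
    return {
        "bootstrap.servers": ",".join(brokers),
        "group.id": group_id,
        # Assumed setup: authenticate via SASL (Kerberos), encrypt via TLS.
        "security.protocol": "SASL_SSL",
        "sasl.kerberos.service.name": "kafka",
        # Truststore holding the CA certificate that signed the broker certificates.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }


# Placeholder values; substitute those of your own cluster.
params = secure_kafka_params(
    ["broker1.example.com:9093", "broker2.example.com:9093"],
    "iot-readers",
    "/path/to/truststore.jks",
    "changeit",
)
```

In a Spark job, a map like this would typically be handed to the Kafka integration as its consumer parameters; the exact call depends on the Spark and Kafka integration versions in use.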


Apache Spark 2.0 Beta Now Available for CDH

Categories: Hadoop, Spark

Today, Cloudera announced the availability of an Apache Spark 2.0 Beta release for users of the Cloudera platform.

Apache Spark 2.0 is tremendously exciting (read this post for more background) because (among other things):

  • The Dataset API further enhances Spark’s claim as the best tool for data engineering by providing compile-time type safety along with the benefits of a query-optimization engine.
  • The Structured Streaming API enables the modeling of streaming data as a continuous DataFrame and expresses operations on that data with a SQL-like API.


Apache Spot (Incubating): Fighting Cyber Threats via an Open Data Model

Categories: Hadoop, Platform Security & Cybersecurity, Use Case

Last week, the open source Open Network Insights (ONI) project, now called Spot, was accepted into the ASF Incubator. Here are the highlights about its open data model approach and initial use cases.

One of the biggest challenges organizations face today in combating cyber threats is collecting and normalizing data from numerous security event data sources (often up to thousands of them) to build the required analytics. This process often results in those analytics becoming dependent upon specific technologies for detecting threats,


Guidelines for Installing CDH Packages on Unsupported Operating Systems

Categories: CDH, Cloudera Manager

Installing CDH on newer unsupported operating systems (such as Ubuntu 13.04 and later) can lead to conflicts. These guidelines will help you avoid them.

Some of the more recently released operating systems that bundle portions of the Apache Hadoop stack in their respective distro repositories can conflict with software from Cloudera repositories. Consequently, when you set up CDH for installation on such an OS, you may end up picking up packages with the same name from the OS’s distribution instead of Cloudera’s distribution.


The New Hadoop Application Architectures Book is Here!

Categories: Books, Hadoop

There’s an important new addition coming to the Apache Hadoop book ecosystem. It’s now in early release!

We are very happy to announce that the new Apache Hadoop book we have been writing for O’Reilly Media, Hadoop Application Architectures, is now available as an early release! It contains the first two chapters and can be found in O’Reilly’s Catalog and via Safari.

The goal of this book is to give developers and architects guidance on architecting end-to-end solutions using Hadoop and tools in the ecosystem.