Category Archives: General

This Month in the Ecosystem (July 2014)

Categories: General

Welcome to our 11th edition of “This Month in the Ecosystem,” a digest of highlights from July 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

  • An early release of the new O’Reilly Media book, Hadoop Application Architectures, became available. This one is sure to become standard bookshelf material. (Look for signed copies at Strata + Hadoop World!)
  • Continuuity introduced Tephra,

Cloudera Live: The Instant Apache Hadoop Experience

Categories: CDH, Cloud, General, Hue

Get started with Apache Hadoop and use-case examples online in just seconds.

Today, we announced the Cloudera Live Read-Only Demo, a new online service for developers and analysts (currently in public beta) that makes it easy to learn, explore, and try out CDH, Cloudera’s open source software distribution containing Apache Hadoop and related projects. No downloads, no installations, no waiting — just point-and-play!

Try the Cloudera Live Read-Only Demo

The Cloudera Live Read-Only Demo is a live CDH 5 cluster with a Hue interface (based on Hue 3.5.0,

How-to: Implement Role-based Security in Impala using Apache Sentry

Categories: General, Hive, How-to, Impala, Security

This quick demo illustrates how easy it is to implement role-based access control in Impala using Sentry.

Apache Sentry (incubating) is the Apache Hadoop ecosystem tool for role-based access control (RBAC). In this how-to, I will demonstrate how to implement Sentry for RBAC in Impala. I feel this introduction is best motivated by a use case.

Data warehouse optimization is one of the most common Hadoop use cases.

Apache Spark: A Delight for Developers

Categories: General, Spark

Sure, Spark is fast, but it also gives developers a positive experience they won’t soon forget.

Apache Spark is well known today for its performance benefits over MapReduce, as well as for its versatility. However, another important benefit, the elegance of the development experience, gets less mainstream attention.

In this post, you’ll learn about just a few of the Spark features that make development a pleasure.

How-to: Index and Search Multilingual Documents in Hadoop

Categories: General, Guest, Search

Learn how to use Cloudera Search along with RBL-JE to index and search documents in multiple languages.

Our thanks to Basis Technology for providing the how-to below!

Basis Technology’s Rosette Base Linguistics for Java (RBL-JE) provides a comprehensive multilingual text analytics platform for improving search precision and recall. RBL provides tokenization, lemmatization, POS tagging, and de-compounding for Asian, European, Nordic, and Middle Eastern languages, and has just been certified for use with Cloudera Search.