Author Archives: Justin Kestelyn

New in Cloudera Enterprise 5.8: Flafka Improvements for Real-Time Data Ingest

Categories: Data Ingestion Flume Kafka

Learn about the new Apache Flume and Apache Kafka integration (aka “Flafka”) available in CDH 5.8 and its support for the new enterprise features in Kafka 0.9.

Over a year ago, we wrote about the integration of Flume and Kafka (Flafka) for data ingest into Apache Hadoop. Since then, Flafka has proven to be quite popular among CDH users, and we believe that popularity is based on the fact that in Kafka deployments,

Read More

New in Cloudera Enterprise 5.8: SQL Editor and Other Productivity Improvements

Categories: CDH Hue Search Sentry

Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.

As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll provide a summary of the main improvements. (Hue 3.10 can also be tried with one click at demo.gethue.com.)

New SQL Editor

Hue’s new code editor is a single-page app that is much simpler to use than the previous editor.

Read More

Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story

Categories: Performance Search Testing

This case study is an instructive example of how performance analysis is a multi-faceted process that often leads one in surprising directions. 

Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance get higher and higher.

However,

Read More

Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud Impala Ops and DevOps Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to take full advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) takes a major step forward in that area: Impala 2.6 can now store and query data directly in the Amazon S3 object store.

Read More

Securing Apache Spark Shuffle using Apache Commons Crypto

Categories: Guest Security Spark

Learn how the performance advantages of the Apache Commons Crypto library will improve Spark shuffle encryption over the current approach.

When running a big data computing job, the data being processed may contain sensitive information that users don’t want anyone else to access. Encrypting that sensitive data is becoming more and more important, especially for enterprise users.

For Apache Spark, which is the emerging standard for big data processing,

Read More