Learn about the new Apache Flume and Apache Kafka integration (a.k.a. “Flafka”) available in CDH 5.8 and its support for the new enterprise features in Kafka 0.9.
Over a year ago, we wrote about the integration of Flume and Kafka (Flafka) for data ingest into Apache Hadoop. Since then, Flafka has proven to be quite popular among CDH users, and we believe that popularity is based on the fact that in Kafka deployments,
Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.
As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll provide a summary of the main improvements. (Hue 3.10 is also available for a quick try in one click on demo.gethue.com.)
New SQL Editor
Hue’s new code editor is a single-page app that is much simpler to use than the previous editor.
This case study is an instructive example of how performance analysis is a multi-faceted process that often leads one in surprising directions.
Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance get higher and higher.
Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.
With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to take full advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) includes a major step forward in that area: Impala 2.6 can now query data in, and write data to, the Amazon S3 object store directly.
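The cost claim above follows from simple arithmetic if performance scales linearly with cluster size. The sketch below is a back-of-the-envelope illustration only: the node count, hourly price, and run time are hypothetical numbers chosen for the example and are not taken from any benchmark.

```python
def workload_cost(nodes, price_per_node_hour, hours):
    """Total cost of running `nodes` instances for `hours` hours."""
    return nodes * price_per_node_hour * hours


# Hypothetical figures, for illustration only.
PRICE_PER_NODE_HOUR = 1.0  # assumed price per instance-hour, in dollars
BASE_NODES = 10            # assumed baseline cluster size
BASE_HOURS = 4.0           # assumed time to finish the workload at that size

base_cost = workload_cost(BASE_NODES, PRICE_PER_NODE_HOUR, BASE_HOURS)

# Assumption: linear scaling, so twice the nodes finish in half the time.
scaled_nodes = BASE_NODES * 2
scaled_hours = BASE_HOURS / 2
scaled_cost = workload_cost(scaled_nodes, PRICE_PER_NODE_HOUR, scaled_hours)

print(base_cost, scaled_cost)
```

Under that linear-scaling assumption the two totals are equal: the larger cluster costs twice as much per hour but is needed for half as long. Any real workload that scales less than linearly would make the larger cluster proportionally more expensive.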
Learn how the performance advantages of the Crypto cryptographic library will provide an upgrade for Spark shuffle encryption over the current approach.
When running a big data computing job, the data being processed may contain sensitive information that users don’t want anyone else to access. Encrypting that sensitive data is becoming more and more important, especially for enterprise users.
For Apache Spark, which is the emerging standard for big data processing,