Category Archives: CDH

Cloudera Enterprise 5.12 is Now Available

Categories: Altus CDH Cloud Cloudera Manager Cloudera Navigator Data Science Hue Impala Kafka Kudu

Cloudera is pleased to announce that Cloudera Enterprise 5.12 is now generally available (GA). The release includes enhancements for running in cloud environments (with broader ADLS support and improved AWS Spot Instance support), usability and productivity improvements for both data science and analytic workloads, as well as performance gains and self-service performance management across a range of workloads.

As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack.

Read more

Announcing the 2017 Cloudera Community Champions

Categories: CDH

Please join us in congratulating Matthew Bigelow, Saravanakumar Sivanandan, and Gunaranjan Sundararajan, the members of the 2017 Cloudera Community Champions Program.

As the modern platform provider for machine learning and advanced analytics using hybrid open source software, Cloudera understands the value of a thriving community. Cloudera and open source are built upon the very premise of community, and community remains a core tenet of our business. We are proud of our Hadoop ecosystem committers,

Read more

Offset Management For Apache Kafka With Apache Spark Streaming

Categories: CDH Kafka Spark

A common ingest pattern among Cloudera customers is an Apache Spark Streaming application that reads data from Kafka. Streaming data continuously from Kafka has many benefits, such as the ability to gather insights faster. However, users must manage Kafka offsets so that their streaming application can recover from failures. In this post, we provide an overview of offset management and cover the following topics:

  • Storing offsets in external data stores
    • Checkpoints
    • HBase
    • ZooKeeper
    • Kafka
  • Not managing offsets

Overview of Offset Management

Spark Streaming integration with Kafka allows users to read messages from a single Kafka topic or multiple Kafka topics.
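
The recovery concern described above can be illustrated with a minimal sketch. It is not Spark or Kafka code: the `InMemoryOffsetStore` class, the `process_batch` function, and the topic name `events` are hypothetical names introduced here, and a plain Python list stands in for a Kafka partition. The idea shown is only the general one of saving the next offset to an external store after a batch is processed, so that a restarted application resumes where it left off.

```python
class InMemoryOffsetStore:
    """Hypothetical stand-in for an external offset store (e.g. HBase or ZooKeeper)."""

    def __init__(self):
        self._offsets = {}

    def load(self, topic, partition):
        # Start from offset 0 when nothing has been saved yet.
        return self._offsets.get((topic, partition), 0)

    def save(self, topic, partition, next_offset):
        self._offsets[(topic, partition)] = next_offset


def process_batch(log, store, topic, partition, batch_size, handler):
    """Read one batch starting at the stored offset, then save the new offset.

    The offset is saved only after every message in the batch has been
    handled, so a failure mid-batch leaves the stored offset unchanged and
    the whole batch is read again on restart (at-least-once behaviour).
    """
    start = store.load(topic, partition)
    batch = log[start:start + batch_size]
    for message in batch:
        handler(message)
    store.save(topic, partition, start + len(batch))
    return len(batch)


# A list stands in for one Kafka partition (assumption for illustration).
log = [f"m{i}" for i in range(5)]
store = InMemoryOffsetStore()
seen = []

process_batch(log, store, "events", 0, 2, seen.append)  # handles m0, m1
# Simulated restart: the application state is gone, but the store survives.
process_batch(log, store, "events", 0, 2, seen.append)  # handles m2, m3


def failing_handler(message):
    raise RuntimeError("simulated processing failure")


try:
    process_batch(log, store, "events", 0, 2, failing_handler)
except RuntimeError:
    pass  # the offset was not advanced, so m4 will be read again
```

Because the offset is written after processing rather than before, a crash can cause messages to be handled twice but not skipped; whether that trade-off is acceptable depends on the application.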

Read more

Solr Memory Tuning for Production (part 2)

Categories: CDH

In Part 1 of this blog, we covered some common challenges in memory tuning and the baseline setup for a production Solr deployment. In Part 2, you will learn about memory tuning, GC tuning, and some best practices.

Memory Tuning

We assume you have read Part 1 of the blog and have a stable Solr deployment up and running. The next step is memory tuning to get more out of Solr. Before changing any configuration, please be aware that adjusting some tuning knobs can have unexpected consequences for the system,

Read more

Apache Solr Memory Tuning for Production

Categories: CDH HDFS Search

Configuring Apache Solr memory properly is critical for production system stability and performance. It can be hard to find the right balance between competing goals, and multiple factors, implicit or explicit, need to be taken into consideration. This blog covers some common tasks in memory tuning and guides you through the process of configuring Solr memory for a production system.

For simplicity, this blog applies to Solr in Cloudera CDH5.11 running on top of HDFS.

Read more