A month ago, we publicly announced Cloudera Altus, our new platform-as-a-service offering. Today, we are expanding the Altus data engineering service to support AWS EC2 Spot instances. Cloud infrastructure is the most costly component of running Altus data engineering workloads in the cloud. Altus EC2 Spot instance support makes it easy to significantly reduce that cost by allowing users to provision Altus data engineering clusters backed by excess EC2 compute capacity at reduced prices.
A common ingest pattern among Cloudera customers is an Apache Spark Streaming application that reads data from Kafka. Streaming data continuously from Kafka has many benefits, such as the ability to gather insights faster. However, users must consider how to manage Kafka offsets so that their streaming application can recover from failures. In this post, we provide an overview of offset management and cover the following topics.
- Storing offsets in external data stores
- Not managing offsets
Overview of Offset Management
Spark Streaming integration with Kafka allows users to read messages from a single Kafka topic or multiple Kafka topics.
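To make the idea of tracking offsets per topic and partition concrete, here is a minimal sketch in plain Python. It does not use the Spark or Kafka APIs: `FileOffsetStore` and `process_batch` are hypothetical names, the "external data store" is just a local JSON file, and a batch is assumed to be a list of `(topic, partition, offset, value)` tuples. The stored value for each topic and partition is taken to be the next offset to read.

```python
import json
import os
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]
Record = Tuple[str, int, int, str]  # (topic, partition, offset, value)


class FileOffsetStore:
    """Hypothetical stand-in for an external offset store (a JSON file)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Dict[TopicPartition, int]:
        """Return the next offset to read for each (topic, partition)."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            raw = json.load(f)
        offsets = {}
        for key, next_offset in raw.items():
            # Keys are serialized as "topic:partition".
            topic, partition = key.rsplit(":", 1)
            offsets[(topic, int(partition))] = next_offset
        return offsets

    def save(self, offsets: Dict[TopicPartition, int]) -> None:
        """Write offsets to a temp file, then rename it into place."""
        raw = {f"{t}:{p}": o for (t, p), o in offsets.items()}
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(raw, f)
        os.replace(tmp_path, self.path)


def process_batch(store: FileOffsetStore, batch: List[Record]) -> List[str]:
    """Process only records not yet seen, then persist the new offsets."""
    offsets = store.load()
    processed = []
    for topic, partition, offset, value in batch:
        key = (topic, partition)
        # Skip records that were already processed before a restart.
        if offset < offsets.get(key, 0):
            continue
        processed.append(value)
        offsets[key] = offset + 1  # next offset to read
    store.save(offsets)
    return processed
```

Because offsets are saved only after the batch has been handled, a failure mid-batch would cause that batch to be replayed rather than skipped, so this sketch gives at-least-once behavior under the stated assumptions.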
In Part 1 of this blog, we covered some common challenges in memory tuning and baseline setup related to a production Solr deployment. In Part 2, you will learn memory tuning, GC tuning and some best practices.
We assume you have read Part 1 of the blog and have a stable Solr deployment up and running. The next step is memory tuning to get more out of Solr. Before changing any configuration, please be aware that adjusting some tuning knobs can have unexpected consequences on the system,
The Security Problem
Four Letter Words (abbreviated as 4lw) is a very popular feature of the Apache ZooKeeper project. In a nutshell, 4lw is a set of commands that you can use to interact with a ZooKeeper ensemble through a shell interface. Because it is simple and easy to use, many ZooKeeper monitoring solutions are built on top of 4lw.
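As a rough illustration of how simple the interaction is, the sketch below sends a four-letter command over a plain TCP socket and reads the reply until the server closes the connection. The helper name `send_four_letter_word` is made up for this example, and the host and port are assumptions (2181 is the usual ZooKeeper client port, but your ensemble may differ).

```python
import socket


def send_four_letter_word(host: str, port: int, command: str,
                          timeout: float = 5.0) -> str:
    """Send a 4lw command to a ZooKeeper server and return its reply."""
    if len(command) != 4:
        raise ValueError("a four letter word must be exactly 4 characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        # The server answers and then closes the connection,
        # so read until recv() returns an empty byte string.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `send_four_letter_word("localhost", 2181, "ruok")` asks the server whether it is running; a healthy server that accepts the command replies with `imok`.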
The simplicity of 4lw comes at a cost: the design did not originally consider security,
The following article by Ciaran Dynes was reposted from the Talend blog with their permission.
As you may have read, Talend recently announced its support for Cloudera Altus, a newly released Platform-as-a-Service (PaaS) offering that simplifies running large-scale data processing applications in the public cloud. For us, supporting Altus at launch was the absolute easiest decision given that so many of our customers are looking to realize the cost,