Scalability of Kafka Messaging using Consumer Groups

Categories: Data Ingestion Flume Kafka Use Case

Traditional messaging models fall into two categories: shared message queues and publish-subscribe models. Each has its own pros and cons, but neither could handle big data ingestion at scale because of limitations in its design. Apache Kafka implements a publish-subscribe messaging model that provides fault tolerance and the scalability to handle large volumes of streaming data for real-time analytics. It was developed at LinkedIn in 2010 to meet its growing data pipeline needs. Apache Kafka fills the gaps that traditional messaging models left open.

Read more

Backup and Disaster Recovery for Cloudera Search

Categories: CDH Search

Two of the worst things that can happen in a mission-critical production environment are data loss and downtime. For a search service that gives end users easy access to data using natural language, downtime would mean a complete halt for the parts of your organization that depend on it. Even worse, if the search service is fueling your online business, downtime interrupts customer access and the end-user experience.

That is why we designed multiple backup and disaster recovery options for your data served via Cloudera Search,

Read more

Automated Provisioning of CDH in the Cloud with Cloudera Director and Ansible

Categories: CDH Cloud Cloudera Director Guest

This is a guest blog post from Jasper Pult, Technology Consultant at Lufthansa Industry Solutions, an international IT consultancy covering all aspects of Big Data, IoT and Cloud. The work below was implemented using Director’s API v9, and certain API details might change in future versions.

Cloud computing is quickly replacing traditional on-premises solutions in all kinds of industries. With Apache Hadoop workloads often varying in resource requirements over time,

Read more

Altus SDK for Java

Categories: Altus

We are excited to announce the general availability of the Cloudera Altus SDK for Java, which lets you programmatically leverage the Altus platform-as-a-service for ETL, batch machine learning, and cloud bursting. Altus empowers customers and partners alike to run data engineering workloads in the cloud, leveraging cloud infrastructures such as AWS. Cloudera Altus also provides the ability to create data engineering pipelines using both a web console and a CLI.

The Cloudera Altus SDK for Java was developed to provide easier programmatic access from the popular Java programming language so that users can automate their data engineering workloads.

Read more