Category Archives: How-to

Custom Hostname for Cloud Instances

Categories: Altus CDH Cloud Cloudera Director How-to Ops and DevOps Tools

Cloudera Altus Director provides the simplest way to deploy and manage Cloudera Enterprise in the cloud. It enables customers to unlock the benefits of enterprise-grade Hadoop while leveraging the flexibility, scalability, and affordability of the cloud. It integrates seamlessly with Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, and provides support to build custom plugins for other public or private cloud environments.

Motivation

While automating the provisioning of a cluster on the cloud using Altus Director,

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 3

Categories: Avro CDH How-to Kafka

Part 3: Configuring Clients

Earlier, we introduced Kafka Serializers and Deserializers that are capable of writing and reading Kafka records in Avro format. In this part we will going to see how to configure producers and consumers to use them.

Setting up a Kafka Topic for use as a Schema Store

KafkaTopicSchemaProvider works with a Kafka topic as its persistent store. This topic will contain at most thousands of records: the schemas. It does not need multiple partitions,

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 2

Categories: Avro CDH How-to Kafka

Implementing a Schema Store

In Part 1, we saw the need for an Apache Avro schema provider but did not implement one. In this part we will implement a schema provider that works with Apache Kafka as storage.

In-Memory SchemaStore

First we can implement an in-memory store for schemas. This is useful to understand the requirements for such a store and as the cache of the Kafka backed store. A SchemaStore has to be quick in looking up VersionedSchema entries.

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 1

Categories: Avro CDH How-to Kafka

In Apache Kafka, Java applications called producers write structured messages to a Kafka cluster (made up of brokers). Similarly, Java applications called consumers read these messages from the same cluster.  In some organizations, there are different groups in charge of writing and managing the producers and consumers. In such cases, one major pain point can be in the coordination of the agreed upon message format between producers and consumers.

This example demonstrates how to use Apache Avro to serialize records that are produced to Apache Kafka while allowing evolution of schemas and nonsynchronous update of producer and consumer applications.

Read more

Evaluating Partner Platforms

Categories: CDH Hardware How-to Performance

As a member of Cloudera’s Partner Engineering team, I evaluate hardware and cloud computing platforms offered by commercial partners who want to certify their products for use with Cloudera software. One of my primary goals is to make sure that these platforms provide a stable and well-performing base upon which our products will run, a state of operation that a wide variety of customers performing an even wider variety of tasks can appreciate.

Read more