Category Archives: CDH

Third-Party Libraries in C6

Categories: CDH General Platform Security & Cybersecurity

Cloudera has put a significant amount of work into upgrading the third-party libraries used in our just-released C6 version. This major upgrade of our software has given us the opportunity to upgrade many of the libraries we use. These upgrades allow us to avoid security vulnerabilities, use modern versions of libraries, and standardize library versions across CDH.

Modern software development relies on reusing other people’s code. Code reused in this fashion is called a “third-party library.”

Read more

Custom Hostname for Cloud Instances

Categories: Altus CDH Cloud Cloudera Director How-to Ops and DevOps Tools

Cloudera Altus Director provides the simplest way to deploy and manage Cloudera Enterprise in the cloud. It enables customers to unlock the benefits of enterprise-grade Hadoop while leveraging the flexibility, scalability, and affordability of the cloud. It integrates seamlessly with Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, and provides support to build custom plugins for other public or private cloud environments.

Motivation

While automating the provisioning of a cluster on the cloud using Altus Director,

Read more

Next Generation Data Warehousing at Santander UK

Categories: CDH HBase HDFS Kafka Kudu Use Case

Timely data is crucial to businesses in the Big Data age. This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics, supporting intelligence that can improve relationships with customers and follow the mantra of ‘we want to help people grow and prosper’.

Santander UK’s big data journey started around four years ago.

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 3

Categories: Avro CDH How-to Kafka

Part 3: Configuring Clients

Earlier, we introduced Kafka Serializers and Deserializers that are capable of writing and reading Kafka records in Avro format. In this part, we will see how to configure producers and consumers to use them.

Setting up a Kafka Topic for use as a Schema Store

KafkaTopicSchemaProvider works with a Kafka topic as its persistent store. This topic will contain at most thousands of records: the schemas. It does not need multiple partitions,

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 2

Categories: Avro CDH How-to Kafka

Implementing a Schema Store

In Part 1, we saw the need for an Apache Avro schema provider but did not implement one. In this part we will implement a schema provider that works with Apache Kafka as storage.

In-Memory SchemaStore

First, we can implement an in-memory store for schemas. This is useful both for understanding the requirements for such a store and as the cache of the Kafka-backed store. A SchemaStore has to be quick at looking up VersionedSchema entries.
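
As a rough illustration of what quick lookup could mean here, the sketch below keeps two dictionaries so that an entry can be found either by a numeric id or by a (name, version) pair in constant time. It is written in Python purely for brevity and is not the implementation from the post: the fields of `VersionedSchema`, the class name `InMemorySchemaStore`, and the method names `add`, `get_by_id`, and `get` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VersionedSchema:
    """Hypothetical stand-in; the field set is assumed, not taken from the post."""
    id: int
    name: str
    version: int
    schema_json: str  # Avro schema kept as raw JSON text to avoid extra dependencies


class InMemorySchemaStore:
    """Assumed shape of an in-memory store: two indexes over the same entries."""

    def __init__(self) -> None:
        self._by_id: Dict[int, VersionedSchema] = {}
        self._by_name_version: Dict[Tuple[str, int], VersionedSchema] = {}

    def add(self, entry: VersionedSchema) -> None:
        # Register the entry in both indexes so either lookup is a dict access.
        self._by_id[entry.id] = entry
        self._by_name_version[(entry.name, entry.version)] = entry

    def get_by_id(self, schema_id: int) -> Optional[VersionedSchema]:
        # Returns None when the id is unknown.
        return self._by_id.get(schema_id)

    def get(self, name: str, version: int) -> Optional[VersionedSchema]:
        # Returns None when no schema with this name and version is stored.
        return self._by_name_version.get((name, version))


# Usage: register one schema and look it up both ways.
store = InMemorySchemaStore()
user_v1 = VersionedSchema(
    id=1,
    name="User",
    version=1,
    schema_json='{"type": "record", "name": "User", "fields": []}',
)
store.add(user_v1)
```

Keeping both indexes in sync on every `add` costs a little extra memory, which seems acceptable for a store that holds only schemas; a Kafka-backed store could populate the same structure as its cache.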

Read more