Category Archives: How-to

Integrating Machine Learning Models into Your Big Data Pipelines in Real-Time With No Coding

Categories: AI and Machine Learning CDH Cloudera Data Science Workbench How-to

[Editor’s note: This article was originally published on the Hortonworks Community Connection, but is reproduced here because CDSW is now available on both Cloudera and Hortonworks platforms.]

Using Deployed Models as a Function as a Service


Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models from Apache NiFi as part of flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html) …
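For reference, a model deployed in CDSW is exposed as a REST endpoint, so a NiFi flow can call it with an InvokeHTTP processor and no custom code. The Java sketch below shows the equivalent request; the host, access key, and "sentence" input field are hypothetical placeholders, and the accessKey/request payload shape is an assumption about the endpoint contract rather than something this excerpt specifies.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CdswModelClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and access key for a model deployed in CDSW.
        String endpoint = "https://modelservice.cdsw.example.com/model";
        String accessKey = "your-model-access-key";

        // Assumed request shape: a JSON body wrapping the access key and the
        // model's input arguments (here a single "sentence" field, as an example).
        String body = "{\"accessKey\": \"" + accessKey + "\", "
                    + "\"request\": {\"sentence\": \"NiFi sends this text\"}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // In a NiFi flow, an InvokeHTTP processor issues the same POST.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```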

Read more

Custom Hostname for Cloud Instances

Categories: Altus CDH Cloud Cloudera Director How-to Ops and DevOps Tools

Cloudera Altus Director provides the simplest way to deploy and manage Cloudera Enterprise in the cloud. It enables customers to unlock the benefits of enterprise-grade Hadoop while leveraging the flexibility, scalability, and affordability of the cloud. It integrates seamlessly with Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, and provides support to build custom plugins for other public or private cloud environments.

Motivation

While automating the provisioning of a cluster on the cloud using Altus Director, …

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 3

Categories: Avro CDH How-to Kafka

Part 3: Configuring Clients

Earlier, we introduced Kafka Serializers and Deserializers that are capable of writing and reading Kafka records in Avro format. In this part we will see how to configure producers and consumers to use them.
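As a rough sketch of what that configuration looks like, a producer names its serializer classes in its properties; the com.example class name below is a placeholder rather than the exact class developed in this series, and the schemaprovider.topic property is likewise assumed.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerSetup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Placeholder class name standing in for the Avro serializer built in this series.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "com.example.kafka.avro.VersionedAvroSerializer");
        // Serializer-specific settings (for example, where to find schemas) are passed
        // alongside the ordinary producer properties. Assumed property name.
        props.put("schemaprovider.topic", "schema-topic");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // producer.send(...) calls now serialize values with the configured serializer.
        }
    }
}
```

Consumers are configured the same way, with the matching deserializer class set as value.deserializer.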

Setting up a Kafka Topic for use as a Schema Store

KafkaTopicSchemaProvider works with a Kafka topic as its persistent store. This topic will contain at most thousands of records: the schemas. It does not need multiple partitions, …
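A minimal sketch of creating such a topic with Kafka's AdminClient follows; the topic name and replication factor are placeholders, and the use of log compaction is an assumption, not something this excerpt states.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class SchemaTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // A single partition keeps all schema records totally ordered; the topic
            // holds at most thousands of records, so throughput is not a concern.
            NewTopic schemaTopic = new NewTopic("schema-topic", 1, (short) 3)
                    // Assumption: compaction keyed by schema name and version retains
                    // every registered schema while discarding overwritten records.
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(schemaTopic)).all().get();
        }
    }
}
```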

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 2

Categories: Avro CDH How-to Kafka

Implementing a Schema Store

In Part 1, we saw the need for an Apache Avro schema provider but did not implement one. In this part we will implement a schema provider that works with Apache Kafka as storage.

In-Memory SchemaStore

First, we implement an in-memory store for schemas. This is useful for understanding the requirements of such a store, and it also serves as the cache for the Kafka-backed store. A SchemaStore has to look up VersionedSchema entries quickly.
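Below is a minimal sketch of such an in-memory store, assuming a VersionedSchema value object that exposes an id, a name, and a version (the interface used in the article itself may differ).

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class InMemorySchemaStore {

    // Composite key so a lookup by (name, version) is a single map access.
    private static final class NameVersion {
        final String name;
        final int version;
        NameVersion(String name, int version) { this.name = name; this.version = version; }
        @Override public boolean equals(Object o) {
            return o instanceof NameVersion
                    && ((NameVersion) o).version == version
                    && ((NameVersion) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name, version); }
    }

    private final Map<Integer, VersionedSchema> byId = new ConcurrentHashMap<>();
    private final Map<NameVersion, VersionedSchema> byNameVersion = new ConcurrentHashMap<>();

    // Assumes VersionedSchema exposes getId(), getName(), and getVersion().
    public void add(VersionedSchema schema) {
        byId.put(schema.getId(), schema);
        byNameVersion.put(new NameVersion(schema.getName(), schema.getVersion()), schema);
    }

    public VersionedSchema getById(int id) {
        return byId.get(id);
    }

    public VersionedSchema getByNameAndVersion(String name, int version) {
        return byNameVersion.get(new NameVersion(name, version));
    }
}
```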

Read more

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 1

Categories: Avro CDH How-to Kafka

In Apache Kafka, Java applications called producers write structured messages to a Kafka cluster (made up of brokers). Similarly, Java applications called consumers read these messages from the same cluster. In some organizations, there are different groups in charge of writing and managing the producers and consumers. In such cases, one major pain point can be the coordination of the agreed-upon message format between producers and consumers.

This example demonstrates how to use Apache Avro to serialize records that are produced to Apache Kafka while allowing schemas to evolve and producer and consumer applications to be updated independently of each other.
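As a self-contained illustration of the Avro serialization this series builds on (the schema below is an example, not one taken from the articles), a record is turned into the bytes a producer would send like this:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroSerializationExample {
    public static void main(String[] args) throws Exception {
        // Example schema; adding fields with defaults later is a compatible evolution.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"long\"},"
              + "{\"name\":\"product\",\"type\":\"string\"}]}");

        GenericRecord order = new GenericData.Record(schema);
        order.put("id", 42L);
        order.put("product", "widget");

        // Serialize the record to the byte array a Kafka producer would send.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(order, encoder);
        encoder.flush();
        byte[] payload = out.toByteArray();
        System.out.println("Serialized " + payload.length + " bytes");
    }
}
```

Consumers need the exact writer schema to decode those bytes, which is why the later parts of the series introduce a schema store that producers and consumers can share.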

Read more