With the launch of CDP Public Cloud 7.2.14, Cloudera Streams Messaging for Data Hub deployments have gained some powerful new features! In this release, the Streams Messaging templates in Data Hub ship with Apache Kafka 2.8 and Cruise Control 2.5, which bring new core features and fixes. KConnect has been added, along with new connectors and Stateless Apache NiFi support, which can run NiFi flows as connectors. The Schema Registry now supports JSON schemas in addition to the Apache Avro schemas it already supported, and it gains native API-based import and export for sharing schemas between environments.
Kafka & Cruise Control Updates
Deployments with Kafka 2.5 clusters can now be upgraded to Kafka 2.8, benefiting from all the improvements and features in Kafka 2.6, 2.7 and 2.8. Improvements include:
- A client quota API in the Admin Client, which makes quotas easier to map and manage, together with the new kafka-client-quotas tool.
- Easier monitoring and debugging of performance issues, through newly exposed disk read and write metrics.
- Connection limiting for Kafka brokers, which helps protect them from CPU overload and other problems caused by connection storms (e.g. incorrectly implemented clients that disconnect and reconnect for every message). Limits can be set on the total number of connections at the broker level, or on the number of connections allowed from a specific IP address.
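The broker-level and per-IP limits described above can be thought of as two counters checked on every new connection. The sketch below is a simplified model (the `ConnectionLimiter` class is hypothetical and invented for illustration), not Kafka's SocketServer code; in upstream Kafka the corresponding broker settings include `max.connections`, `max.connections.per.ip` and `max.connections.per.ip.overrides`.

```python
from collections import defaultdict


class ConnectionLimiter:
    """Hypothetical model of broker-wide and per-IP connection caps.

    Illustration only; this is not how the Kafka broker implements limits.
    """

    def __init__(self, max_connections, max_per_ip, per_ip_overrides=None):
        self.max_connections = max_connections
        self.max_per_ip = max_per_ip
        self.per_ip_overrides = per_ip_overrides or {}
        self.total = 0
        self.per_ip = defaultdict(int)

    def _ip_limit(self, ip):
        # An explicit override for this IP wins over the default per-IP cap.
        return self.per_ip_overrides.get(ip, self.max_per_ip)

    def try_accept(self, ip):
        """Record and allow the connection only if both limits permit it."""
        if self.total >= self.max_connections:
            return False  # broker-wide cap reached
        if self.per_ip[ip] >= self._ip_limit(ip):
            return False  # this IP already has its maximum
        self.total += 1
        self.per_ip[ip] += 1
        return True

    def close(self, ip):
        """Release a connection previously accepted from this IP."""
        if self.per_ip[ip] > 0:
            self.per_ip[ip] -= 1
            self.total -= 1


limiter = ConnectionLimiter(max_connections=4, max_per_ip=2,
                            per_ip_overrides={"10.0.0.9": 3})
# A misbehaving client at 10.0.0.5 keeps reconnecting; its third attempt is refused.
results = [limiter.try_accept("10.0.0.5") for _ in range(3)]
print(results)  # [True, True, False]
```

Checking the broker-wide cap first means a connection storm from many different IPs is still bounded, while the per-IP cap stops a single faulty client from consuming the whole budget.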
These are just a few of the improvements now available in the Cloudera Streams Messaging update in the 7.2.14 release.
Cruise Control Updates
When Cruise Control is upgraded from 2.2 to 2.5, a number of fixes and a new rebalance goal become available. Prior to this release only the RackAwareGoal was available, which strictly enforced replica placement based on rack topology: a replica would never be assigned to a rack that already contained another replica of the same partition. In clusters where the number of racks was lower than a partition's replication factor, for example after a rack failure, this prevented unavailable replicas from being restored until the failed rack was repaired. In Cruise Control 2.5, the RackAwareDistributionGoal relaxes this rule: it still spreads a partition's replicas evenly across racks, but allows multiple replicas of the same partition on the same rack once every other available rack already holds a replica. With this, Cruise Control can restore availability of all replicas even when a rack failure leaves fewer available racks than a partition's replication factor.
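The difference between the two goals can be illustrated for a single partition. This is a toy sketch under simplifying assumptions (one partition, brokers ignored, and the `place_replicas` helper is hypothetical); Cruise Control's real goals balance many partitions across brokers and are considerably more involved.

```python
def place_replicas(replication_factor, racks, relaxed):
    """Toy placement of one partition's replicas onto racks.

    relaxed=False follows the strict RackAwareGoal idea: at most one replica
    per rack, so placement fails if there are fewer racks than replicas.
    relaxed=True follows the RackAwareDistributionGoal idea: spread replicas
    evenly and reuse a rack only once every available rack holds a replica.
    Returns a list of rack names, or None if no valid placement exists.
    """
    if not racks or (not relaxed and replication_factor > len(racks)):
        return None
    counts = {rack: 0 for rack in racks}
    placement = []
    for _ in range(replication_factor):
        # Choose the least-loaded rack; break ties by rack name for determinism.
        rack = min(racks, key=lambda r: (counts[r], r))
        counts[rack] += 1
        placement.append(rack)
    return placement


healthy = ["rack-a", "rack-b", "rack-c"]
after_failure = ["rack-a", "rack-b"]  # rack-c is down

print(place_replicas(3, healthy, relaxed=False))        # ['rack-a', 'rack-b', 'rack-c']
print(place_replicas(3, after_failure, relaxed=False))  # None
print(place_replicas(3, after_failure, relaxed=True))   # ['rack-a', 'rack-b', 'rack-a']
```

With three racks both modes give the same answer; only after the failure does the relaxed mode restore a third replica by doubling up on one rack.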
KConnect
KConnect is an amazing component in the Kafka stack that allows for simple ingress and egress of data from a Kafka cluster. Prior to 7.2.14, this component was only part of our on-premises releases and was not available in Public Cloud Streams Messaging deployments. Now, users of Cloudera Streams Messaging can access it in the public cloud as a Technical Preview! Along with the core component come additional features and enhancements, from enterprise-grade security improvements to new out-of-the-box connectors.
Two of these new connectors, the Stateless NiFi Source and Sink connectors, allow stateless NiFi flows to be deployed directly in KConnect, which provides powerful and flexible capabilities.
Newly created 7.2.14 deployments can resize their cluster to deploy KConnect workers.
KConnect security has been enhanced to meet common enterprise needs. All REST APIs now implement authentication and authorization controls. Permissions for common operations such as deploying, viewing, and modifying connectors can be set both at the cluster level and for individual connector deployments. Below is an example of a policy in Apache Ranger that allows a user to view all deployed and running connectors but not modify them.
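The two-level permission model (cluster-wide versus per-connector) can be sketched as a default-deny lookup. The `Policy` type, the action names and the `is_allowed` function are hypothetical illustrations, not Ranger's data model or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: who may perform which actions, and where.

    connector=None means the policy applies to every connector in the cluster.
    """
    user: str
    actions: FrozenSet[str]
    connector: Optional[str] = None


def is_allowed(policies, user, action, connector):
    """Default-deny check: allow only if some policy grants this action."""
    for policy in policies:
        if policy.user != user or action not in policy.actions:
            continue
        # A cluster-wide policy matches any connector; otherwise names must match.
        if policy.connector is None or policy.connector == connector:
            return True
    return False


policies = [
    Policy("analyst", frozenset({"view"})),                              # view everything
    Policy("ops", frozenset({"view", "modify"}), connector="s3-sink"),  # one connector only
]

print(is_allowed(policies, "analyst", "view", "s3-sink"))    # True
print(is_allowed(policies, "analyst", "modify", "s3-sink"))  # False
```

Denying by default means a missing policy never silently grants access, which matches the view-but-not-modify scenario described above.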
New KConnect Connectors
Additional connectors and NiFi Stateless support have also been added. The following connectors are now available as tiles in Streams Messaging Manager, adding to the already available S3 and HDFS sink connectors. More connectors will be added in future releases.
- Stateless NiFi Source
- Stateless NiFi Sink
- Syslog TCP & UDP
- File Stream
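Besides the Streams Messaging Manager tiles, connectors can be created through the KConnect REST API, which in upstream Kafka Connect accepts a `POST /connectors` request whose JSON body contains `name` and `config`. The sketch below builds (but does not send) such a request for the upstream `FileStreamSourceConnector`. The worker address, file path, topic and connector name are hypothetical, and in a Cloudera deployment the authentication described earlier would also need to be supplied, which this sketch omits.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build (but do not send) a Kafka Connect REST call that creates a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


file_source_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",   # hypothetical input file
    "topic": "file-lines",      # hypothetical target topic
}

request = build_create_connector_request(
    "http://connect-worker.example.com:8083",  # hypothetical worker address
    "file-source-demo",
    file_source_config,
)
print(request.get_method(), request.full_url)
# POST http://connect-worker.example.com:8083/connectors
# To submit it against a real worker: urllib.request.urlopen(request)
```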
NiFi Stateless with KConnect
The Stateless NiFi Source and Sink connectors allow you to run data flows designed in NiFi on the KConnect cluster, so you can leverage KConnect for scalability and high availability. Because a connector can be built in NiFi, its large library of processors can be used to implement ingress and egress use cases without writing code. This is useful wherever an out-of-the-box connector cannot meet the functional requirements. For example, a flow that filters messages on a keyword, merges many messages into a sequence file, and then puts that sequence file onto S3 can be built quickly in NiFi and then configured to run in your existing KConnect infrastructure. Stay tuned for a blog focused on NiFi Stateless and the powerful capabilities it brings to KConnect.
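For readers who think in code, the filter-and-batch part of the example flow reduces to the logic below. This is plain Python, not NiFi: in NiFi the same steps would be assembled from processors (for instance RouteOnContent, MergeContent, CreateHadoopSequenceFile and PutS3Object), and the sequence-file and S3 steps are omitted here. The `filter_and_batch` function and the fixed batch size are assumptions for illustration.

```python
def filter_and_batch(messages, keyword, batch_size):
    """Keep messages that contain keyword and group them into batches.

    Each returned batch stands in for one merged file (e.g. a sequence file)
    that a real flow would then upload to S3; those steps are omitted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches, current = [], []
    for message in messages:
        if keyword not in message:
            continue  # filtering step: drop non-matching messages
        current.append(message)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # flush the last, partially filled batch
    return batches


incoming = ["error: disk full", "ok", "error: timeout", "info: started", "error: retry"]
print(filter_and_batch(incoming, "error", batch_size=2))
# [['error: disk full', 'error: timeout'], ['error: retry']]
```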
Schema Registry JSON Schema Support
JSON schemas are now supported in the Schema Registry. This allows users to define schemas for workloads that use JSON messages rather than Avro. As a data format, JSON has grown massively over the last decade: question rates on Stack Overflow show JSON overtaking XML, SOAP and CSV around 2013, making it one of the most popular formats for developers. Today many new applications start with JSON, and the other formats, like XML, SOAP and CSV, are mostly used by legacy solutions. By default, JSON schemas added to the Schema Registry use the JSON Schema Draft-07 specification, but this can be overridden by setting the $schema field to a different draft version, so that older schemas, or newer schemas not compatible with Draft-07, can also be created.
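The default-and-override rule for drafts amounts to a small lookup on the `$schema` field. The `resolve_draft` helper and its draft table are hypothetical illustrations of the behavior described above; which drafts the Schema Registry actually accepts should be checked in its documentation.

```python
import json

DEFAULT_DRAFT = "draft-07"

# Illustrative table of well-known $schema URIs (trailing '#' removed).
KNOWN_DRAFTS = {
    "http://json-schema.org/draft-04/schema": "draft-04",
    "http://json-schema.org/draft-06/schema": "draft-06",
    "http://json-schema.org/draft-07/schema": "draft-07",
    "https://json-schema.org/draft/2019-09/schema": "2019-09",
    "https://json-schema.org/draft/2020-12/schema": "2020-12",
}


def resolve_draft(schema_text):
    """Return the draft a schema declares via $schema, or the Draft-07 default."""
    schema = json.loads(schema_text)
    if not isinstance(schema, dict):
        raise ValueError("this sketch only accepts schemas that are JSON objects")
    uri = schema.get("$schema")
    if uri is None:
        return DEFAULT_DRAFT  # no override: fall back to Draft-07
    draft = KNOWN_DRAFTS.get(uri.rstrip("#"))
    if draft is None:
        raise ValueError(f"unsupported $schema URI: {uri}")
    return draft


print(resolve_draft('{"type": "object"}'))  # draft-07
print(resolve_draft('{"$schema": "http://json-schema.org/draft-04/schema#", "type": "string"}'))  # draft-04
```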
Schema Registry Import & Export
Schemas from the Schema Registry can now be exported as a JSON file, which can then be imported into another Schema Registry via the native REST API. Previously, replicating schemas between Schema Registry deployments meant exporting and importing the Schema Registry database, or setting up replication at the database or infrastructure level. This prevented sharing schemas between deployments that used different backend databases. With the native API, schemas can be exported, imported and merged across deployments regardless of their backend database or infrastructure. Because every schema is assigned a specific schema ID, being able to define the ID range used by each Schema Registry deployment is important to avoid ID collisions when entries from one registry are imported into another. By configuring a different ID range for each deployment, every registry can author schemas, rather than only a single registry acting as the primary.
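The toy model below shows why disjoint ID ranges matter when exports are merged. `Registry`, its JSON export layout and its range settings are hypothetical; they do not reflect the Schema Registry's actual export format or configuration property names.

```python
import json
from dataclasses import dataclass


@dataclass
class Registry:
    """Hypothetical registry that allocates schema IDs from its own range."""
    name: str
    id_start: int
    id_end: int  # inclusive upper bound of this registry's ID range

    def __post_init__(self):
        self.schemas = {}  # schema ID -> schema text
        self._next_id = self.id_start

    def register(self, schema_text):
        if self._next_id > self.id_end:
            raise RuntimeError(f"{self.name}: ID range exhausted")
        schema_id = self._next_id
        self._next_id += 1
        self.schemas[schema_id] = schema_text
        return schema_id

    def export_json(self):
        # JSON object keys must be strings, so IDs are stringified here.
        return json.dumps({str(k): v for k, v in self.schemas.items()})

    def import_json(self, payload):
        incoming = {int(k): v for k, v in json.loads(payload).items()}
        # Validate first so a collision leaves this registry untouched.
        for schema_id, text in incoming.items():
            existing = self.schemas.get(schema_id)
            if existing is not None and existing != text:
                raise ValueError(f"{self.name}: schema ID {schema_id} collides")
        self.schemas.update(incoming)


east = Registry("east", 1, 999)
west = Registry("west", 1000, 1999)
east.register('{"type": "string"}')  # gets ID 1
west.register('{"type": "int"}')     # gets ID 1000
west.import_json(east.export_json())
print(sorted(west.schemas))  # [1, 1000]
```

If both registries had started at ID 1, the import would hit two different schemas sharing ID 1 and fail, which is the situation per-deployment ranges are meant to prevent.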
In this blog, we looked at some of the new features in CDP Public Cloud 7.2.14. These include the upgrade of Kafka to 2.8, which improves client quota usability and adds monitoring improvements and connection limiting options. Cruise Control has been upgraded to 2.5, which provides a number of fixes and a relaxed rack awareness goal. KConnect's inclusion as a Technical Preview in Cloudera Public Cloud comes with new out-of-the-box connectors, support for running Stateless NiFi flows as connectors, and Ranger security policy management. Finally, the Schema Registry has been enhanced with JSON schema support, allowing applications that don't use Avro to benefit from centralized schema management, and with native support for importing and exporting schemas to copy them across registry deployments.
Give Cloudera Streams Messaging 7.2.14 for Data Hub a try today and check out all the great new features!