We are pleased to announce the general availability of Cloudera Enterprise 6.1.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers several new capabilities, improved usability, and better performance.
As usual, the release includes a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
Cloudera Enterprise 6.1 now supports Spark Structured Streaming, which enables micro-batch processing at ~100ms increments and brings ingest-to-query latencies in the Cloudera platform down to seconds.
Structured Streaming simplifies traditional Spark Streaming by bringing the same SparkSession API (the DataFrames and Datasets API) to streaming data, so the same code can be used across batch and stream processing with a common SQL-like API.
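The "same code for batch and stream" idea can be illustrated with a small plain-Python analogy. This is not Spark code and does not use the Structured Streaming API; the names `count_by_key` and `micro_batches` and the sample events are hypothetical. It only sketches why one piece of aggregation logic, applied incrementally to micro-batches, can arrive at the same answer as a single batch run.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional


def count_by_key(events: Iterable[str], running: Optional[Counter] = None) -> Counter:
    """Shared logic: fold a group of events into per-key counts."""
    totals = Counter() if running is None else running
    totals.update(events)
    return totals


def micro_batches(events: List[str], size: int) -> Iterator[List[str]]:
    """Split the input into small consecutive chunks (hypothetical micro-batches)."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


events = ["click", "view", "click", "purchase", "view", "click"]

# Batch: process everything in one pass.
batch_result = count_by_key(events)

# Stream: apply the same function to each micro-batch, carrying state forward.
stream_result = Counter()
for chunk in micro_batches(events, size=2):
    stream_result = count_by_key(chunk, stream_result)

print(batch_result == stream_result)  # True
```

In Spark itself the engine manages the running state and the micro-batch boundaries; the sketch above makes both explicit only to show the principle.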
Users streaming from Spark into Kudu and querying the data with Impala can now analyze data in real time, with end-to-end pipeline latency measured in seconds. Users who want to make post-processed data available to applications and also land it in Kudu can push the data onto the Kafka bus and use the Flume sink to write it into Kudu.
When using Spark Structured Streaming with HBase, you can make data available to your business applications with single-second latency, giving your applications an unprecedented ability to respond to real-time data.
Users that don’t require data to be processed before ingestion can now stream data directly into Kudu from Kafka + Flume without writing a single line of code.
HDFS erasure coding is now supported. Starting with Cloudera Enterprise 6.1, Hive, MapReduce, Spark, BDR, and Navigator can interact with erasure-coded data. HDFS erasure coding uses up to 50% less disk space than 3x replication without compromising resiliency.
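The "up to 50%" figure can be checked with some quick arithmetic. The sketch below assumes a Reed-Solomon layout with 6 data units and 3 parity units, which matches the RS-6-3 policy that ships with Hadoop 3. The announcement does not say which policy is used, so treat the layout and the 100 TB dataset size as illustrative assumptions.

```python
def raw_storage_replicated(logical_tb: float, replicas: int = 3) -> float:
    """Raw disk needed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_storage_erasure_coded(logical_tb: float, data_units: int = 6,
                              parity_units: int = 3) -> float:
    """Raw disk needed when each group of data units carries extra parity units."""
    return logical_tb * (data_units + parity_units) / data_units


logical_tb = 100  # hypothetical dataset size in TB

replicated = raw_storage_replicated(logical_tb)        # 3x overhead
erasure_coded = raw_storage_erasure_coded(logical_tb)  # (6 + 3) / 6 = 1.5x overhead
savings = 1 - erasure_coded / replicated

print(f"3x replication: {replicated:.0f} TB raw")
print(f"RS(6,3) erasure coding: {erasure_coded:.0f} TB raw")
print(f"Disk saved: {savings:.0%}")
```

Under these assumptions the same 100 TB of data needs 150 TB of raw disk instead of 300 TB. An RS(6,3) group can also lose any three of its nine units and still be reconstructed, whereas 3x replication survives the loss of two copies, which is why the saving does not have to come at the expense of resiliency.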
For customers that have deployed Cloudera Enterprise on Microsoft Azure, we are announcing the public preview of Azure Data Lake Storage Gen 2, enabling better storage performance on Azure at a lower cost. Cloudera Enterprise 6.1 will support running Hive, MapReduce, Impala, and Spark workloads on ADLS Gen 2 for development and testing purposes.
To date, Kafka has required RAID-compliant storage. It now supports JBOD storage, enabling customers to use cheaper disks and reduce their storage costs by approximately 50%.
We’ve also rebased Apache Accumulo to version 1.9.2, which results in numerous improvements, including better scanning performance and rate limits on compaction.
Data Warehouse & SQL
Cloudera continues to expand SQL support with Impala. Cloudera Enterprise 6.1 adds exact support for multiple COUNT(DISTINCT <expr>) expressions within a single query, allowing more complex data warehouse queries to be run. This reduces complexity for post-query data processing and analytics.
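As an illustration of the query shape this enables, the sketch below runs a single SELECT with several COUNT(DISTINCT ...) expressions. It uses Python's built-in SQLite as a stand-in so the example is self-contained; it does not connect to Impala, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, product_id INTEGER, region TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "east"),
        (1, 11, "east"),
        (2, 10, "west"),
        (3, 12, "west"),
        (3, 12, "east"),
    ],
)

# One query, several exact distinct counts over different columns.
row = conn.execute(
    """
    SELECT COUNT(DISTINCT customer_id),
           COUNT(DISTINCT product_id),
           COUNT(DISTINCT region)
    FROM orders
    """
).fetchone()
conn.close()

print(row)  # (3, 3, 2)
```

Getting all three counts from one statement avoids running a separate query per column and stitching the results together afterwards.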
Our easy-to-use SQL workbench, Hue, continues down the path of simplifying the self-serve, interactive SQL experience. This release focuses on making it easier for users to get to the right data faster by adding valuable popularity hints (available if you have Navigator enabled) and a more task-optimized, self-guided data browser experience. In addition, we’ve added a new way to visualize query queuing, which helps users identify when clusters are busy and avoids the frustration of repeatedly re-sending queries to already busy query services.
Platform Support & Security
Cloudera now supports deploying with OpenJDK 8 in addition to Oracle’s JDK. With this release, we also support AWS CloudHSM for HDFS encryption-at-rest.
As customers increasingly implement security, we are changing defaults to be secure in order to reduce setup complexity and configuration mistakes. As part of this release, several defaults in Kafka, Impala, Sqoop, and Flume have been changed to be more secure, and we have added BDR replication from insecure to secure (Kerberized) clusters to ease the transition to secure clusters.
Cloudera continues to enhance our support for clouds. With this release, Cloudera Altus Director now has enhanced proxy support for secure environments, enhanced scripting capabilities including pre-termination scripts, and Google Cloud sub-networks support.
What’s in this release:
- Cloudera Enterprise 6.1.0 comprising
- CDH 6.1.0
- Cloudera Manager 6.1.0
- Cloudera Navigator 6.1.0
- Cloudera Altus Director 6.1.0
- Cloudera Navigator Encryption 6.1.0
- Cloudera Navigator Key Trustee 6.1.0
- Cloudera Navigator Key Trustee KMS 6.1.0
- Cloudera Navigator Encrypt 6.1.0
- Cloudera Navigator Optimizer Updates
- Apache Accumulo 1.9.2
Please refer to the release notes for a complete list of what is new in C6.1.0. We also encourage you to review the revised Upgrade Guide, which can now generate customized documentation based on your unique upgrade path.
Krishna Maheshwari is a Director of Product Management at Cloudera.