Cloudera Enterprise 5.5 (comprising CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4) has been released.
We're excited to announce Cloudera Enterprise 5.5. Our ongoing emphasis on quality is especially pronounced in this release: more than 500 issues were identified and triaged during its development.
A highlight of this release is Cloudera Navigator Optimizer (available in limited beta to select Cloudera Enterprise customers; learn more here), a new web-based service built on Xplain.io technology. It provides ETL consolidation, BI optimization, query de-duplication, and query redesign to make analytic workloads on Apache Hadoop more time- and cost-efficient. During its alpha phase, Cloudera Navigator Optimizer already helped customers optimize more than 1.5 million queries, saving them millions of dollars in the process.
Here are some of the other highlights (see the Release Notes for full lists of features and fixes):
Security
- Column-level security is now provided in Impala and Apache Hive (via Apache Sentry [incubating]).
- In Cloudera Manager-managed clusters, passwords previously stored in cleartext in configuration files are now encrypted.
- Cloudera Manager includes a new wizard for setting up HDFS encryption, KMS, and Navigator Key Trustee.
- In Cloudera Navigator Encrypt, dmcrypt+loopfile replaces eCryptfs, which is now deprecated, for file encryption.
- LDAP/Active Directory authentication is now supported for Apache Solr clients.
- Apache HBase replication is now encrypted.
Performance, Scale, and Operations
- HDFS includes many scalability enhancements, including data-block “flow control” to help optimize DataNode configuration.
- Navigator Encrypt now supports auto-failover to a secondary Key Trustee Server.
- Cloudera Manager now offers a new aggregate UI that provides a single, read-only health dashboard across Cloudera Manager instances.
- HUE HA with a load balancer can now be set up via Cloudera Manager.
- Performance is significantly improved when replicating millions of files, partitions, and petabytes of data with Cloudera Manager Backup and Disaster Recovery.
- Updating patch parcels now triggers a selective restart of services.
- Failed CDH upgrades can now be retried.
- Apache Kafka now supports rolling restarts.
Data Management and Governance
- Expanded coverage in Cloudera Navigator:
  - Extended Apache Hive lineage attributes
  - Hive-on-Spark lineage
  - HUE audits
  - Extended Cloudera Manager audits
- Platform enhancements:
  - The new Cloudera Navigator SDK opens up lineage and metadata capabilities for the entire ecosystem.
  - An improved self-service data discovery dashboard provides visibility into metadata and schema, with full drill-down into entities.
  - Navigator can now publish audit events to Apache Kafka.
- Data stewardship capabilities:
  - Automated policy workflows for retention and archiving
SQL Support & Usability
- Impala now supports querying nested data on Apache Parquet (with support for other file formats like Apache Avro on the roadmap). Learn more here.
- Impala's robustness and memory efficiency have been improved.
- Spark SQL and the DataFrames API are now supported.
- The majority of Spark MLlib is now supported.
New or Updated Open Source Components
- Apache Spark 1.5 (including Spark SQL, DataFrames API, and MLlib per above)
- Apache Flume 1.6
- Apache Sqoop 1.4.6
- Apache Sentry 1.5.1
- HUE 3.9
- Impala 2.3
New or Updated Platform Support
- RHEL 7
- Amazon S3 storage for Apache Spark and Apache Hive
Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime: