Category Archives: Cloudera Manager

Cloudera Enterprise 5.10 is Now Available

Categories: CDH Cloud Cloudera Manager Cloudera Navigator Hadoop Hue Kudu

Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost-optimizations, and cloud-native data governance for Amazon S3.

As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):

  • GA of Apache Kudu

Read more

How-to: Automate Your sparklyr Environment with Cloudera Director

Categories: Cloudera Manager Data Science Hadoop How-to Ops and DevOps Spark

Since the launch of sparklyr, working with Apache Spark in Apache Hadoop has become much easier for R users. sparklyr contains a dplyr interface into Spark and allows users to leverage crucial machine learning algorithms from Spark MLlib and H2O Sparkling Water. This greatly reduces the barrier of entry for R users in adopting Spark as a tool for big data and should go a long way in enabling R workloads to migrate to Hadoop.

Read more

Resource Management for Apache Impala (incubating)

Categories: CDH Cloudera Manager Hadoop Impala Ops and DevOps Use Case

Apache Impala (incubating) includes several features that allow you to restrict or allocate resources so as to maximize stability and performance for your Impala workloads. You can limit both CPU and memory resources used by Impala to manage and prioritize jobs on CDH clusters. This blog post describes the techniques a typical Impala deployment can use to manage its resources.

Static Service Pools

Static service pools isolate services from one another, so that a high load on one service has limited impact on other services.

Read more

What’s New in Cloudera Director 2.2?

Categories: CDH Cloud Cloudera Manager Hadoop

This new release adds support for Amazon EBS volumes and the ability to diagnose cluster bootstrap errors quickly.

Cloudera Director provides a simple, reliable, enterprise-grade way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud.

Cloudera Director makes it easier for customers to:

  • Deploy clusters in line with patterns native to cloud infrastructure
  • Use an interface to define in one place the desired cluster specification all the way down to the operating system
  • Repeatedly and programmatically instantiate these cluster definitions
  • Adapt to the dynamic nature of cloud infrastructure

Cloudera Director 2.2 provides additional mechanisms to get that initial cluster definition right and the ability to diagnose errors and iterate quickly.

Read more

Considerations for Production Environments Running Cloudera Backup and Disaster Recovery for Apache Hive and HDFS

Categories: Cloudera Manager Ops and DevOps

Learn how replication functionality for Apache Hive metadata and consistency benefits from automated HDFS snapshots benefit production environments.

A robust backup solution that is both correct and efficient is necessary for all production data management systems. Backup and Disaster Recovery (BDR) is a Cloudera Manager feature in Cloudera Enterprise that allows for consistent, efficient replication and version management of data in CDH clusters. Cloudera Enterprise BDR can be used for creating efficient incremental backups of HDFS and Hive data from multiple clusters,

Read more