Category Archives: Ops and DevOps

Considerations for Production Environments Running Cloudera Backup and Disaster Recovery for Apache Hive and HDFS

Categories: Cloudera Manager Ops and DevOps

Learn how replication of Apache Hive metadata and the consistency provided by automated HDFS snapshots benefit production environments.

A robust backup solution that is both correct and efficient is necessary for all production data management systems. Backup and Disaster Recovery (BDR) is a Cloudera Manager feature in Cloudera Enterprise that allows for consistent, efficient replication and version management of data in CDH clusters. Cloudera Enterprise BDR can be used for creating efficient incremental backups of HDFS and Hive data from multiple clusters,

Read More

Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud Impala Ops and DevOps Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to best take advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) includes a major step forward in that area with Impala 2.6 able to store and query data directly from the Amazon S3 object store.

Read More

New in Cloudera Manager 5.7: Cluster Templates

Categories: Cloudera Manager Ops and DevOps

The new cluster templates feature in Cloudera Manager 5.7 makes creating clusters faster and easier.

Often, after an Apache Hadoop cluster has been configured correctly, its admin will want to replicate the configuration in one or more other clusters, whether to promote a dev or staging cluster to production or to set up a new production cluster with the same configuration as an existing one.

For Cloudera customers, until recently the process for replicating cluster configurations was manual and error-prone.

Read More

How-to: Deploy a Secure Enterprise Data Hub on AWS

Categories: CDH Cloud How-to Ops and DevOps

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).

Read More

New in Cloudera Manager 5.7: Cluster Utilization Reporting

Categories: Cloudera Manager Impala Ops and DevOps Performance YARN

Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.

Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly so that no tenant can run jobs that starve others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage,

Read More