Category Archives: Ops and DevOps

Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud, Impala, Ops and DevOps, Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to take full advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) includes a major step forward in that area: Impala 2.6 can store data in, and query data directly from, the Amazon S3 object store.
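The cost claim in the summary can be illustrated with simple arithmetic. The sketch below assumes perfectly linear scaling (twice the nodes finish the same workload in half the time) and uses made-up node counts, prices, and durations; none of these figures come from the post.

```python
def workload_cost(nodes: int, price_per_node_hour: float, hours: float) -> float:
    """Total cost of running a workload: nodes x hourly price x wall-clock hours."""
    return nodes * price_per_node_hour * hours


# Hypothetical figures, chosen only for illustration.
PRICE_PER_NODE_HOUR = 1.0   # assumed hourly price per node
BASE_NODES = 10             # assumed original cluster size
BASE_HOURS = 4.0            # assumed time to finish the workload

base_cost = workload_cost(BASE_NODES, PRICE_PER_NODE_HOUR, BASE_HOURS)

# Assumption: doubling the cluster doubles throughput, so the same
# workload completes in half the wall-clock time.
doubled_cost = workload_cost(BASE_NODES * 2, PRICE_PER_NODE_HOUR, BASE_HOURS / 2)

print(base_cost, doubled_cost)
```

Under these assumptions the two totals match, which is the sense in which a larger cluster can finish sooner without raising the overall bill. Real scaling is rarely perfectly linear, so the result is approximate in practice.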


New in Cloudera Manager 5.7: Cluster Templates

Categories: Cloudera Manager, Ops and DevOps

The new cluster templates feature in Cloudera Manager 5.7 makes creating clusters faster and easier.

Often, after an Apache Hadoop cluster has been configured correctly, its admin will want to replicate the configuration in one or more clusters—whether for promoting a dev or staging cluster to production, or setting up a new production cluster with the same configuration as an existing one.

For Cloudera customers, until recently the process for replicating cluster configurations was manual and error-prone.


How-to: Deploy a Secure Enterprise Data Hub on AWS

Categories: CDH, Cloud, How-to, Ops and DevOps

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud. 

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).


New in Cloudera Manager 5.7: Cluster Utilization Reporting

Categories: Cloudera Manager, Impala, Ops and DevOps, Performance, YARN

Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.

Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly so that one tenant cannot run jobs that starve others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage,


How-to: Integrate Cloudera Director with a Data Pipeline in the Cloud

Categories: Cloud, Ops and DevOps

Learn how to use Cloudera Director to automate cluster operations (and more) in the cloud.

Cloudera Director was designed from the beginning to be primarily an API that can integrate with your existing data pipelines and workflows to handle tasks like creating, terminating, and resizing the Apache Hadoop (CDH) clusters used to run your data processing jobs or SQL queries.

Among many other new features,
