Category Archives: How-to

How-to: Deploy a Secure Enterprise Data Hub on AWS (Part 2)

Categories: Cloud How-to Security

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.

In Part 1 of this series, you learned about configuring Microsoft Active Directory and Centrify Express for optimal security across your Cloudera-powered EDH, whether for on-premise or public-cloud deployments. In this concluding installment, you’ll learn the cloud-specific pieces in this process, including some AWS fundamentals and in-depth details about cluster provisioning using Cloudera Director.

Read More

How-to: Analyze Fantasy Sports using Apache Spark and SQL

Categories: Hive How-to Impala Spark Use Case

As part of the drumbeat for Spark Summit West in San Francisco (June 6-8),  learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,

Read More

How-to: Configure SAP HANA with Apache Impala (incubating)

Categories: How-to Impala

Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.

Information is growing at an exponential rate driven by enterprise applications and databases, and often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database,

Read More

How-to: Deploy a Secure Enterprise Data Hub on AWS

Categories: CDH Cloud How-to Ops and DevOps

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud. 

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).

Read More

How-to: Use Impala and Kudu Together for Analytic Workloads

Categories: Data Science Hadoop How-to Impala Kudu Performance

Using Apache Impala (incubating) on top of Apache Kudu (incubating) has significant performance benefits

Apache Kudu (incubating) is the newest addition to the set of storage engines that integrate with the Apache Hadoop ecosystem. The promise of Kudu is to deliver high-scan performance, targeting analytical workloads, while allowing users to concurrently insert, update, and delete records. With these properties, Kudu becomes a viable alternative to existing combinations of HDFS and/or Apache HBase to achieve similar results with less complicated ETL pipelines,

Read More