Category Archives: How-to

How-to: Use the New Apache Oozie Database Migration Tool

Categories: How-to Oozie

Updated 11/22/16 – Important: All features below are working on CDH 5.9.0 and CM 5.9.0 and above. 

This tool simplifies migrating Oozie off Apache Derby (or any other supported database) and also streamlines upgrades.

The Apache Oozie server is a stateless web application by design, with all information about running and completed workflows, coordinator jobs, and bundle jobs stored in a relational database.


How-to: Do Scalable Graph Analytics with Apache Spark

Categories: Data Science Graph Processing How-to Spark

Get started with scalable graph analysis via simple examples that use GraphFrames and Spark SQL on HDFS.

Graphs—also known as “networks”—are ubiquitous across web applications. As a refresher, a graph consists of nodes and edges. A node can be any object, such as a person or an airport, and an edge is a relation between two nodes, such as a friendship or an airline connection between two cities.
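To make the node-and-edge definition concrete, here is a minimal plain-Python sketch. It is not the GraphFrames API the post goes on to use. The airport codes, connections, and helper names (`build_adjacency`, `degrees`) are hypothetical, chosen only for illustration.

```python
# Hypothetical example data: airports are nodes, airline connections are edges.
nodes = ["SFO", "JFK", "ORD", "SEA"]
edges = [("SFO", "JFK"), ("SFO", "ORD"), ("ORD", "JFK"), ("SEA", "SFO")]


def build_adjacency(nodes, edges):
    """Return an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = {node: set() for node in nodes}
    for src, dst in edges:
        # Each connection is treated as bidirectional (an assumption).
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency


def degrees(adjacency):
    """Return the number of distinct neighbors for every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = build_adjacency(nodes, edges)
degree = degrees(adjacency)
```

Here `degree` maps each airport to its number of direct connections, which is the kind of per-node statistic a graph library computes at scale.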


How-to: Deploy a Secure Enterprise Data Hub on AWS (Part 2)

Categories: Cloud How-to Platform Security & Cybersecurity

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.

In Part 1 of this series, you learned about configuring Microsoft Active Directory and Centrify Express for optimal security across your Cloudera-powered EDH, whether for on-premises or public-cloud deployments. In this concluding installment, you’ll learn the cloud-specific pieces in this process, including some AWS fundamentals and in-depth details about cluster provisioning using Cloudera Director.


How-to: Analyze Fantasy Sports using Apache Spark and SQL

Categories: Hive How-to Impala Spark Use Case

As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,


How-to: Configure SAP HANA with Apache Impala (incubating)

Categories: How-to Impala

Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.

Information is growing at an exponential rate driven by enterprise applications and databases, and often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database,
