Updated 11/22/16 – Important: All features below are working on CDH 5.9.0 and CM 5.9.0 and above.
This tool makes Oozie migrations off Apache Derby (or any other supported database) easy, in addition to streamlining upgrades.
The Apache Oozie server is a stateless web application by design, with all information about running and completed workflows, coordinator jobs, and bundle jobs stored in a relational database.
Get started with scalable graph analysis via simple examples that utilize GraphFrames and Spark SQL on HDFS.
Graphs—also known as “networks”—are ubiquitous across web applications. As a refresher, a graph consists of nodes and edges. A node can be any object, such as a person or an airport, and an edge is a relation between two nodes, such as a friendship or an airline connection between two cities.
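To make the node and edge vocabulary concrete, here is a minimal sketch in plain Python. It does not use GraphFrames or Spark; the airport codes and the `build_adjacency` helper are illustrative assumptions, not taken from the original post. Airports are the nodes, and each airline connection is an undirected edge.

```python
# Hypothetical sample data: airports as nodes, airline connections as edges.
nodes = {"SFO", "JFK", "ORD"}
edges = [("SFO", "JFK"), ("JFK", "ORD")]


def build_adjacency(nodes, edges):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        # setdefault also covers endpoints that were not listed in `nodes`.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


adjacency = build_adjacency(nodes, edges)

# The degree of a node is the number of edges that touch it.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
```

In this sketch, `adjacency["JFK"]` holds both `SFO` and `ORD`, so `JFK` has degree 2. An adjacency map like this is only one of several ways to represent a graph.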
Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.
In Part 1 of this series, you learned about configuring Microsoft Active Directory and Centrify Express for optimal security across your Cloudera-powered EDH, whether for on-premises or public-cloud deployments. In this concluding installment, you’ll learn the cloud-specific pieces in this process, including some AWS fundamentals and in-depth details about cluster provisioning using Cloudera Director.
As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.
In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,
Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.
Information is growing at an exponential rate, driven by enterprise applications and databases, and often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database,