Category Archives: General

Up and running with Apache Spark on Apache Kudu

Categories: CDH, Data Ingestion, Data Science, General, Hadoop, How-to, Impala, Kudu, Spark, Training, Use Case

With Apache Kudu now generally available (GA) in Cloudera CDH 5.10, we take a look at the Apache Spark on Kudu integration, share code snippets, and explain how to get up and running quickly. Kudu is already a first-class citizen in Spark’s ecosystem.


As the Apache Kudu development team celebrates the initial 1.0 release launched on September 19, and the most recent 1.2.0 version now GA as part of Cloudera’s CDH 5.10 release,

Read More

Achieving a 300% speedup in ETL with Apache Spark

Categories: Data Ingestion, General, Hadoop, HDFS, Spark

A common design pattern emerges when teams begin to stitch together existing systems and an EDH cluster: file dumps, typically in a format like CSV, are regularly uploaded to EDH, where they are unpacked, transformed into an optimal query format, and tucked away in HDFS where various EDH components can use them. When these file dumps are large or arrive very often, these simple steps can significantly slow down an ingest pipeline. Part of this delay is inevitable;

Read More

Cloudera’s Process for Handling Security Vulnerabilities

Categories: General, Platform Security & Cybersecurity

Cloudera considers the handling and reporting of security vulnerabilities a very serious matter. In this post, learn the processes involved.

In addition to expecting enterprise-class standards for stability and reliability, Cloudera’s customers also expect industry-standard processes for discovering, fixing, and reporting security issues. In this post, I will describe how Cloudera addresses such issues in our software.

An overview of the process looks like this flowchart:

[Figure: flowchart of the security vulnerability handling process (secalert-f1)]

The life cycle of a security vulnerability begins when it is discovered and reported to Cloudera.

Read More

Check Out Those New and Improved Cloudera Docs

Categories: CDH, Cloudera Manager, General

Cloudera has given its documentation set a facelift, and we think you’ll like the new look. We use more whitespace and a font that is easier to read and skim, and your pages load much faster. But the improvements go beyond the merely aesthetic.

While electronic documentation has been around for decades, most online documentation is still presented as if it were printed in books. There is a table of contents that assumes you will read the content from start to finish.

Read More