Demystifying Spark Jobs to Optimize for Cost and Performance

Categories: Performance Spark

Apache Spark is one of the most popular engines for distributed data processing on Big Data clusters. Spark jobs come in all shapes, sizes and cluster form factors: tens to thousands of nodes and executors, job durations from seconds to hours or even days, megabytes to petabytes of data, and simple data scans to complicated analytical workloads. Throw in a growing number of streaming workloads to the huge body of batch and machine learning jobs …

A Guide to Learning with Limited Labeled Data

Categories: AI and Machine Learning Fast Forward Labs

This was originally published on the Fast Forward Labs blog.

We are excited to release Learning with Limited Labeled Data, the latest report and prototype from Cloudera Fast Forward Labs.

Being able to learn with limited labeled data relaxes the stringent labeled data requirement for supervised machine learning. Our report focuses on active learning, a technique that relies on collaboration between machines and humans to label smartly.

What’s New in Cloudera Altus Director 6.2?

Categories: CDH Cloud Cloudera Director

Cloudera Altus Director helps you deploy, scale, and manage Cloudera clusters on AWS, Microsoft Azure, or Google Cloud Platform. Altus Director both enables and enforces the best practices of big data deployments and cloud infrastructure. Altus Director’s enterprise-grade features deliver a mechanism for establishing production-ready clusters in the cloud for big data workloads and applications in a simple, reliable, automated fashion. In this post, you will learn about new functionality and changes in release 6.2.

What’s new in the Hue Data Warehouse Editor in Cloudera 6.2

Categories: Analytic Database Hue

Self-service exploratory analytics is one of the most common use cases we see among customers running Cloudera’s Data Warehouse solution.

With the recent release of Cloudera 6.2, we continue to improve the end-user query experience with Hue, focusing on easier SQL query troubleshooting and increased compatibility with Hive. Read on to learn more and try it out in one click at demo.gethue.com.

Easier Self-Service Query Troubleshooting

Hue has great assistance for finding tables in the Data Catalog and getting recommendations on how to write (better) queries with the smart autocomplete,

Testing Apache Kudu Applications on the JVM

Categories: Kudu Testing

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test.
