Apache Spark is one of the most popular engines for distributed data processing on Big Data clusters. Spark jobs come in all shapes, sizes and cluster form factors: from tens to thousands of nodes and executors, from seconds to hours or even days of job duration, from megabytes to petabytes of data, and from simple data scans to complicated analytical workloads. Throw in a growing number of streaming workloads to the huge body of batch and machine learning jobs …
This was originally published on the Fast Forward Labs blog.
We are excited to release Learning with Limited Labeled Data, the latest report and prototype from Cloudera Fast Forward Labs.
Being able to learn with limited labeled data relaxes the stringent labeled data requirement for supervised machine learning. Our report focuses on active learning, a technique that relies on collaboration between machines and humans to label smartly.
Cloudera Altus Director helps you deploy, scale, and manage Cloudera clusters on AWS, Microsoft Azure, or Google Cloud Platform. Altus Director both enables and enforces the best practices of big data deployments and cloud infrastructure. Altus Director’s enterprise-grade features deliver a mechanism for establishing production-ready clusters in the cloud for big data workloads and applications in a simple, reliable, automated fashion. In this post, you will learn about new functionality and changes in release 6.2.
Self-service exploratory analytics is one of the most common use cases we see among customers running Cloudera’s Data Warehouse solution.
With the recent release of Cloudera 6.2, we continue to improve the end-user query experience with Hue, focusing on easier SQL query troubleshooting and increased compatibility with Hive. Read on to learn more and try it out in one click at demo.gethue.com.
Easier Self-Service Query Troubleshooting
Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test.