Tag Archives: CDH

Sustained Innovation in Apache Spark: DataFrames, Spark SQL, and MLlib

Categories: CDH Spark

Cloudera has announced support for the Spark SQL/DataFrame API and MLlib. This post explains their benefits for app developers, data analysts, data engineers, and data scientists.

In July 2015, Cloudera reaffirmed the position it has held since 2013: that Apache Spark is on course to replace MapReduce as the default general-purpose data processing engine for Apache Hadoop. Thanks to initiatives like the One Platform Initiative,

Read more

New in Cloudera Enterprise 5.5: Support for Complex Types in Impala

Categories: Impala Parquet

The new support for complex types in Impala makes running analytic workloads considerably simpler.

Impala 2.3 (which ships starting with Cloudera Enterprise 5.5) supports querying complex types in Apache Parquet tables, specifically ARRAY, MAP, and STRUCT. This capability lets users query naturally nested data sets without first performing ETL to flatten them. It provides a few major benefits, including:

  • It removes the additional ETL and data modeling work otherwise needed to flatten data sets.

Read more

Cloudera Enterprise 5.5 is Now Generally Available

Categories: CDH Cloudera Manager

Cloudera Enterprise 5.5 (comprising CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4) has been released.

Cloudera is excited to bring you news of Cloudera Enterprise 5.5. Our persistent emphasis on quality is especially pronounced in this release, with more than 500 issues identified and triaged during its development.

A highlight of this release is the inclusion of Cloudera Navigator Optimizer (available in limited beta for select Cloudera Enterprise customers;

Read more