Tag Archives: apache hadoop

Sustained Innovation in Apache Spark: DataFrames, Spark SQL, and MLlib

Categories: CDH Spark

Cloudera has announced support for the Spark SQL/DataFrame API and MLlib. This post explains their benefits for app developers, data analysts, data engineers, and data scientists.

In July 2015, Cloudera reaffirmed the position it has held since 2013: that Apache Spark is on course to replace MapReduce as the default general-purpose data processing engine for Apache Hadoop. Thanks to initiatives like the One Platform Initiative,

Read more

Cloudera Enterprise 5.5 is Now Generally Available

Categories: CDH Cloudera Manager

Cloudera Enterprise 5.5 (comprising CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4) has been released.

Cloudera is excited to bring you news of Cloudera Enterprise 5.5. Our persistent emphasis on quality is especially pronounced in this release, with more than 500 issues identified and triaged during its development.

A highlight of this release is the inclusion of Cloudera Navigator Optimizer (available in limited beta for select Cloudera Enterprise customers;

Read more

Impala’s Next Step: Proposal to Join the Apache Software Foundation

Categories: Impala Kudu

The Impala project has already passed several important milestones on the way to its status as the leader and open standard for BI and SQL analytics on modern big data architecture. Today’s milestone is the submission of proposals for Impala and Kudu to join the Apache Software Foundation (ASF) Incubator.

[Update: Read the text of the Impala and Kudu proposals here and here, respectively.]

Since its initial release nearly five years ago,

Read more

How-to: Ingest and Query “Fast Data” with Impala (Without Kudu)

Categories: Hadoop How-to Impala Kudu

Impala is designed to deliver insight on data in Apache Hadoop in real time. Because data lands in Hadoop continuously in some use cases (such as time-series analysis, real-time fraud detection, and real-time risk detection), it’s desirable for Impala to query this new “fast” data with minimal delay and without interrupting running queries.

In this blog post, you will learn an approach for continuous loading of data into Impala via HDFS,

Read more