Tag Archives: release

Impala’s Next Step: Proposal to Join the Apache Software Foundation

Categories: Impala Kudu

The Impala project has already passed several important milestones on the way to its status as the leader and open standard for BI and SQL analytics on modern big data architecture. Today’s milestone is the submission of proposals for Impala and Kudu to join the Apache Software Foundation (ASF) Incubator.

[Update: Read the text of the Impala and Kudu proposals here and here, respectively.]

Since its initial release nearly five years ago,

Read more

How-to: Use HUE’s Notebook App with SQL and Apache Spark for Analytics

Categories: How-to Hue Spark

This post from the HUE team about using HUE (the open source web GUI for Apache Hadoop), Apache Spark, and SQL for analytics was initially published in the HUE project’s blog.

Apache Spark is getting popular and HUE contributors are working on making it accessible to even more users. Specifically, by creating a Web interface that allows anyone with a browser to type some Spark code and execute it.

Read more

How-to: Build a Machine-Learning App Using Sparkling Water and Apache Spark

Categories: CDH Data Science Guest How-to Spark

Thanks to Michal Malohlava, Amy Wang, and Avni Wadhwa of H20.ai for providing the following guest post about building ML apps using Sparkling Water and Apache Spark on CDH.

The Sparkling Water project is nearing its one-year anniversary, which means Michal Malohlava, our main contributor, has been very busy for the better part of this past year. The Sparkling Water project combines H2O machine-learning algorithms with the execution power of Apache Spark.

Read more

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

Categories: Hadoop HBase HDFS Impala Kudu Performance Spark

This new open source complement to HDFS and Apache HBase is designed to fill gaps in Hadoop’s storage layer that have given rise to stitched-together, hybrid architectures.

The set of data storage and processing technologies that define the Apache Hadoop ecosystem are expansive and ever-improving, covering a very diverse set of customer use cases used in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop—making it faster,

Read more

RecordService: For Fine-Grained Security Enforcement Across the Hadoop Ecosystem

Categories: Hadoop Impala Platform Security & Cybersecurity Sentry

This new core security layer provides a unified data access path for all Hadoop ecosystem components, while improving performance.

We’re thrilled to announce the beta availability of RecordService, a distributed, scalable, data access service for unified access control and enforcement in Apache Hadoop. RecordService is Apache Licensed open source that we intend to transition to the Apache Software Foundation. In this post, we’ll explain the motivation, system architecture,

Read more