Tag Archives: beta

Apache Spark 2.0 Beta Now Available for CDH

Categories: Hadoop Spark

Today, Cloudera announced the availability of an Apache Spark 2.0 Beta release for users of the Cloudera platform.

Apache Spark 2.0 is tremendously exciting (read this post for more background) because, among other things:

  • The Dataset API further enhances Spark’s claim as the best tool for data engineering by providing compile-time type safety along with the benefits of a query-optimization engine.
  • The Structured Streaming API enables the modeling of streaming data as a continuous DataFrame and expresses operations on that data with a SQL-like API.

Read More

Announcing RecordService Beta 2: Brings Column-level Security to Apache Spark and MapReduce

Categories: General Platform Security & Cybersecurity Sentry Spark

With this new beta release, column-level privileges set via Apache Sentry (incubating) are now enforced on Spark/MapReduce jobs.

Cloudera is excited to announce the availability of the second beta release for RecordService. This release is based on CDH 5.5 and provides some new features, including:

  • Support for Sentry column-level security. Previously, column-level access control required the use of views; now,

Read More

Progress Report: Hive-on-Spark Nears Production Readiness

Categories: Cloudera Labs Hive Spark

Contributors from Intel, Cloudera, and the rest of the community have been making strong progress on the Hive-on-Spark initiative. This post provides an update.

[Editor’s note (April 20, 2016): Hive-on-Spark is now GA/shipping starting in CDH 5.7.]

Since its inception about one year ago, the community initiative to make Apache Spark a data processing engine for Apache Hive (HIVE-7292) has attracted widespread interest from developers around the world and gone through phases of rapid development,

Read More

Docker is the New QuickStart Option for Apache Hadoop and Cloudera

Categories: CDH Ops and DevOps QuickStart VM Testing

Now there’s an even quicker “QuickStart” option for getting hands-on with the Apache Hadoop ecosystem and Cloudera’s platform: a new Docker image.

You might already be familiar with Cloudera’s popular QuickStart VM, a virtual image containing our distributed data processing platform. Originally intended as a demo environment, the QuickStart VM quickly evolved over time into quite a useful general-purpose environment for developers, customers,

Read More

Introducing Cloudera Navigator Optimizer: For Optimal SQL Workload Efficiency on Apache Hadoop

Categories: Cloudera Navigator Impala Performance

Cloudera Navigator Optimizer, a new (beta) component of Cloudera Enterprise, helps optimize inefficient query workloads for best results on Apache Hadoop.

With the proliferation of Apache Hadoop deployments, more and more customers are looking to reduce operational overheads in their enterprise data warehouse (EDW) installations by exploiting low-cost, highly scalable, open source SQL-on-Hadoop frameworks such as Impala and Apache Hive. Processing portions of SQL workloads better suited to Hadoop on these frameworks,

Read More