Category Archives: Kudu

Apache Kudu Read & Write Paths

Categories: CDH Kudu

Analytical and operational access patterns are very different and until now the Hadoop ecosystem has not had a single storage engine that could support both. As a result, engineers have been forced to implement complex architectures that stitch multiple systems together in order to provide these capabilities. On one hand immutable data on HDFS offers superior analytic performance, while mutable data in Apache HBase is best for operational workloads. Apache Kudu bridges this gap.  

Kudu’s architecture is shaped towards the ability to provide very good analytical performance,

Read More

Performance comparison of different file formats and storage engines in the Apache Hadoop ecosystem

Categories: Avro Guest Hadoop HBase Kudu Parquet

Zbigniew Baranowski is a database systems specialist and a member of a group which provides and supports central database and Hadoop-based services at CERN. This blog was originally released on CERN’s “Databases at CERN” blog, and is syndicated here with CERN’s permission.

 

TOPIC

This post presents a performance comparison of few popular data formats and storage engines available in the Apache Hadoop ecosystem: Apache Avro,

Read More

Up and running with Apache Spark on Apache Kudu

Categories: CDH Data Ingestion Data Science General Hadoop How-to Impala Kudu Spark Training Use Case

After the GA of Apache Kudu in Cloudera CDH 5.10, we take a look at the Apache Spark on Kudu integration, share code snippets, and explain how to get up and running quickly, as Kudu is already a first-class citizen in Spark’s ecosystem.

 

As the Apache Kudu development team celebrates the initial 1.0 release launched on September 19, and the most recent 1.2.0 version now GA as part of Cloudera’s CDH 5.10 release,

Read More

Cloudera Enterprise 5.10 is Now Available

Categories: CDH Cloud Cloudera Manager Cloudera Navigator Hadoop Hue Kudu

Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost-optimizations, and cloud-native data governance for Amazon S3.

As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):

  • GA of Apache Kudu

Read More

Apache Kudu and Apache Impala (Incubating): The Integration Roadmap

Categories: Impala Kudu

Impala users can expect new performance and usability benefits via improved integration with Kudu.

It’s been nearly one year since the public beta announcement of Kudu (now a top-level Apache project) and a noteworthy milestone has been reached: its 1.0 release. This is particularly exciting as Kudu extends the use cases that can be supported on the Apache Hadoop platform, whether it be on-premises or in the cloud,

Read More