Category Archives: Kudu

Testing Apache Kudu Applications on the JVM

Categories: Kudu Testing

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test.

Transparent Hierarchical Storage Management with Apache Kudu and Impala

Categories: CDH Impala Kudu Parquet

When picking a storage option for an application, it is common to choose the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that reason, there is a need for a solution that lets you leverage the best features of multiple storage options.

Cloudera Enterprise 6.1.0 is Now Available

Categories: Accumulo CDH Cloudera Manager Cloudera Navigator Kafka Kudu Search Tools

We are pleased to announce the general availability of Cloudera Enterprise 6.1.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers several new capabilities, improved usability, and better performance.

As usual, the release includes a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):

Data Engineering

Cloudera Enterprise 6.1 now supports Spark Structured Streaming with micro-batch processing at ~100 ms increments, enabling ingest-to-query latencies in the Cloudera platform measured in seconds.

Next Generation Data Warehousing at Santander UK

Categories: CDH HBase HDFS Kafka Kudu Use Case

Timely data is crucial to businesses in the Big Data age. This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics, supporting intelligence that can improve relationships with customers and follow the mantra of ‘we want to help people grow and prosper’.

Santander UK’s big data journey started around four years ago.

implyr: R Interface for Apache Impala

Categories: CDH Data Science HBase HDFS Impala Kudu Tools

The new R package implyr enables R users to query Impala using dplyr.

Apache Impala (incubating) enables low-latency interactive SQL queries on data stored in HDFS, Amazon S3, Apache Kudu, and Apache HBase. With the availability of the R package implyr on CRAN and GitHub, it’s now possible to query Impala from R using the popular package dplyr.

dplyr provides a grammar of data manipulation,