Category Archives: Kudu

Fine-Grained Authorization with Apache Kudu and Impala

Categories: Impala Kudu Sentry Spark

Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it manages, including Apache Kudu tables. Because Impala is a very common way to access the data stored in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in multi-tenant clusters, even though Kudu does not yet have native fine-grained authorization of its own. This solution works because Kudu natively supports coarse-grained (all-or-nothing) authorization, which makes it possible to block all direct access to Kudu except for the impala user and an optional whitelist of other trusted users.

Read more

Testing Apache Kudu Applications on the JVM

Categories: Kudu Testing

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test.

Read more

Transparent Hierarchical Storage Management with Apache Kudu and Impala

Categories: CDH Impala Kudu Parquet

When choosing storage for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. There is therefore a need for a solution that allows you to leverage the best features of multiple storage options.

Read more

Cloudera Enterprise 6.1.0 is Now Available

Categories: Accumulo CDH Cloudera Manager Cloudera Navigator Kafka Kudu Search Tools

We are pleased to announce the general availability of Cloudera Enterprise 6.1.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers several new capabilities, improved usability, and better performance.

As usual, the release includes a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):

Data Engineering

Cloudera Enterprise 6.1 now supports Spark Structured Streaming, which enables micro-batch processing at ~100 ms increments, so that ingest-to-query latencies in the Cloudera platform are measured in seconds.

Read more

Next Generation Data Warehousing at Santander UK

Categories: CDH HBase HDFS Kafka Kudu Use Case

Timely data is crucial to businesses in the Big Data age. This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics, supporting intelligence that can improve relationships with customers and following the mantra of ‘we want to help people grow and prosper’.

Santander UK’s big data journey started around four years ago.

Read more