Apache Kudu Archives | Page 2 of 3

September 12, 2019 | Technical

CDH 6.3 Release: What’s new in Kudu

Cloudera recently launched CDH 6.3, which includes two new key features from Apache Kudu: fine-grained authorization with Apache Sentry integration, and backup & restore of Kudu data. Fine-grained authorization with Sentry integration: Kudu is typically deployed as part of an Operations Data Warehouse (DWH) solution (also commonly referred to as an Active DWH and Live DWH). […]

by Cloudera 3 min read

Apache Kudu Operational DB Data Science

April 22, 2019 | Technical

Fine-Grained Authorization with Apache Kudu and Impala

Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it manages, including Apache Kudu tables. Given that Impala is a very common way to access the data stored in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in multi-tenant clusters even though Kudu does not […]

by Cloudera 4 min read

Apache Impala Apache Kudu Apache Sentry Apache Spark

March 19, 2019 | Technical

Testing Apache Kudu Applications on the JVM

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build […]

by Cloudera 4 min read

Apache Kudu

March 4, 2019 | Technical

Transparent Hierarchical Storage Management with Apache Kudu and Impala

When picking a storage option for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that […]

by Cloudera 9 min read

Apache Impala Apache Kudu Apache Parquet Cloudera Enterprise

September 27, 2018 | Business

Next Generation Data Warehousing at Santander UK

Timely data is crucial to businesses in the Big Data age: This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics to support intelligence that can improve relationships with customers and follow the mantra of ‘we want […]

by Cloudera 4 min read

Apache HBase Apache Kafka Apache Kudu Cloudera Enterprise

May 30, 2017 | Technical

Bi-temporal data modeling with Envelope

One of the most fundamental aspects a data model can convey is how something changes over time. This makes sense when considering that we build data models to capture what is happening in the real world, and the real world is constantly changing. The challenge is that it’s not just that new things are occurring, […]

by Cloudera 8 min read

Apache Impala Apache Kudu Apache Spark Cloudera Enterprise Data Ingestion

April 13, 2017 | Technical

Apache Kudu Read & Write Paths

Analytical and operational access patterns are very different and until now the Hadoop ecosystem has not had a single storage engine that could support both. As a result, engineers have been forced to implement complex architectures that stitch multiple systems together in order to provide these capabilities. On one hand immutable data on HDFS offers […]

by Cloudera, David Alves 7 min read

Apache Kudu Cloudera Enterprise

May 26, 2016 | Technical

New in Cloudera Labs: Envelope (for Apache Spark Streaming)

As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier. Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for […]

by Cloudera 3 min read

Apache Kafka Apache Kudu AI

February 18, 2016 | Technical

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: a columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in […]

by Cloudera, Todd Lipcon, Wes McKinney 4 min read

Apache HDFS Apache Impala Apache Kudu Data Science Performance

November 10, 2015 | Technical

How-to: Ingest and Query “Fast Data” with Impala (Without Kudu)

Impala is designed to deliver insight on data in Apache Hadoop in real time. As data often lands in Hadoop continuously in certain use cases (such as time-series analysis, real-time fraud detection, real-time risk detection, and so on), it’s desirable for Impala to query this new “fast” data with minimal delay and without interrupting running […]

by Cloudera 9 min read

Apache Hadoop Apache Impala Apache Kudu
