Category Archives: Impala

New in Cloudera Manager 5.7: Cluster Utilization Reporting

Categories: Cloudera Manager Impala Ops and DevOps Performance YARN

Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.

Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly so that no tenant can run jobs that starve the others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage,

Read More

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard

Categories: Data Science General HDFS Impala Kudu Performance

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works.

Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits:

  • A columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in analytics workloads and permits SIMD optimizations with modern processors.
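
As a rough illustration of what a columnar layout means, here is a minimal sketch in plain Python. It does not use Arrow or its API: the `rows` sample data and the `value_at` and `column_sum` helpers are hypothetical names made up for this example, and the standard-library `array` module merely stands in for a contiguous, fixed-width column buffer.

```python
from array import array

# Row-oriented layout: one Python tuple per record (hypothetical sample data).
rows = [(1, 9.5), (2, 7.25), (3, 8.0)]

# Column-oriented layout: one contiguous, fixed-width buffer per field.
ids = array("q", (r[0] for r in rows))     # signed 64-bit integers
scores = array("d", (r[1] for r in rows))  # 64-bit floats


def value_at(column, i):
    """Constant-time lookup: element i sits at byte offset i * column.itemsize."""
    return column[i]


def column_sum(column):
    """Scan a single field without touching the other fields of each record."""
    return sum(column)
```

Because every value in a column has the same width, finding element `i` is a single offset calculation, and an aggregate over one field reads only that field's buffer. That is the property the cache-efficiency and SIMD claims rest on.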

Read More

New SQL Benchmarks: Apache Impala (incubating) Uniquely Delivers Analytic Database Performance

Categories: Hive Impala Performance Spark

New testing results show a significant difference between the analytic database performance of Impala and that of batch and procedural development engines, and show Impala running all 99 TPC-DS-derived queries in the benchmark workload.

2015 was an exciting year for Apache Impala (incubating). Cloudera’s Impala team significantly improved Impala’s scale and stability, which enabled many customers to deploy Impala clusters with hundreds of nodes, run millions of queries,

Read More

How-to: Design An Analytic Database Schema on Apache Impala (Incubating) with Indyco

Categories: Guest How-to Impala

Our thanks to Manuel Spezzani, Indyco Technical Leader, and Edward William Gnudi, Indyco’s Chief of Customer Happiness, for the guest post below about using Indyco alongside Apache Impala.

In this post, you will learn how to automatically design a complete data warehouse solution on top of Impala using Indyco, a tool for designing, exploring, and understanding your business model. (Indyco was recently named a Cloudera Certificated Partner for the Impala platform.)

Read More

Interactive Analytics on Dynamic Big Data in Python using Kudu, Impala, and Ibis

Categories: Cloudera Labs Impala Kudu

The following post was originally published in the Ibis project blog. (Ibis is a data analysis framework incubating in Cloudera Labs that brings Apache Hadoop scale to Python development.)

The new Apache Kudu (incubating) columnar storage engine, together with the Apache Impala (incubating) interactive SQL engine, enables a new, fully open source big data architecture for data that is arriving and changing very quickly. By integrating Kudu and Impala with Ibis

Read More