Tag Archives: scalability

Transparent Hierarchical Storage Management with Apache Kudu and Impala

Categories: CDH Impala Kudu Parquet

When picking a storage option for an applicationĀ it is common to pick a single storage option which has the most applicable features to your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that reason, there is a need for a solution that allows you to leverage the best features of multiple storage options.

Read more

Scalability Improvement of Apache Impala 2.12.0 in CDH 5.15.0

Categories: CDH Impala

Key Takeaways

We have significantly improved Impala in CDH 5.15.0 to address some of the scalability bottlenecks in query execution. 64 concurrent streams of TPC-DS queries at 10TB scale in a 135-node cluster now run at 6x query throughput compared to previous releases. In addition to running faster, the query success rate also improved from 73% to 100%. Overall, Impala in CDH 5.15.0 provides massive improvements in throughput and reliability while reducing the resource usage significantly.

Read more