Over the past year, and across several releases, Apache Impala (incubating) has added numerous features and performance enhancements that better enable high-performance SQL analytics over big data. It is therefore time for another update to the Impala cookbook, which now contains best practices for these new features, updated guidelines, and more detailed examples.
Note: This cookbook does not yet cover best practices for the major advancements available with the recent GA of Kudu. Stay tuned for upcoming blog posts with more details.
As the leader in powering modern analytic database workloads, Impala continues to be used by a growing number of enterprises to support large-scale, multi-user BI and analytic use cases. The Impala cookbook has been one of the most popular resources for helping these users tune their systems. This latest update adds details drawn from the technology advancements and lessons learned over the past year to help you get the most out of Impala.
Highlights of the topics that have been updated include:
- Query tuning and performance
- Runtime filters: how they work, and how to tune them
- Codegen: more operations are now supported
- Nested types
- Updated benchmarking results (compute stats, sorting, writing speed)
- New guidelines on catalog metadata and incremental stats
- Cluster sizing
- KDC capacity considerations
- Connection storm handling
We also removed some outdated topics and guidelines that are no longer needed with the latest releases of Impala.
The goal of this best practices guide is to provide practical advice, based on our collective experience with Impala, that helps you quickly adopt and take full advantage of the latest features. Happy cooking!