The Impala Cookbook


Bookmark this new living document to ensure use of current and proper configuration, sizing, management, and measurement practices.

Impala, the open source MPP analytic database for Apache Hadoop, is now firmly entrenched in the Big Data mainstream. How do we know this? For one, Impala is now the standard against which alternatives measure themselves, based on a proliferation of new benchmark testing. Furthermore, Impala has been adopted by multiple vendors as their solution for letting customers do exploratory analysis on Big Data, natively and in place (without the need for redundant architecture or ETL). Also significant, we’re seeing the emergence of best practices and patterns out of customer experiences.

In an effort to streamline deployments and shorten the path to success, Cloudera’s Impala team has compiled a “cookbook” based on those experiences, covering:

  • Physical and Schema Design
  • Memory Usage
  • Cluster Sizing and Hardware Recommendations
  • Benchmarking
  • Multi-tenancy Best Practices
  • Query Tuning Basics
  • Interaction with Apache Hive, Apache Sentry, and Apache Parquet

By using these recommendations, Impala users will be assured of proper configuration, sizing, management, and measurement practices to provide an optimal experience. Happy cooking!

PDF Version


6 responses on “The Impala Cookbook”

  1. Suriawan

    Good cookbook and very useful. Thanks for sharing this. Is the max number of partitions 100K for one table or for the whole DB?