There is no one-size-fits-all approach to tuning Hive on Tez queries. Query performance depends on the size of the data, file types, query design, and query patterns. During performance testing, evaluate and validate configuration parameters and any SQL modifications. Make one change at a time when performance testing the workload, and assess the impact of tuning changes in your development and QA environments before applying them in production. Cloudera WXM can assist in evaluating the benefits of query changes during performance testing.
Tuning Guidelines
It has been observed across several migrations from CDH distributions to CDP Private Cloud that Hive on Tez queries tend to run slower than on older execution engines such as MR or Spark. This is usually caused by differences in out-of-the-box tuning behavior between the execution engines. Additionally, users may have completed tuning in the legacy distribution that is not automatically carried over in the conversion to Hive on Tez. For users upgrading from the HDP distribution, this discussion also helps to review and validate whether the properties are correctly configured for performance in CDP.
The steps below help you identify the areas to focus on that might degrade performance.
Step 1: Verify and validate the YARN Capacity Scheduler configurations. A misconfigured queue configuration can impact query performance due to an arbitrary cap on available resources to the user. Validate the user-limit factor, min-user-limit percent, and maximum capacity. (Refer to the YARN – The Capacity Scheduler blog to understand these configuration settings.)
Step 2: Review the relevance of any safety valves (the non-default values for Hive and HiveServer2 configurations) for Hive and Hive on Tez. Remove any legacy and outdated properties.
Step 3: Identify the area of slowness, such as map tasks, reduce tasks, and joins.
- Review the generic Tez engine and platform tunable properties.
- Review the map tasks and tune: increase or decrease the task counts as required.
- Review the reduce tasks and tune: increase or decrease the task counts as required.
- Review any concurrency-related issues. There are two kinds of concurrency issues:
  - Concurrency among users within a queue. This can be tuned using the user limit factor of the YARN queue (refer to the details in the Capacity Scheduler blog).
  - Concurrency across pre-warmed containers for Hive on Tez sessions, as discussed in detail below.
Understanding parallelization in Tez
Before changing any configurations, you must understand how Tez works internally, including how it determines the number of mappers and reducers. Reviewing the Tez architecture design and the details of how initial task parallelism and auto-reduce parallelism work will help you optimize query performance.
Understanding the number of mappers
Tez determines the number of mapper tasks using the initial input data for the job. In Tez, the number of tasks is determined by the grouped splits, which is equivalent to the number of mappers determined by the input splits in MapReduce jobs.
- tez.grouping.min-size and tez.grouping.max-size determine the number of mappers. The default value of min-size is 16 MB and of max-size is 1 GB.
- Tez determines the number of tasks such that the data per task is in line with the grouping max/min size.
- Decreasing the tez.grouping.max-size increases the number of tasks/mappers.
- Increasing the tez.grouping.max-size decreases the number of tasks.
- Consider the following example:
  - Input data (input shards/splits): 1000 files (around 1.5 MB each)
  - Total data size: 1000 * 1.5 MB = ~1.5 GB
  - Tez could try processing this data with as few as two tasks, because the maximum data per task can be 1 GB. Tez could then combine all 1000 files (splits) into two tasks, leading to slower execution times.
  - If tez.grouping.max-size is reduced from 1 GB to 100 MB, the number of mappers could increase to 15, providing better parallelism. Performance then improves because the work is spread across 15 concurrent tasks instead of two.
The above is an example scenario. In a production environment that uses binary file formats such as ORC or Parquet, determining the number of mappers can get complicated, because it depends on the storage type, the split strategy, and file or HDFS block boundaries.
Note: A higher degree of parallelism (for example, a high number of mappers/reducers) doesn’t always translate to better performance, since it could lead to fewer resources per task and higher resource wastage due to task overhead.
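The arithmetic in the mapper example above can be checked with a short sketch. It assumes a deliberately simplified model in which the task count is bounded only by the total input size divided by tez.grouping.max-size and tez.grouping.min-size; actual Tez split grouping also takes cluster capacity, locality, and file format into account. The function names are hypothetical and exist only for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def estimate_task_bounds(total_bytes, max_group_bytes=1 * GB, min_group_bytes=16 * MB):
    """Return (fewest, most) tasks under a simplified grouping model.

    Assumption: each task reads between min_group_bytes and max_group_bytes,
    so the task count lies between total/max and total/min. Real Tez split
    grouping considers more inputs than this.
    """
    fewest = max(1, ceil_div(total_bytes, max_group_bytes))
    most = max(fewest, ceil_div(total_bytes, min_group_bytes))
    return fewest, most


# 1000 files of about 1.5 MB each, as in the example above.
total = 1000 * (3 * MB // 2)

print(estimate_task_bounds(total))                            # default 1 GB max-size
print(estimate_task_bounds(total, max_group_bytes=100 * MB))  # max-size lowered to 100 MB
```

Under this model the lower bound is 2 tasks with the default max-size and 15 tasks once max-size is lowered to 100 MB, which matches the figures in the example.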
Understanding the number of reducers
Tez uses a number of mechanisms and settings to determine the number of reducers required to complete a query.
- Tez determines the reducers automatically based on the data (number of bytes) to be processed.
- If hive.tez.auto.reducer.parallelism is set to true, Hive estimates the data size and sets parallelism estimates. Tez samples the output sizes of the source vertices and adjusts the estimates at runtime as necessary.
- By default, the maximum number of reducers is 1009 (hive.exec.reducers.max).
- Hive/Tez estimates the number of reducers using the following formula and then schedules the Tez DAG:
Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate/hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
- The following three parameters can be tweaked to increase or decrease the number of reducers:
  - hive.exec.reducers.bytes.per.reducer
    Size per reducer. Change this to a smaller value to increase parallelism or to a larger value to decrease parallelism. Default value = 256 MB (that is, if the input size is 1 GB, then 4 reducers are used).
  - hive.tez.min.partition.factor
    Default value = 0.25.
  - hive.tez.max.partition.factor
    Default value = 2.0. Increase for more reducers. Decrease for fewer reducers.
- Users can manually set the number of reducers by using mapred.reduce.tasks. This is not recommended.
- Recommendations:
- Avoid setting the reducers manually.
- Adding more reducers doesn’t always guarantee better performance.
- Depending on the reduce stage estimates, tweak the hive.exec.reducers.bytes.per.reducer parameter to a lower or higher value if you want to increase or decrease the number of reducers.
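As a rough illustration, the reducer formula above can be written as a small function. This is a sketch that follows the formula exactly as stated in this post, not Hive's actual implementation; the function name and the choice to round up are assumptions.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimate_reducers(stage_bytes,
                      bytes_per_reducer=256 * MB,   # hive.exec.reducers.bytes.per.reducer
                      max_reducers=1009,            # hive.exec.reducers.max
                      max_partition_factor=2.0):    # hive.tez.max.partition.factor
    """Max(1, Min(max_reducers, estimate / bytes_per_reducer)) x max_partition_factor.

    Assumption: fractional results are rounded up.
    """
    base = max(1, min(max_reducers, math.ceil(stage_bytes / bytes_per_reducer)))
    return math.ceil(base * max_partition_factor)


print(estimate_reducers(1 * GB))                              # defaults
print(estimate_reducers(1 * GB, bytes_per_reducer=128 * MB))  # smaller size per reducer
```

For a 1 GB reduce stage with the defaults, the base count is 4 and the scheduled estimate is 8. Halving the size per reducer to 128 MB doubles the estimate to 16, which is the lever the recommendations above suggest using instead of setting the reducer count manually.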
Concurrency
This section explains how to understand and tune concurrent sessions for Hive on Tez, such as running multiple Tez AM containers. The properties below control the default queues and the number of sessions.
- hive.server2.tez.default.queues : A list of comma separated values corresponding to YARN queues for which to maintain a Tez session pool.
- hive.server2.tez.sessions.per.default.queue: The number of Tez sessions (DAGAppMaster) to maintain in the pool per YARN queue.
- hive.server2.tez.initialize.default.sessions: If enabled, HiveServer2 (HS2), at startup, will launch all necessary Tez sessions within the specified default.queues to meet the sessions.per.default.queue requirements.
When you define the properties listed above, HiveServer2 creates one Tez Application Master (AM) for each default queue, multiplied by the number of sessions per queue, when the HiveServer2 service starts. Hence:
Total Tez sessions = (HiveServer2 instances) x (default.queues) x (sessions.per.default.queue)
Example:
- hive.server2.tez.default.queues= “queue1, queue2”
- hive.server2.tez.sessions.per.default.queue=2
=> HiveServer2 will create 4 Tez AMs (2 for queue1 and 2 for queue2).
Note: The pooled Tez sessions are always running, even on an idle cluster.
If HiveServer2 is in continuous use, those Tez AMs keep running, but if HS2 is idle, they are killed after the timeout defined by tez.session.am.dag.submit.timeout.secs.
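The session-count formula above is simple enough to verify in a few lines. The sketch below assumes the queue list is passed as the same comma-separated string used for hive.server2.tez.default.queues; the function name is hypothetical.

```python
def total_tez_sessions(hs2_instances, default_queues, sessions_per_queue):
    """Total pooled Tez AMs = HS2 instances x default queues x sessions per queue."""
    queues = [q.strip() for q in default_queues.split(",") if q.strip()]
    return hs2_instances * len(queues) * sessions_per_queue


print(total_tez_sessions(1, "queue1, queue2", 2))  # one HiveServer2 instance
print(total_tez_sessions(3, "queue1, queue2", 2))  # three HiveServer2 instances
```

A single HiveServer2 instance with the example settings yields 4 pooled Tez AMs; three instances with the same settings yield 12, which is worth keeping in mind when sizing the YARN queues.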
Case 1: Queue name is not specified
- A query uses a Tez AM from the pool (initialized as described above) only if no queue name (tez.queue.name) is specified. In this case, HiveServer2 picks one of the idle/available Tez AMs (the queue may be selected at random).
- If no queue name is specified, the query remains pending in HiveServer2 until one of the default Tez AMs from the initialized pool is available to serve it. No message appears in the JDBC/ODBC client or in the HiveServer2 log file while the query is pending, so the user may think the JDBC/ODBC connection or HiveServer2 is broken when the query is simply waiting for a Tez AM.
Case 2: Queue name specified
- If a queue name is specified, HiveServer2 creates a new Tez AM for the connection regardless of how many initialized Tez AMs are in use or idle, and the query can be executed (if the queue has available resources).
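The two cases can be summarized as a small decision function. This is a toy model of the behavior described above, not HiveServer2 code; the function name, parameters, and return labels are all hypothetical.

```python
def route_query(queue_name, idle_pool_ams):
    """Toy model of how a query obtains a Tez AM, per Case 1 and Case 2.

    queue_name:    value of tez.queue.name, or None/"" if not set
    idle_pool_ams: number of idle Tez AMs in the pre-initialized pool
    """
    if queue_name:
        # Case 2: a queue was named, so a new Tez AM is launched in that queue.
        return "new_am"
    if idle_pool_ams > 0:
        # Case 1: no queue named and a pooled AM is free.
        return "pool_am"
    # Case 1: no queue named and the pool is busy; the query waits silently.
    return "pending"


print(route_query(None, idle_pool_ams=2))
print(route_query(None, idle_pool_ams=0))
print(route_query("queue1", idle_pool_ams=0))
```

The "pending" branch is the one that surprises users, because nothing is reported to the client while the query waits.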
Guidelines/recommendations for concurrency:
- For use cases or queries where users should not be limited to the same Tez AM pool, set hive.server2.tez.initialize.default.sessions to false. Disabling this can reduce contention on HiveServer2 and improve query performance.
- Additionally, increase the number of sessions with hive.server2.tez.sessions.per.default.queue.
- If a use case requires a separate or dedicated Tez AM pool for each group of users, run a dedicated HiveServer2 service for each group, each with its own default queue name and number of sessions, and ask each group of users to use their respective HiveServer2.
Container reuse and prewarm containers
- Container reuse:
This is an optimization that limits the impact of container startup time. It is turned on by setting tez.am.container.reuse.enabled to true. This saves time interacting with YARN. It also keeps container groups alive, spins up containers faster, and skips YARN queues.
- Prewarm containers:
The number of prewarm containers is the number of YARN execution containers attached to each Tez AM by default. Each AM holds this number of containers even when it is idle (not executing queries).
The downside appears when too many containers sit idle and are not released, since the containers defined here are held by the Tez AM even when it is idle. These idle containers continue to take resources in YARN that other applications could potentially use.
The properties below are used to configure prewarm containers:
- hive.prewarm.enabled
- hive.prewarm.numcontainers
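To get a feel for the cost of idle prewarmed containers, the sketch below multiplies the number of pooled sessions by the prewarm container count and the container size. It assumes every prewarmed container is sized by hive.tez.container.size (in MB) and ignores the memory of the Tez AM container itself; the function name and the input values are hypothetical.

```python
def idle_prewarm_memory_mb(total_sessions, prewarm_containers, container_size_mb):
    """Memory held by prewarmed containers while all sessions are idle.

    total_sessions:     pooled Tez AMs across all HiveServer2 instances
    prewarm_containers: hive.prewarm.numcontainers
    container_size_mb:  hive.tez.container.size (assumed to apply to each container)
    """
    return total_sessions * prewarm_containers * container_size_mb


# Hypothetical values: 4 pooled sessions, 3 prewarmed containers each, 4 GB containers.
held_mb = idle_prewarm_memory_mb(4, 3, 4096)
print(f"{held_mb} MB ({held_mb // 1024} GB) held while idle")
```

With these hypothetical values, 48 GB of YARN memory stays reserved even when no query is running.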
General Tez tuning parameters
Review the properties listed below as a first-level check when dealing with performance degradation of Hive on Tez queries. You might need to set or tune some of these properties in accordance with your query and data properties. Assess the configuration properties in development and QA environments first, and then push them to production environments depending on the results.
- hive.cbo.enable
Setting this property to true enables cost-based optimization (CBO). CBO is part of Hive’s query processing engine and is powered by Apache Calcite. CBO generates efficient query plans by examining the tables and conditions specified in the query, eventually reducing query execution time and improving resource utilization.
- hive.auto.convert.join
Setting this property to true allows Hive to convert a common join into a map join based on the input file size.
- hive.auto.convert.join.noconditionaltask.size
You will want to perform as many map joins as possible in the query. This size configuration lets the user control what size of table can fit in memory. The value represents the sum of the sizes of the tables that can be converted to hashmaps that fit in memory.
The recommendation is to set this to ⅓ of hive.tez.container.size.
- tez.runtime.io.sort.mb
The size of the sort buffer when output is sorted. The recommendation is to set this to 40% of hive.tez.container.size, up to a maximum of 2 GB. It rarely needs to be above this maximum.
- tez.runtime.unordered.output.buffer.size-mb
The memory used when the output does not need to be sorted. It is the size of the buffer to use if not writing directly to disk. The recommendation is to set this to 10% of hive.tez.container.size.
- hive.exec.parallel
This property enables parallel execution of Hive query stages. By default, this is set to false. Setting this property to true helps to parallelize the independent query stages, resulting in overall improved performance.
- hive.vectorized.execution.enabled
Vectorized query execution is a Hive feature that greatly reduces the CPU usage for typical query operations like scans, filters, aggregates, and joins. By default, this is set to false. Set this to true.
- hive.merge.tezfiles
By default, this property is set to false. Setting this property to true would merge the Tez files. Using this property could increase or decrease the execution time of the query depending on the size of the data or the number of files to merge. Assess your query performance in lower environments before using this property.
- hive.merge.size.per.task
This property describes the size of the merged files at the end of a job.
- hive.merge.smallfiles.avgsize
When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. By default, this property is set at 16 MB.
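The sizing recommendations in the list above (⅓, 40% capped at 2 GB, and 10% of hive.tez.container.size) can be turned into starting values with a short sketch. It assumes hive.tez.container.size and the two tez.runtime buffers are expressed in MB and that hive.auto.convert.join.noconditionaltask.size is expressed in bytes; the function name is hypothetical, and the results are starting points to validate in lower environments, not definitive settings.

```python
MB = 1024 * 1024


def derive_memory_settings(container_size_mb):
    """Derive starting values from hive.tez.container.size (MB), per the list above."""
    return {
        # One third of the container, converted to bytes (assumed unit).
        "hive.auto.convert.join.noconditionaltask.size": container_size_mb * MB // 3,
        # 40% of the container, capped at 2 GB (2048 MB).
        "tez.runtime.io.sort.mb": min(container_size_mb * 40 // 100, 2048),
        # 10% of the container.
        "tez.runtime.unordered.output.buffer.size-mb": container_size_mb * 10 // 100,
    }


# Hypothetical 4 GB container.
for name, value in derive_memory_settings(4096).items():
    print(f"SET {name}={value};")
```

For containers larger than about 5 GB the sort buffer hits the 2 GB cap, so only the other two values keep growing with the container size.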
Summary
This blog covered some basic troubleshooting and tuning guidelines for Hive on Tez queries with respect to CDP. As the very first step in query performance analysis, verify and validate all the configurations set on the Hive and Hive on Tez services. Test every change to ensure that it makes a measurable and beneficial improvement. Query tuning is a specialized effort, and not all queries perform better after changing the Tez configuration properties. You may encounter scenarios where you need to dive deep into the SQL query itself to optimize and improve execution and performance. Contact your Cloudera Account and Professional Services team for guidance if you require additional assistance with performance tuning efforts.