This new release includes, among other things, support for “slicing and dicing” workloads by user, application, or report; workload breakdown by similar queries; and alerts for Apache Hive and Apache Impala (incubating) best practices.
Cloudera Navigator Optimizer enables database architects and database administrators (DBAs) to gain an in-depth understanding of their SQL workloads, whether those workloads run in data warehouse environments or on Apache Hadoop. Navigator Optimizer makes offload projects more predictable to plan by assessing risk and reducing development costs. It also provides optimization recommendations that support successful offloads and active SQL workload management on Hadoop.
Since its limited beta release last November, we have seen customer interest across a wide range of use cases. We have helped customers offload ETL processes, BI reports, and ad-hoc workloads to Cloudera’s Hadoop platform. For workloads already running on our platform, we have helped customers understand what their workloads are doing, identify potential risks, and proactively optimize their data model to support various data needs.
Today, we are excited to announce the general availability of Cloudera Navigator Optimizer. Let’s walk through two key use cases that show how to speed up your offload process and actively manage Hadoop SQL workloads, and then look at what’s next for the tool.
How To: Efficient Offloading with Cloudera Navigator Optimizer
Many enterprises spend significant resources and budget on offload projects. Workloads from enterprise data warehouse systems or traditional analytic databases can generate millions of queries each day. Critical query complexities can be hidden in thousands of lines of SQL code, making them hard to discover manually. Profiling the entire workload by hand is not feasible: for example, finding out how many times a subquery is used across tens of thousands of queries may take months.
Cloudera Navigator Optimizer can drastically reduce development time and offload cost by surfacing key workload insights instantly. It can identify duplicate and similar queries, thereby reducing the number of queries that need analysis. Join patterns, query blocks, filters, and group-by clauses are presented graphically, so they are easy to understand at a glance. Dominant access patterns are profiled across the entire workload. Cloudera Navigator Optimizer also classifies the workload’s syntax compatibility with Hive and Impala, and highlights syntax differences that will cause problems on the target platform directly in the query text, significantly reducing the time spent on syntax correction.
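The post does not describe how Navigator Optimizer detects similar queries. One common way to collapse duplicates is query fingerprinting: strip literals, comments, and formatting so that queries differing only in constants map to the same normalized "shape." The Python below is a minimal sketch under that assumption. `normalize_sql`, `fingerprint`, and `group_similar` are hypothetical helpers, and the regex rules are deliberately simple; no real SQL parser is involved.

```python
import hashlib
import re
from collections import defaultdict


def normalize_sql(sql: str) -> str:
    """Reduce a query to a canonical 'shape' (hypothetical, simplified rules)."""
    s = sql.strip().rstrip(";")
    s = re.sub(r"'(?:[^']|'')*'", "?", s)          # string literals -> ?
    s = re.sub(r"--[^\n]*", " ", s)                # drop line comments
    s = re.sub(r"/\*.*?\*/", " ", s, flags=re.S)   # drop block comments
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)       # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)                     # collapse whitespace
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", s)  # IN (?, ?, ?) -> (?)
    return s.lower().strip()


def fingerprint(sql: str) -> str:
    """Short hash of the normalized text; equal shapes give equal fingerprints."""
    return hashlib.sha1(normalize_sql(sql).encode("utf-8")).hexdigest()[:12]


def group_similar(queries):
    """Bucket queries whose normalized shapes are identical."""
    groups = defaultdict(list)
    for q in queries:
        groups[fingerprint(q)].append(q)
    return groups


queries = [
    "SELECT * FROM sales WHERE region = 'EU' AND amount > 100;",
    "select *  from sales where region = 'US' and amount > 250",
    "SELECT id FROM orders WHERE id IN (1, 2, 3)",
    "SELECT id FROM orders WHERE id IN (7)",
    "SELECT count(*) FROM customers",
]
groups = group_similar(queries)
print(f"{len(queries)} queries -> {len(groups)} distinct shapes")  # 5 queries -> 3 distinct shapes
```

Only one representative per group then needs manual review. This is how deduplication shrinks the analysis effort, even with a crude normalizer like this one.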
High-level Risk Assessment for Planning
Successful offload plans never focus on individual queries. Instead, they treat reports, applications, users, or ETL workflows as the basic units of offload. To make offload projects predictable, it is important to know ahead of time how many of these query groups can be easily offloaded, and what changes the more complex ones require.
When you upload your query logs to Cloudera Navigator Optimizer, you can specify attributes that let you group the queries in ways that match your business. For example, to evaluate offload potential by user, add a user column to the query file that maps each query to a user. The tool then evaluates each query’s complexity and its compatibility with the target Hadoop platform (Hive or Impala) to provide a risk assessment for each query group. We call this feature Workload Slice & Dice.
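The sketch below illustrates adding a grouping attribute. It writes a small query file with a `user` column and then slices the file by that column. The two-column CSV layout (`user`, `sql`) and the helper names are assumptions for this example. The actual upload format is defined in the Navigator Optimizer documentation.

```python
import csv
import io
from collections import defaultdict


def write_query_file(rows, out):
    """Write (user, sql) pairs as CSV with a header row (hypothetical layout).
    The csv module quotes SQL text that contains commas or double quotes."""
    writer = csv.writer(out)
    writer.writerow(["user", "sql"])
    writer.writerows(rows)


def slice_by(attribute, query_file):
    """Group the SQL text in a CSV query file by one attribute column."""
    slices = defaultdict(list)
    for row in csv.DictReader(query_file):
        slices[row[attribute]].append(row["sql"])
    return dict(slices)


rows = [
    ("alice", "SELECT region, SUM(amount) FROM sales GROUP BY region"),
    ("bob", "SELECT * FROM orders WHERE status = 'open'"),
    ("alice", "SELECT COUNT(*) FROM customers"),
]
buf = io.StringIO()
write_query_file(rows, buf)
buf.seek(0)
slices = slice_by("user", buf)
for user, sqls in sorted(slices.items()):
    print(user, len(sqls))
```

Adding another column, such as an application or report name, would allow slicing the same file along a different dimension by passing that column name to `slice_by`.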
Workload Slice & Dice immediately reveals whether there are query groups that are low risk, meaning they can be migrated without much development work. Low-risk groups are ideal candidates for initial proofs of concept, or a starting point for an offload project to gain experience. For medium- and high-risk “slices” that require more development work, you can do a deeper analysis to find the root causes of the risks, so that you can plan sufficient time and resources.
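One simple way to turn per-query risk into a per-slice rating is to let the riskiest query decide the slice's level, while keeping the full risk mix for planning. This roll-up rule and the `rate_slices` helper are assumptions for illustration. The post does not say how Navigator Optimizer aggregates risk across a group.

```python
from collections import Counter

# Severity order used by this hypothetical roll-up, from least to most risky.
ORDER = ["low", "medium", "high"]


def rate_slices(labeled):
    """labeled: iterable of (slice_name, query_risk) pairs.
    Returns {slice_name: (worst_risk, {risk: count})}."""
    mix = {}
    for name, risk in labeled:
        mix.setdefault(name, Counter())[risk] += 1
    return {
        name: (max(counts, key=ORDER.index), dict(counts))
        for name, counts in mix.items()
    }


labeled = [
    ("finance_reports", "low"),
    ("finance_reports", "low"),
    ("etl_nightly", "high"),
    ("etl_nightly", "medium"),
    ("adhoc", "medium"),
]
ratings = rate_slices(labeled)
low_risk = [name for name, (level, _) in ratings.items() if level == "low"]
print(low_risk)
```

A worst-case rule is conservative: a single blocking query marks the whole slice as high risk. The risk mix shows whether that slice is mostly easy work with a few hard queries.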
Deeper Analysis of Risks
Customers can perform a more detailed analysis of a group of queries to understand all the risks involved and what it takes to offload these queries. Optimization recommendations can guide customers toward the data model design choices best suited for Hadoop.
Cloudera Navigator Optimizer flags queries as high risk when they contain operations or functions that the target platform does not support, or complexities that might cause them to fail. Medium-risk alerts indicate changes that could improve performance but are not strictly needed to run the queries on Hadoop. For each type of risk, Cloudera Navigator Optimizer suggests how to fix it. Customers can then weigh optimization recommendations such as partition key strategies, denormalization, inline view materialization, or aggregate tables by their potential benefits.
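The toy rule-based checker below makes the high/medium distinction concrete. The rules are hypothetical examples and are not Navigator Optimizer's rule set. The checker treats a hierarchical `CONNECT BY` query as a blocking (high-risk) construct. It treats `SELECT *` and explicit cross joins as performance (medium-risk) findings. A query's level is the most severe rule it triggers, and each hit carries a suggested fix.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    level: str    # "high" (may fail on target) or "medium" (performance only)
    pattern: str  # regex matched case-insensitively against the SQL text
    advice: str   # suggested fix shown alongside the finding


# Hypothetical, illustrative rules; a real checker would parse the SQL.
RULES = [
    Rule("high", r"\bconnect\s+by\b",
         "Hierarchical CONNECT BY query: treated here as unsupported on the "
         "target; rewrite before offloading."),
    Rule("medium", r"\bselect\s+\*",
         "SELECT * reads every column; project only the columns you need."),
    Rule("medium", r"\bcross\s+join\b",
         "Cross join: confirm the Cartesian product is intended."),
]

LEVELS = {"low": 0, "medium": 1, "high": 2}


def assess(sql):
    """Return (risk_level, advice_list) for one query."""
    hits = [r for r in RULES if re.search(r.pattern, sql, re.IGNORECASE)]
    level = max((r.level for r in hits), key=LEVELS.__getitem__, default="low")
    return level, [r.advice for r in hits]


samples = [
    "SELECT id, name FROM emp START WITH mgr IS NULL CONNECT BY PRIOR id = mgr",
    "SELECT * FROM sales WHERE region = 'EU'",
    "SELECT id FROM orders WHERE status = 'open'",
]
for sql in samples:
    level, advice = assess(sql)
    print(level, advice)
```

Keeping the advice next to each rule mirrors the idea above that every risk type comes with a suggested fix.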
SQL Workload Management with Cloudera Navigator Optimizer
Instant Visibility into Hive or Impala Workloads
For SQL workloads already running on Hadoop, Cloudera Navigator Optimizer quickly provides critical information about what the queries are doing on Hive or Impala clusters. This enables DBAs to answer questions such as what kinds of queries are running, how complex they are, who the top users are, and how tables and columns are being used. DBAs can also find problematic queries that take a long time to run, evaluate potential causes of latency, and get recommendations on how to fix the problem. For example, queries that lack filters are a common issue that can be addressed with better coding practices or optimization. Deep analysis of risky queries helps DBAs discover and evaluate recommendations that optimize the data model and improve query efficiency on Hadoop.
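A rough version of this kind of profile can be computed from a query history with a few counters. The profile covers top users, the most-referenced tables, and queries with no filter. The log records, the `profile` helper, and the regex-based table extraction are assumptions for illustration. A real tool would parse the SQL rather than pattern-match it, so subqueries, CTEs, and keywords inside string literals are not handled here.

```python
import re
from collections import Counter

# Naive table extraction: the identifier after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)
WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def profile(log, top_n=3):
    """log: list of (user, sql) pairs from a hypothetical query history.
    Returns (top users, top tables, queries that read a table with no WHERE)."""
    users = Counter(user for user, _ in log)
    tables = Counter(t.lower() for _, sql in log for t in TABLE_REF.findall(sql))
    unfiltered = [
        sql for _, sql in log
        if TABLE_REF.search(sql) and not WHERE.search(sql)
    ]
    return users.most_common(top_n), tables.most_common(top_n), unfiltered


log = [
    ("alice", "SELECT region, SUM(amount) FROM sales GROUP BY region"),
    ("bob", "SELECT s.id FROM sales s JOIN stores st ON s.store_id = st.id "
            "WHERE st.city = 'Oslo'"),
    ("alice", "SELECT * FROM sales WHERE day = '2016-01-01'"),
    ("carol", "SELECT COUNT(*) FROM customers"),
]
top_users, top_tables, unfiltered = profile(log)
print(top_users)
print(top_tables)
print(unfiltered)
```

The unfiltered list is where a DBA would start when checking for full-table scans.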
Cloudera Navigator Optimizer can help match workloads to the right tools within Hadoop. For example, if you have an analytic workload running on Hive that you want to migrate to Impala (see the blog post, “Choosing the Right Tool for the Job”), Cloudera Navigator Optimizer can analyze syntax compatibility and platform suitability to speed up the process.
In an environment where data needs are constantly changing and SQL queries are not always written according to platform best practices, Cloudera Navigator Optimizer is critical for understanding usage and proactively optimizing the data model, thus avoiding slow workloads.
This GA release is a huge step forward in helping our customers offload SQL workloads efficiently and actively manage them within Hadoop for peak performance. However, it is just scratching the surface of what’s now possible. Here’s a peek at what’s to come.
Comprehensive Optimization Recommendations
We will continue to provide more optimization opportunities and recommendations, such as denormalization strategies and update consolidation. Beyond identifying optimization opportunities, we will help customers evaluate these recommendations by integrating development cost-versus-benefit assessments, so they can build overall optimization strategies.
Integrated Experience Across Audiences
With deep knowledge about queries, Cloudera Navigator Optimizer can provide richer experiences for different user groups by continuing to integrate with other tools within Cloudera’s platform. For example, integrating metadata analysis from Cloudera Navigator can help DBAs manage Hive and Impala workloads more effectively. For SQL developers, Cloudera Navigator Optimizer’s analysis can surface metadata and usage insights directly in development interfaces such as Hue, helping to keep risky queries out of production.
Cloudera Navigator Optimizer is now available for customers with the appropriate user license at optimizer.cloudera.com. For more details on this new tool, and to learn more about powering an analytic database with Cloudera Enterprise, register for the webinar series.
Ewa Ding is a director of product management at Cloudera.