Tag Archives: performance

Faster Swarms of Data: Accelerating Hive Queries with Parquet Vectorization

Categories: CDH Hive Parquet Performance

Background

Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Features that improve Hive performance can significantly improve the overall utilization of resources on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, ReduceTask, or SparkTask) of the query execution plan. Traditionally, these operators are designed to process one row at a time.

Assessment of Apache Impala Performance using Cloudera Manager Metrics – Part 1 of 3

Categories: CDH Cloudera Manager Impala Performance

For a user-facing system like Apache Impala, poor performance and downtime can seriously harm your business. Given the complexity of the system and its many moving parts, troubleshooting can be time-consuming and overwhelming.

In this blog post series, we show how the charts and metrics in Cloudera Manager (CM) can help troubleshoot Impala performance issues. They can also help you monitor the system to predict and prevent future outages.

Evaluating Partner Platforms

Categories: CDH Hardware How-to Performance

As a member of Cloudera’s Partner Engineering team, I evaluate hardware and cloud computing platforms offered by commercial partners who want to certify their products for use with Cloudera software. One of my primary goals is to make sure that these platforms provide a stable and well-performing base upon which our products will run, a state of operation that a wide variety of customers performing an even wider variety of tasks can appreciate.

The Value of Certification

Categories: Careers Training

Each year in early November, my inbox fills up with people asking for advice about certification. Some are reflecting on their careers and looking to move on or move up; others have given themselves or their managers the goal of getting certified this year. They awake one morning in early November and realize the clock is ticking.

The first thing they ask for is a discount, of course. Beyond that, they want to know what a certification is going to do for them more generally,

A Look at ADLS Performance – Throughput and Scalability

Categories: CDH Cloud Hadoop HDFS Performance

Overview

Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store that is designed for collecting, storing and analyzing large amounts of data, and is ideal for enterprise-grade applications. Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region. ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem,
