Category Archives: Performance

Evaluating Partner Platforms

Categories: CDH Hardware How-to Performance

As a member of Cloudera’s Partner Engineering team, I evaluate hardware and cloud computing platforms offered by commercial partners who want to certify their products for use with Cloudera software. One of my primary goals is to make sure that these platforms provide a stable and well-performing base upon which our products will run, a state of operation that a wide variety of customers performing an even wider variety of tasks can appreciate.

Read more

New in Cloudera Data Science Workbench 1.2: Usage Monitoring for Administrators

Categories: CDH Cloudera Data Science Workbench Data Science Performance

Cloudera Data Science Workbench (CDSW) provides data science teams with a self-service platform for quickly developing machine learning workloads in their preferred language, with secure access to enterprise data and simple provisioning of compute. Individuals can request schedulable resources (e.g. compute, memory, GPUs) on a shared cluster that is managed centrally.

While self-service provisioning of resources is critical to the rapid interaction cycle of data scientists, it can pose a challenge to administrators.

Read more

A Look at ADLS Performance – Throughput and Scalability

Categories: CDH Cloud Hadoop HDFS Performance

Overview

Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store that is designed for collecting, storing and analyzing large amounts of data, and is ideal for enterprise-grade applications. Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region. ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem,

Read more

Apache Impala Leads Traditional Analytic Database

Categories: CDH Impala Performance

An unmodified TPC-DS-based performance benchmark shows Impala’s leadership compared to a traditional analytic database (Greenplum), especially for multi-user concurrent workloads. Additionally, the benchmark continues to demonstrate a significant performance gap between analytic databases and SQL-on-Hadoop engines like Hive LLAP, Spark SQL, and Presto.

The past year has been one of the biggest for Apache Impala (incubating). Not only has the team continued to work on ever-growing scale and stability, but a number of key capabilities have been rolled out that further solidify Impala as the open standard for high-performance BI and SQL analytics.

Read more

Offheap Read-Path in Production – The Alibaba story

Categories: Hadoop HBase Performance Use Case

This article is syndicated with permission from the Apache HBase blog and highlights a collaboration between our partners at Intel and Alibaba engineering in time for “Singles Day”, the biggest shopping day on the net. For more on HBase, mark your calendars! On June 12th, 2017, the Apache HBase community will be hosting their annual HBaseCon.

Introduction

HBase is the core storage system in Alibaba’s Search Infrastructure.

Read more