Author Archives: Govind Kamat

A Look at ADLS Performance – Throughput and Scalability

Categories: CDH, Cloud, Hadoop, HDFS, Performance

Overview

Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store designed for collecting, storing, and analyzing large amounts of data, and is ideal for enterprise-grade applications. Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region. ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem, …

YCSB 0.10.0 Now in Cloudera Labs

Categories: Cloudera Labs, Performance

Since the last blog post announcing the release of YCSB 0.6.0 in Cloudera Labs, users of Cloudera CDH and EDH will have noticed regular updates to the Labs version, keeping it in lockstep with the upstream release. This gives users a consistent and easy way to deploy the current version of YCSB (at the moment, v0.10.0 in CLABS) to evaluate the performance of the NoSQL stores employed within their clusters, such as HBase, …

YCSB 0.6.0 Update from Cloudera Labs

Categories: Cloudera Labs, Performance

A new Cloudera Labs release of YCSB includes a variety of usability improvements.

A few months ago, a blog post announced that the YCSB framework had become a Cloudera Labs project. YCSB is the popular standard for evaluating the performance of a variety of data-serving systems and NoSQL stores such as Apache HBase and Apache Cassandra.

Since that time, the reinvigorated YCSB development community has been very active and produced multiple releases that incorporate several valuable improvements.

YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs

Categories: Cloudera Labs, HBase, Performance

YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

New in CDH 5.4: Apache HBase Request Throttling

Categories: CDH, HBase

The following post about the new request throttling feature in HBase 1.1 (now shipping in CDH 5.4) was originally published on the ASF blog. We republish it here for your convenience.

Running multiple workloads on HBase has always been challenging, especially when trying to execute real-time workloads while concurrently running analytical jobs. One possible way to address this issue is to throttle analytical MapReduce (MR) jobs so that real-time workloads are less affected.