YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs


YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

The Yahoo! Cloud Serving Benchmark (YCSB), an open source framework for evaluating and comparing the performance of multiple types of data-serving systems (including NoSQL stores such as Apache HBase, Apache Cassandra, Redis, MongoDB, and Voldemort), has long been the de facto open standard for this purpose. Today, Cloudera is announcing that CDH users can now easily install and use YCSB to evaluate the performance of their HBase deployments by taking advantage of new packages in Cloudera Labs. (As with all Cloudera Labs projects, these packages are not currently supported, but we do strongly encourage you to experiment with them.)

In this post, you will learn how YCSB works and how to install and use it against HBase.

How YCSB Works

YCSB was developed at Yahoo! Labs to provide a framework and common set of workloads for evaluating the performance of different key-value stores. It has two parts:

  • The YCSB Client, an extensible workload generator
  • The core workloads, a set of workload scenarios to be executed by the generator

The core workloads provide a well-rounded picture of a system’s performance, and the client is extensible so that you can define additional workloads to examine system aspects or application scenarios not covered by the core workloads. The client can also be extended to benchmark different databases. YCSB ships with bindings for a long list of databases including HBase, Cassandra, Apache Accumulo, MongoDB, and Voldemort, and you can add support for a different data store by writing an interface layer.

To benchmark multiple data stores and compare them, you can install those data stores within a single deployment (or alternatively on multiple instances of a selected hardware configuration). Then, run the same workloads against each of the data stores. Next, plot the performance of each system to see their relative performance profiles. One good visualization to try is a latency versus throughput curve for each system.
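As a rough sketch of how the points for such a curve could be gathered, the Python below parses the comma-separated summary lines that the YCSB client prints at the end of a run (for example `[OVERALL], Throughput(ops/sec), ...`) and turns several runs into throughput-sorted points. The function names are hypothetical, and the sample text holds illustrative placeholder values, not measurements.

```python
def parse_summary(text):
    """Return {(section, metric): value} for each '[SECTION], metric, value' line."""
    results = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3 or not (parts[0].startswith("[") and parts[0].endswith("]")):
            continue  # not a summary line (e.g. status or log output)
        try:
            value = float(parts[2])
        except ValueError:
            continue  # skip lines whose last field is not numeric
        results[(parts[0][1:-1], parts[1])] = value
    return results


def curve_point(text, operation="READ"):
    """Extract one (throughput in ops/sec, average latency in us) point from a run."""
    summary = parse_summary(text)
    return (summary[("OVERALL", "Throughput(ops/sec)")],
            summary[(operation, "AverageLatency(us)")])


def build_curve(run_outputs, operation="READ"):
    """Sort the points of several runs by throughput so they plot as one curve."""
    return sorted(curve_point(text, operation) for text in run_outputs)


# Illustrative placeholder output for two runs of one data store (not real results).
SAMPLE_RUNS = [
    "[OVERALL], RunTime(ms), 10000\n"
    "[OVERALL], Throughput(ops/sec), 2000.0\n"
    "[READ], AverageLatency(us), 900.0\n",
    "[OVERALL], RunTime(ms), 10000\n"
    "[OVERALL], Throughput(ops/sec), 1000.0\n"
    "[READ], AverageLatency(us), 400.0\n",
]

if __name__ == "__main__":
    for throughput, latency in build_curve(SAMPLE_RUNS):
        print(f"{throughput:.1f} ops/sec -> {latency:.1f} us")
```

Building one such list per data store, with each run using a different target throughput, gives the series to hand to whatever plotting tool you prefer.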

Alternatives to YCSB

Several options are available for evaluating database performance, so why use YCSB?

For example, the TPC-H benchmark suite is commonly used to benchmark relational databases, and some NoSQL systems include native tools for measuring throughput and latency. (HBase comes packaged with the PerformanceEvaluation utility.) However, most of these approaches don’t work with other systems and often focus on scenarios in which the data store excels.

As a generic, database-neutral performance evaluation utility, YCSB is currently the de facto comparative benchmark for NoSQL stores. It includes a wide range of database bindings and is commonly used to compare their performance for a set of desired workloads. Because YCSB is open source and extensible, support for additional databases is regularly added.

Project Status

Although YCSB development has been fairly low-key for a while, Cloudera sees great value in the YCSB project for the HBase community, and recently Cloudera engineers have been working with Brian Cooper, the original author of YCSB, to reinvigorate the project within the developer community. A number of enhancements have already been added, and a regular release cycle has been established. Some of the recent improvements to YCSB include:

  • Latency capture via HDRHistogram
  • Measuring transaction latency against a fixed schedule
  • Support for an additional JSON format
  • Better reporting and status output
  • New database bindings

Installing YCSB with CDH

YCSB packages and parcels for CDH, including basic documentation, are available for download from Cloudera Labs. (YCSB 0.3.0 is the version packaged in Cloudera Labs.) To install this version, within Cloudera Manager, click on the “Parcels” icon in the top bar, then click on the “Edit Settings” button. Add the following link to the list of URLs enumerated in the “Remote Parcel Repository URLs” setting:

http://archive.cloudera.com/cloudera-labs/ycsb/parcels/latest

Then, install the parcel and activate it.

Running YCSB with HBase

A typical session running YCSB against HBase first creates a table within HBase for YCSB to use, then runs one of the pre-packaged workloads against HBase.

In a nutshell, to run a YCSB workload:

  1. Set up the database you will be testing.
  2. Create the table and load data into it, either using YCSB or manually.
  3. Choose the workload in YCSB. Some of the available workloads include read transactions, update transactions, and mixed transactions.
  4. Choose runtime parameters, such as the request distribution and number of threads.
  5. Invoke the YCSB client to apply the workload.
  6. When it is done, the client will report throughput and latency results.
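The steps above can be sketched as a small Python helper that assembles the client invocations for the load and run phases. It only builds argument lists and does not execute anything. The helper name, the `ycsb` executable path, the `workloads/workloada` location, and the column family name `family` are assumptions for illustration; adjust them to wherever the parcel installs YCSB and to the table you created (YCSB's default table name is `usertable`). The `-P`, `-p`, `-threads`, `-target`, and `-s` options are those of the standard YCSB client.

```python
def ycsb_command(phase, workload_file, column_family, threads=1,
                 target=None, extra=None, executable="ycsb", binding="hbase"):
    """Build the argument list for one YCSB client invocation.

    phase         -- "load" to insert the initial records, "run" to apply the workload
    workload_file -- path to a workload properties file (assumed location)
    column_family -- HBase column family of the table YCSB will use
    threads       -- number of client threads
    target        -- optional target throughput in ops/sec
    extra         -- optional dict of additional workload properties
    """
    if phase not in ("load", "run"):
        raise ValueError(f"unknown phase: {phase!r}")
    args = [executable, phase, binding,
            "-P", workload_file,
            "-p", f"columnfamily={column_family}",
            "-threads", str(threads)]
    if target is not None:
        args += ["-target", str(target)]
    # Sorted so the generated command is deterministic.
    for key, value in sorted((extra or {}).items()):
        args += ["-p", f"{key}={value}"]
    args.append("-s")  # print periodic status while the client runs
    return args


if __name__ == "__main__":
    # Hypothetical session: load records, then apply the workload with 8 threads.
    load = ycsb_command("load", "workloads/workloada", "family",
                        threads=4, extra={"recordcount": 1000})
    run = ycsb_command("run", "workloads/workloada", "family", threads=8,
                       extra={"operationcount": 1000,
                              "requestdistribution": "zipfian"})
    print(" ".join(load))
    print(" ".join(run))
```

The resulting lists can be passed to `subprocess.run` once the paths have been checked against your installation.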

Conclusion

We hope you see value in YCSB for benchmarking your HBase deployments. If you have any feedback or questions, please use the Cloudera Labs area at community.cloudera.com to tell us about them.

Govind Kamat is a Performance Engineer at Cloudera.


8 responses on “YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs”

    1. Govind Kamat

      No, YCSB was not designed to test Zookeeper’s performance. Zookeeper is not a data store but a distributed coordination service. Its functionality includes operations such as creating znodes, deleting them, reading and writing to them, setting watches and so on.

      You could use Patrick Hunt’s zk-smoketest utility to evaluate Zookeeper’s performance. There are others you may find as well, or you could write a custom client to test the specific operations you are interested in.

  1. Kevin

    Does this version of YCSB work with CDH 5.4.7? When I distributed the Parcel I was unable to start HBase. Removing the YCSB Parcel corrected the problem. Do you know if you will be releasing a Parcel with a newer version of YCSB for CDH 5.4.7? My guess is this did not include HBase 1.x support.

  2. Kevin

    I see a new Parcel for YCSB (0.5.0-1.clabs_ycsb1.2.0.p0.936). Do you know what changes are included in this release? I can’t seem to find a write up of this Parcel.

  3. levy

    I want to test RocksDB with YCSB using multiple threads, but it seems to be a bit difficult to run multi-threaded tests with YCSB.

  4. jerry

    Hi writer,
    my name is Jerry.dong. I installed CDH use PATHA and PATHC. When I used PATHC to finish installation of CDH, the YCSB can work well. But when I used another way to install CDH, I used YCSB to test hbase , it showed me that all hbase RegionServer exited when YCSB began to test. Could you tell me how to resolve it?
