A new Cloudera Labs release of YCSB includes a variety of usability improvements.
A few months ago, this blog post announced that the YCSB framework is now a Cloudera Labs project. YCSB is the popular standard for evaluating the performance of a variety of data-serving systems and NoSQL stores such as Apache HBase and Apache Cassandra.
Since that time, the reinvigorated YCSB development community has been very active and produced multiple releases that incorporate several valuable improvements. Therefore, it’s appropriate to update the version of YCSB available within Cloudera Labs to a more recent one. We have chosen version 0.6.0.
New Databases Supported
What does this release offer? For a start, there are several new database bindings now available. YCSB was designed from the outset to be extensible, so that support for new database systems could be added easily. In practice, however, few additional bindings appeared, for a variety of reasons: database authors not adding YCSB support, developers not contributing their bindings upstream, new bindings behaving inconsistently with existing ones, and so on.
That state of affairs has now changed dramatically. Over the last few months, YCSB support has been added for well-known databases such as Aerospike, Apache Kudu (incubating), Cassandra version 2, and Google Cloud Datastore.
In particular, Cloudera customers have shown considerable interest in Kudu for its focus on fast analytics and real-time capabilities. In fact, the Kudu development team decided to use YCSB as one of its performance benchmarks very early and ensured that a Kudu YCSB binding would be made available as an integral part of that project.
For HBase users, there’s an additional bonus with the new release: YCSB now fully supports older versions of HBase. To implement this capability, the HBase binding has been broken up by version: use either the hbase094, hbase098, or hbase10 binding as appropriate.
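As a rough illustration of choosing among the versioned bindings, the Python sketch below maps an HBase version string to a binding name and assembles a YCSB command line. The helper names hbase_binding and ycsb_command, the version-to-binding mapping, and the "family" column family are assumptions made for this example; consult the binding's README for the properties your setup actually needs.

```python
def hbase_binding(version):
    """Map an HBase version string (e.g. '0.98.6') to a YCSB binding name.

    The mapping is an assumption inferred from the binding names
    hbase094, hbase098, and hbase10.
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major == 1:
        return "hbase10"
    if (major, minor) == (0, 98):
        return "hbase098"
    if (major, minor) == (0, 94):
        return "hbase094"
    raise ValueError("no binding listed here for HBase " + version)


def ycsb_command(phase, hbase_version,
                 workload="workloads/workloada", column_family="family"):
    """Build the argument list for a YCSB 'load' or 'run' phase."""
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, hbase_binding(hbase_version),
        "-P", workload,                          # workload property file
        "-p", "columnfamily=" + column_family,   # hypothetical family name
    ]


if __name__ == "__main__":
    print(" ".join(ycsb_command("load", "0.98.6")))
```

The resulting list can be passed to subprocess.run from the YCSB installation directory, with the column family created in HBase beforehand.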
With all of these new bindings, YCSB is more firmly established as a benchmarking standard than ever. Among the other enhancements, the default capture of latency measurements has changed to HdrHistogram, which provides much better visibility into “long-tail” behavior than the old fixed-size buckets.
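To see why the bucket layout matters for the long tail, here is a minimal Python sketch. It assumes a fixed layout of 1,000 buckets of 1 ms each plus an overflow counter, and contrasts it with a simplified logarithmic layout. The logarithmic scheme is a stand-in written for this example, not HdrHistogram's actual algorithm, and all function names are hypothetical.

```python
def nearest_rank(n, pct):
    """Rank (1-based) of the pct-th percentile among n samples."""
    return max(1, (n * pct + 99) // 100)


def fixed_bucket_percentile(latencies_us, pct=99, buckets=1000, width_us=1000):
    """Upper bound of the percentile, or None if it falls in the overflow."""
    counts = [0] * buckets
    for v in latencies_us:
        idx = v // width_us
        if idx < buckets:
            counts[idx] += 1
        # Samples beyond the last bucket lose their value entirely.
    target = nearest_rank(len(latencies_us), pct)
    seen = 0
    for idx, c in enumerate(counts):
        seen += c
        if seen >= target:
            return (idx + 1) * width_us
    return None  # only known to exceed buckets * width_us


def log_bucket_edge(v, sub_buckets=32):
    """Upper edge of the log-scaled bucket holding v (sub_buckets: power of 2)."""
    if v < sub_buckets:
        return v  # small values are kept exactly
    base = 1 << (v.bit_length() - 1)   # largest power of two <= v
    width = base // sub_buckets        # bucket width grows with magnitude
    return base + ((v - base) // width + 1) * width


def log_bucket_percentile(latencies_us, pct=99):
    """Upper bound of the percentile using log-scaled buckets."""
    counts = {}
    for v in latencies_us:
        edge = log_bucket_edge(v)
        counts[edge] = counts.get(edge, 0) + 1
    target = nearest_rank(len(latencies_us), pct)
    seen = 0
    for edge in sorted(counts):
        seen += counts[edge]
        if seen >= target:
            return edge
    return None


if __name__ == "__main__":
    # 98 fast operations and 2 stalls of 5 seconds, in microseconds.
    sample = [500] * 98 + [5_000_000] * 2
    print(fixed_bucket_percentile(sample), log_bucket_percentile(sample))
```

With this sample, the fixed layout can only say that the 99th percentile lies somewhere beyond its last bucket, while the logarithmic layout returns a bound within a few percent of the 5-second stalls.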
Other improvements include:
- Updates to existing bindings to support newer versions of the associated data stores.
- Status messages now include an estimate of the time remaining; this is helpful when the database access speed changes mid-run.
- Latency figures for the 95th and 99th percentiles were previously reported in milliseconds rather than microseconds, which was a sore point with many users. All figures are now in microseconds.
- Users can now specify the percentile buckets they are interested in, rather than being forced to use the standard pre-configured ones.
- Status codes are now descriptive rather than numeric.
- A new “raw” measurement output option extracts all collected data, for example for separate statistical analysis.
- An attempt to ensure similar default configuration settings across bindings, for instance, with regard to caching when accessing the back-end database. This change contributes to a true “apples-to-apples” performance comparison.
- Several other enhancements and bug fixes, too many to enumerate here. They are all listed in the YCSB release notes.
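Two of the items above, the time-remaining estimate and user-selected percentiles, come down to simple arithmetic. The Python sketch below shows one way to compute each from raw data. It is an illustration under stated assumptions, not YCSB's own code: the function names are hypothetical, the estimate assumes the average throughput so far continues, and the percentiles use the nearest-rank method on latencies given in microseconds.

```python
import math


def estimate_remaining_seconds(total_ops, done_ops, elapsed_seconds):
    """Estimate the time left, assuming the average throughput so far holds."""
    if done_ops >= total_ops:
        return 0.0
    if done_ops <= 0 or elapsed_seconds <= 0:
        return None  # no throughput observed yet, so no estimate
    throughput = done_ops / elapsed_seconds  # operations per second
    return (total_ops - done_ops) / throughput


def percentiles_us(latencies_us, wanted=(95, 99)):
    """Nearest-rank percentiles of raw latency samples, in microseconds."""
    if not latencies_us:
        raise ValueError("no samples")
    ordered = sorted(latencies_us)
    n = len(ordered)
    result = {}
    for pct in wanted:
        rank = max(1, math.ceil(pct * n / 100))  # 1-based nearest rank
        result[pct] = ordered[rank - 1]
    return result


if __name__ == "__main__":
    print(estimate_remaining_seconds(1000, 250, 10.0))
    print(percentiles_us(range(1, 101), wanted=(50, 95, 99)))
```

Because the estimate is recomputed from the counts at each status interval, it shifts as throughput changes during the run.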
A brief “quick-start” guide on running YCSB against HBase is available here.
So, grab this opportunity to download the latest version of YCSB and take it for a spin! You will get valuable insights into the performance of the data stores in your own deployments.
Govind Kamat is a Performance Engineer at Cloudera.