Since the last blog post announcing the release of YCSB 0.6.0 in Cloudera Labs, users of Cloudera CDH and EDH will have noticed regular updates to the Labs version, keeping it in lockstep with the upstream release. This should give users a consistent and easy mechanism to deploy the current version of YCSB (at the moment, 0.10.0 in CLABS) to evaluate the performance of the NoSQL stores employed within their clusters such as HBase,
Livy, which streamlines Spark architecture for web/mobile apps, is the newest addition to Cloudera Labs.
With respect to the impact of Apache Spark on the Apache Hadoop ecosystem, its virtually overnight adoption as the default data processing engine (and as a standard for powering advanced analytic applications) speaks for itself. But that’s not to say that there isn’t work yet to be done, particularly in the areas of performance at scale/under multi-tenancy,
As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier.
Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for parallel processing, expressive transformations, fault tolerance, and exactly-once processing. But it requires a programmer to write code,
Bringing Time Series for Spark into Cloudera Labs reflects its potential future usefulness in more use cases.
Time is more important than ever to data. We’re interested not merely in how things are, but in how they change, where tendencies lead, and where trends are heading into unusual territory. Many classic machine-learning techniques do nothing in particular with time, and so assume the past and future are all similar. We know that assumption is increasingly inaccurate.
A new Cloudera Labs release of YCSB includes a variety of usability improvements.
A few months ago, this blog post announced that the YCSB framework is now a Cloudera Labs project. YCSB is the popular standard for evaluating the performance of a variety of data-serving systems and NoSQL stores such as Apache HBase and Apache Cassandra.
Since that time, the reinvigorated YCSB development community has been very active and produced multiple releases that incorporate several valuable improvements.