YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.
Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.
Strata + Hadoop World 2015 NYC is more than a daytime conference; it’s also a nighttime meetup experience. (Plus, there are a bunch of book signings.)
It won’t be long before we’re all in NYC for Strata + Hadoop World (Sept. 29-Oct. 1; if you haven’t registered yet, a 20% discount is still available). So, consider the following for your evening agenda:
Cloudera Director 1.5 introduces a new plugin architecture to enable support for additional cloud providers. If you want to implement a plugin that adds integration with a cloud provider not supported out of the box, or to extend one of the existing plugins, these details will get you started.
As discussed in our previous blog post, the Cloudera Director Service Provider Interface (Cloudera Director SPI) defines a Java interface and packaging standards for Cloudera Director plugins.
Thanks to Barclays employees Sam Savage, VP Data Science, and Harry Powell, Head of Advanced Analytics, for the guest post below about the Barclays use case for Apache Spark and its Scala API.
At Barclays, our team recently built an application called Insights Engine to execute an arbitrary number N of near-arbitrary SQL-like queries in a way that scales with increasing N.
Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.
We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,