Category Archives: General

Meet Cloudera’s Apache Spark Committers

Categories: Community General Meet the Engineer Spark

The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had that opportunity to ask Cloudera’s Apache Spark committers (Sean Owen, Imran Rashid [PMC], Sandy Ryza, and Marcelo Vanzin) for their perspectives about how the Spark community has worked and is working together, and the work to be done via the One Platform initiative to make the Spark stack enterprise-ready.

Recently, Apache Spark has become the most currently active project in the Apache Hadoop ecosystem (measured by number of contributors/commits over time),

Read more

How-to: Install Apache Zeppelin on CDH

Categories: General Guest How-to Spark

Our thanks to Karthik Vadla and Abhi Basu, Big Data Solutions engineers at Intel, for permission to re-publish the following (which was originally available here).

Data science is not a new discipline. However, with the growth of big data and adoption of big data technologies, the request for better quality data has grown exponentially. Today data science is applied to every facet of life—product validation through fault prediction,

Read more

Got SQL? Xplain.io Joins Cloudera

Categories: General

Xplain.io is now part of Cloudera. 

Fifteen months ago, Rituparna Agrawal and I incorporated Xplain.io in a small shed in my backyard. With intense focus on solving real customer problems, we built an eclectic and diverse team with skills across database internals, distributed systems, and customer-centric design.

Throughout Q4 2013, we interviewed more than 60 enterprise data architects and found that they were all overwhelmed with the choices available in modern data management.

Read more

BigBench: Toward An Industry-Standard Benchmark for Big Data Analytics

Categories: General Hardware Performance

Learn about BigBench, the new industrywide effort to create a sorely needed Big Data benchmark.

Benchmarking Big Data systems is an open problem. To address this concern, numerous hardware and software vendors are working together to create a comprehensive end-to-end big data benchmark suite called BigBench. BigBench builds upon and borrows elements from existing benchmarking efforts in the Big Data space (such as YCSB, TPC-xHS,

Read more