Category Archives: General

How-to: Install Apache Zeppelin on CDH

Categories: General Guest How-to Spark

Our thanks to Karthik Vadla and Abhi Basu, Big Data Solutions engineers at Intel, for permission to re-publish the following (which was originally available here).

Data science is not a new discipline. However, with the growth of big data and adoption of big data technologies, the request for better quality data has grown exponentially. Today data science is applied to every facet of life—product validation through fault prediction,

Read more

Got SQL? Xplain.io Joins Cloudera

Categories: General

Xplain.io is now part of Cloudera. 

Fifteen months ago, Rituparna Agrawal and I incorporated Xplain.io in a small shed in my backyard. With intense focus on solving real customer problems, we built an eclectic and diverse team with skills across database internals, distributed systems, and customer-centric design.

Throughout Q4 2013, we interviewed more than 60 enterprise data architects and found that they were all overwhelmed with the choices available in modern data management.

Read more

BigBench: Toward An Industry-Standard Benchmark for Big Data Analytics

Categories: General Hardware Performance

Learn about BigBench, the new industrywide effort to create a sorely needed Big Data benchmark.

Benchmarking Big Data systems is an open problem. To address this concern, numerous hardware and software vendors are working together to create a comprehensive end-to-end big data benchmark suite called BigBench. BigBench builds upon and borrows elements from existing benchmarking efforts in the Big Data space (such as YCSB, TPC-xHS,

Read more

Using Impala, Amazon EMR, and Tableau to Analyze and Visualize Data

Categories: Cloud General Guest

Our thanks to AWS Solutions Architect Rahul Bhartia for allowing us to republish his post below.

Apache Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Apache Pig, and Apache Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. This post shows you how to use Amazon Elastic MapReduce (Amazon EMR) to analyze a data set available on Amazon Simple Storage Service (Amazon S3) and then use Tableau with Impala to visualize the data.

Read more