Tag Archives: ebs

Apache Impala (Incubating) on Amazon: Performance and Cost Considerations for S3 vs. EBS

Categories: Cloud Impala Performance

The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala.

In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support for reading and writing data in Amazon S3, Impala provides cloud capabilities such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility not found in other cloud-based analytic databases, such as Amazon Redshift.

How-to: Create a CDH Cluster on Amazon EC2 via Cloudera Manager

Categories: CDH Cloud Cloudera Manager How-to Impala Ops and DevOps

Editor’s Note (added Feb. 25, 2015): For releases beyond 4.5, Cloudera recommends the use of Cloudera Director for deploying CDH in cloud environments. 

Cloudera Manager includes a new express installation wizard for Amazon Web Services (AWS) EC2. Its goal is to enable Cloudera Manager users to provision CDH clusters and Cloudera Impala (the open source distributed query engine for Apache Hadoop) on EC2 as easily as possible (for testing and development purposes only,

From Zero to Impala in Minutes

Categories: Cloud Guest How-to Impala

This post was originally published by U.C. Berkeley AMPLab developer (and former Clouderan) Matt Massie on his personal blog. Matt has graciously permitted us to republish it here for your convenience.

Note: The post below is valid for Impala version 0.6 only and is not being maintained for subsequent releases. To deploy Impala 0.7 and later using a much easier (and also free) method, use this how-to.

Grouping Related Trends with Hadoop and Hive

Categories: Community General Hadoop Hive

(guest blog post by Pete Skomoroch)

In a previous post, I outlined how to build a basic trend tracking site called trendingtopics.org with Cloudera’s Distribution for Hadoop and Hive. TrendingTopics uses Hadoop to identify the top articles trending on Wikipedia and displays related news stories and charts. The data powering the site was pulled from an Amazon EBS Wikipedia Public Dataset containing 8 months of hourly pageview logfiles.

Tracking Trends with Hadoop and Hive on EC2

Categories: Community General Guest Hadoop

At Cloudera, we frequently work with leading Hadoop developers to produce guest blog posts of general interest to the community. We started a project with Pete Skomoroch a while back, and we were so impressed with his work that we’ve decided to bring Pete on as a regular guest blogger. Pete can show you how to do some pretty amazing things with Hadoop, Pig, and Hive, and has a particular bias towards Amazon EC2.