Author Archives: Amr Awadallah

Big Data’s New Use Cases: Transformation, Active Archive, and Exploration

Categories: Hadoop Impala Use Case

Now that Apache Hadoop is seven years old, use-case patterns for Big Data have emerged. In this post, I’m going to describe the three main ones (reflected in the post’s title) that we see across Cloudera’s growing customer base.


Transformations (T, for short) are a fundamental part of BI systems: They are the process through which data is converted from a source format (which can be relational or otherwise) into a relational data model that can be queried via BI tools.

Cloudera is the Second "Sexiest Enterprise Startup" for 2012

Categories: General

Last Thursday, I had the pleasure of attending the Crunchies with Alan Saldich (our VP of Marketing) and Sarah Mustarde (our Senior Director of Corporate Marketing). The Crunchies is an awards event à la the Oscars, but for startups.

Cloudera was nominated for the “Sexiest Enterprise Startup” award. This was the 6th Annual Crunchies, but the first time that TechCrunch had a slot for enterprise software,

Grouping Related Trends with Hadoop and Hive

Categories: Community General Hadoop Hive

(guest blog post by Pete Skomoroch)

In a previous post, I outlined how to build a basic trend tracking site called TrendingTopics with Cloudera’s Distribution for Hadoop and Hive. TrendingTopics uses Hadoop to identify the top articles trending on Wikipedia and displays related news stories and charts. The data powering the site was pulled from an Amazon EBS Wikipedia Public Dataset containing 8 months of hourly pageview logfiles.

Tracking Trends with Hadoop and Hive on EC2

Categories: Community General Guest Hadoop

At Cloudera, we frequently work with leading Hadoop developers to produce guest blog posts of general interest to the community. We started a project with Pete Skomoroch a while back, and we were so impressed with his work that we’ve decided to bring Pete on as a regular guest blogger. Pete can show you how to do some pretty amazing things with Hadoop, Pig, and Hive, and he has a particular bias towards Amazon EC2.
