Author Archives: Matt Massie

Hadoop at Twitter (part 1): Splittable LZO Compression

Categories: General

This summer I sent the following tweet, “Had lunch today at Twitter HQ. Thanks for the invite, @kevinweil! Great lunch conversation. Smart, friendly and fun team.” Kevin Weil leads the analytics team at Twitter and is an active member of the Hadoop community, and his colleague Eric Maland leads Operations.  Needless to say, Twitter is doing amazing things with Hadoop.  This guest blog from Kevin and Eric covers one of Twitter’s open-source projects which provides a solution for splittable LZO for Hadoop.

Read More

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop.  Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time:0.18.3. We packaged up Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier.  For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands depending on your Linux distribution:

As proof of this,

Read More

High Energy Hadoop

Categories: General Guest Hadoop HDFS

We asked Brian Bockelman, a Post Doc Research Associate in the Computer Science & Engineering Department at the University of Nebraska–Lincoln, to tell us how Hadoop is being used to process the results from High-Energy Physics experiments.  His response gives insights into the kind and volume of data that High-Energy Physics experiments generate and how Hadoop is being used at the University of Nebraska. -Matt

In the least technical language,

Read More