Tag Archives: cloud

Map-Reduce With Ruby Using Apache Hadoop

Categories: Hadoop MapReduce

Guest re-post from Phil Whelan, a large-scale web-services consultant based in Vancouver, BC.

Map-Reduce With Hadoop Using Ruby
Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to SSH into the cluster, as all tasks are run from your local machine.

Read more

Do the Schimmy: Efficient Large-Scale Graph Analysis with Hadoop

Categories: General

Guest Post by Michael Schatz and Jimmy Lin

Michael Schatz is an assistant professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. His research interests are in developing large-scale DNA sequence analysis methods to search for DNA sequence variations related to autism, cancer, and other human diseases, and also to assemble the genomes of new organisms. Given the recent tremendous advances in DNA sequencing technologies, Michael has pioneered the use of Hadoop and cloud computing for accelerating genomics,

Read more

Scaling Social Science with Apache Hadoop

Categories: General

This post was contributed by researcher Scott Golder, who studies social networks at Cornell University. Scott was previously a research scientist at HP Labs and the MIT Media Laboratory.

The methods of social science are dear in time and money and getting dearer every day.
— George C. Homans, Social Behavior: Its Elementary Forms, 1974.

When Homans, one of my favorite 20th-century social scientists, wrote the above,

Read more

How Raytheon BBN Technologies Researchers are Using Hadoop to Build a Scalable, Distributed Triple Store

Categories: Guest

This post was contributed by Kurt Rohloff, a researcher in the Information and Knowledge Technologies group of Raytheon BBN Technologies, a wholly owned subsidiary of Raytheon Company.

Using Hadoop to Build a Scalable, Distributed Triple Store

The driving idea behind the Semantic Web is to provide a web-scale information sharing model and platform. One of the singular advancements over the past several years in the Semantic Web domain has been the explosion of data available in semantic formats

Read more

The Smart Grid: Hadoop at the Tennessee Valley Authority (TVA)

Categories: Community Guest Hadoop


For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy infrastructure, the TVA is doing everything they can to keep up with the volumes of data generated by the “smart grid.” But as you know, storing that data is only half the battle. In this guest blog post, the TVA’s Josh Patterson goes into detail about how Hadoop enables them to conduct deeper analysis over larger data sets at considerably lower costs than existing solutions.

Read more