Tag Archives: cloud

Scaling Social Science with Apache Hadoop

Categories: General

This post was contributed by researcher Scott Golder, who studies social networks at Cornell University. Scott was previously a research scientists at HP Labs and the MIT Media Laboratory.

The methods of social science are dear in time and money and getting dearer every day.
— George C. Homans, Social Behavior: Its Elementary Forms, 1974.

When Homans — one of my favorite 20th century social scientists — wrote the above,

Read more

How Raytheon BBN Technologies Researchers are Using Hadoop to Build a Scalable, Distributed Triple Store

Categories: Guest

This post was contributed by Kurt Rohloff, a researcher in the Information and Knowledge Technologies group of Raytheon BBN Technologies, a wholly owned subsidiary of Raytheon Company.

Using Hadoop to Build a Scalable, Distributed Triple Store

The driving idea behind Semantic Web is to provide a web-scale information sharing model and platform.  One of the singular advancements over the past several years in the Semantic Web domain has been the explosion of data available in semantic formats

Read more

The Smart Grid: Hadoop at the Tennessee Valley Authority (TVA)

Categories: Community Guest Hadoop


For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy infrastructure, the TVA is doing everything they can to keep up with the volumes of data generated by the “smart grid.” But as you know, storing that data is only half the battle. In this guest blog post, the TVA’s Josh Patterson goes into detail about how Hadoop enables them to conduct deeper analysis over larger data sets at considerably lower costs than existing solutions.
Read more

5 Common Questions About Apache Hadoop

Categories: General Hadoop

There’s been a lot of buzz about Apache Hadoop lately. Just the other day, some of our friends at Yahoo! reclaimed the terasort record from Google using Hadoop, and the folks at Facebook let on that they ingest 15 terabytes a day into their 2.5 petabyte Hadoop-powered data warehouse.

But many people still find themselves wondering just how all this works, and what it means to them. We get a lot of common questions while working with customers,

Read more

Thrift, Scribe, Hive, and Cassandra: Open Source Data Management Software

Categories: General

Apache Hadoop exists within a rich ecosystem of tools for processing and analyzing large data sets. At Facebook, my previous employer, we contributed a few projects of note to this ecosystem, all under the Apache 2.0 license:

    • Thrift: A cross-language RPC framework that powers many of Facebook’s services, include search, ads, and chat. Among other things, Thrift defines a compact binary serialization format that is often used to persist data structures for later analysis.

    Read more