Tag Archives: cloud

5 Common Questions About Apache Hadoop

Categories: General Hadoop

There’s been a lot of buzz about Apache Hadoop lately. Just the other day, some of our friends at Yahoo! reclaimed the terasort record from Google using Hadoop, and the folks at Facebook let on that they ingest 15 terabytes a day into their 2.5 petabyte Hadoop-powered data warehouse.

But many people still find themselves wondering just how all this works, and what it means to them. We get a lot of common questions while working with customers,


Thrift, Scribe, Hive, and Cassandra: Open Source Data Management Software

Categories: General

Apache Hadoop exists within a rich ecosystem of tools for processing and analyzing large data sets. At Facebook, my previous employer, we contributed a few projects of note to this ecosystem, all under the Apache 2.0 license:

    • Thrift: A cross-language RPC framework that powers many of Facebook’s services, including search, ads, and chat. Among other things, Thrift defines a compact binary serialization format that is often used to persist data structures for later analysis.
