Welcome to the first guest post on the Cloudera blog. The other day, we saw Toby from Swingly tweeting about using Apache Hadoop to process millions of other tweeters’ tweets. We were curious, and Toby put together a great writeup about how they use Hadoop to crunch data. We have a few other guest posts in the pipeline, but if you are doing something really fun with Hadoop and want to share,
Last Tuesday, on my second day of work at Cloudera, I went to London to check out the second UK Hadoop User Group meetup, kindly hosted by Sun in a nice meeting room not far from the River Thames. We saw a day of talks from people heavily involved with Hadoop: some on the development side, some on the usage side, and more often than not a bit of both. It was a great opportunity to bring people interested in Hadoop technology together in the same room and find out about the current status and future directions of the project.
Apache Hadoop exists within a rich ecosystem of tools for processing and analyzing large data sets. At Facebook, my previous employer, we contributed a few projects of note to this ecosystem, all under the Apache 2.0 license:
- Thrift: A cross-language RPC framework that powers many of Facebook’s services, including search, ads, and chat. Among other things, Thrift defines a compact binary serialization format that is often used to persist data structures for later analysis.