We blogged about 104 different topics in 2010, and we recently decided to take a look back and see what folks were most interested in reading. The topics ranged from technical updates on Cloudera's Distribution for Apache Hadoop (CDH3b3 being the most recent), to upcoming Hadoop-related events and activities, to practical insights for implementing Hadoop. We also featured a number of guest blog posts.
Here are the top 10 blog posts from 2010:
- How to Get a Job at Cloudera
Cloudera is hiring around the clock, and this post highlights the best course of action to increase your chances of becoming a Clouderan.
- Why Europe's Largest Ad Targeting Platform Uses Hadoop
"As data volumes increased and performance suffered, we recognized a new approach was needed (Hadoop)," said Richard Hutton, Nugg.ad CTO.
- What's New in CDH3b2: Flume
Flume, our data movement platform, was introduced to the world and into the open source environment.
- What's New in CDH3b2: Hue
Hue, a web UI for Hadoop, is a suite of web applications as well as a platform for building custom applications with a nice UI library.
- Natural Language Processing with Hadoop and Python
Data volumes from text (blogs) and speech (YouTube videos) are growing, posing new questions for Natural Language Processing. The work involves making sense of large amounts of data in different forms and extracting useful insights.
- How Raytheon BBN Technologies Researchers are Using Hadoop to Build a Scalable, Distributed Triple Store
Raytheon BBN Technologies built a cloud-based triple-store technology, known as SHARD, to address scalability issues in the processing and analysis of Semantic Web data.
- Cloudera's Support Team Shares Some Basic Hardware Recommendations
The Cloudera support team discusses workload evaluation and the critical role it plays in hardware selection.
- Integrating Hive and HBase
Facebook explains integrating Hive and HBase to keep their warehouse up to date with the latest information published by users.
- Pushing the Limits of Distributed Processing
Google built a 100,000-node Hadoop cluster running on Nexus One mobile phone hardware and powered by Android. The environmental cost of this solution is 1/100th that of running it within their data center. (April Fools'!)
- Using Flume to Collect Apache 2 Web Server Logs
This post presents the common use case of using a Flume node to collect Apache 2 web server logs and deliver them to HDFS.
Aside from "How to Get a Job at Cloudera," Cloudera blog readers viewed posts related to CDH and its components, posts exemplifying what is possible with Hadoop in production, and posts highlighting integrations with Hadoop.
Looking forward, we plan to continue featuring technical and non-technical topics, as well as guest posts from customers and the community, and to increase our number of published posts. If there is a topic you would like to learn more about, or you have a Hadoop story you would like to share, we would love to hear your ideas. Email suggestions to firstname.lastname@example.org.