Hadoop World 2011: Building a Model of Organic Link Traffic


Tuesday, November 8th, 2011


At bit.ly, we study behaviour on the internet by capturing clicks on shortened URLs. This link traffic comes in many forms, yet when studying human behaviour we are interested only in ‘organic’ traffic: the patterns produced by actual humans clicking on links that have been shared on the social web. This session will look at a model that extracts and analyzes these patterns using Python/NumPy, Hadoop Streaming, and machine learning. The model lets us separate the traffic we’re interested in from the variety of patterns generated by inorganic entities following bit.ly links.
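To give a feel for the Hadoop Streaming part of such a pipeline, here is a minimal sketch of a mapper and reducer that turn raw click records into per-link hourly click counts, the kind of time series a traffic model could take as input. It is not the model presented in the session. The input format (`unix_timestamp<TAB>short_link`) and the names `mapper`, `reducer`, and `run_locally` are assumptions made for illustration, and the sketch uses only the standard library rather than NumPy so that it stays self-contained.

```python
from collections import Counter
from itertools import groupby


def mapper(lines):
    """Emit 'link<TAB>hour<TAB>1' for each click record.

    Assumed (hypothetical) input format: 'unix_timestamp<TAB>short_link'.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed records
        timestamp, link = parts
        try:
            hour = int(float(timestamp)) // 3600  # hours since the Unix epoch
        except ValueError:
            continue  # skip records whose timestamp is not numeric
        yield f"{link}\t{hour}\t1"


def reducer(lines):
    """Sum clicks per (link, hour).

    Relies only on records for the same link (the text before the first
    tab) arriving contiguously, which is what a sort on the key provides.
    """
    records = (line.rstrip("\n").split("\t") for line in lines)
    for link, group in groupby(records, key=lambda r: r[0]):
        hourly = Counter()
        for _, hour, count in group:
            hourly[int(hour)] += int(count)
        for hour in sorted(hourly):
            yield f"{link}\t{hour}\t{hourly[hour]}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process for quick testing."""
    return list(reducer(sorted(mapper(lines))))
```

Calling `run_locally` on a handful of sample lines exercises both stages without a cluster. In a real job, each function would be wrapped in a small script that reads `sys.stdin` and prints its output, and those scripts would be passed to Hadoop Streaming as the mapper and reducer.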
