Cloudera Videos Hadoop World 2011: Large Scale Log Data Analysis for Marketing in NTT Communications
In this session we describe how we built a Hadoop-based log analysis system for marketing. The system explores internet users' interests in, and feedback about, specified products or themes using access logs, query/click logs, and CGM (consumer-generated media) data. It provides three features: 1) sentiment analysis, 2) co-occurring keyword extraction, and 3) user interest estimation. For large-scale analysis, we use Hadoop with customized functions that reduce the shuffle size by doing more of the processing on the map side. We also describe the configuration of our Hadoop cluster.
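The abstract does not say how the customized functions shrink the shuffle. One common way to move work to the map side is in-mapper combining: each mapper aggregates its partial counts in memory and emits each key once, instead of emitting one record per occurrence. The sketch below applies that idea to counting co-occurring keyword pairs and simulates the map and reduce phases in plain Python. It is a minimal illustration under that assumption, not NTT's implementation. The function names `naive_map`, `in_mapper_combining_map`, and `reduce_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# A co-occurring keyword pair, stored in sorted order so (a, b) == (b, a).
Pair = Tuple[str, str]


def naive_map(record: List[str]) -> Iterable[Tuple[Pair, int]]:
    """Hypothetical baseline mapper: emit (pair, 1) for every keyword pair
    in a single record, so every occurrence is shuffled separately."""
    for a, b in combinations(sorted(set(record)), 2):
        yield (a, b), 1


def in_mapper_combining_map(records: Iterable[List[str]]) -> Iterable[Tuple[Pair, int]]:
    """Hypothetical map-side aggregation: accumulate pair counts in memory
    across all records in this mapper's input split, then emit each
    distinct pair once. This shrinks the number of records shuffled."""
    counts: Counter = Counter()
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            counts[(a, b)] += 1
    yield from counts.items()


def reduce_counts(pairs: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    """Reducer: sum the partial counts for each pair."""
    totals: Dict[Pair, int] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    # Toy input split: each record is the set of keywords found in one log entry.
    split = [["hadoop", "log", "marketing"], ["hadoop", "log"], ["log", "hadoop"]]

    naive = [kv for rec in split for kv in naive_map(rec)]
    combined = list(in_mapper_combining_map(split))

    print("records shuffled (naive):   ", len(naive))
    print("records shuffled (combined):", len(combined))
    # Both strategies must produce identical final counts.
    assert reduce_counts(naive) == reduce_counts(combined)
```

On this toy split the naive mapper shuffles 5 records and the combining mapper shuffles 3, with identical reducer output. The trade-off is mapper memory: the in-memory table grows with the number of distinct pairs in a split. A production version would typically flush it when it exceeds a size limit. Hadoop's built-in Combiner is an alternative that trades some of this saving for bounded memory.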

