Apache Hadoop Applied

BusinessWeek recently published a fascinating article on Apache Hadoop and Big Data, interviewing several Cloudera customers as well as our CEO Mike Olson. One of the things that has consistently exceeded our expectations is the diversity of industries that are adopting Hadoop to solve impressive business challenges and create real value for their organizations. Two distinct use cases that Hadoop is used to tackle have emerged across these industries. Though these have different names in each industry, the mechanics have clear parallels that cross domains.

Data Processing:

Data Processing is Hadoop’s original use case. By scaling out the amount of data that users could store and access in a single system then distributing the document and log processing used to index, and extract patterns from this data, Hadoop made a direct impact on the web and online advertising industries early on. Today, data processing means more than sessionization of click stream data, index construction or attribution for advertising. Hadoop is used to process data by commerce, media and telecommunications companies in order to measure engagement, and handle complex mediation. Retail and financial institutions use Hadoop to understand customer preferences, better target prices and reconcile trades. Most recently we’re seeing Hadoop used for time series and signal processing in the energy sector and genome mapping and alignment among life sciences organizations.

Advanced Analytics:

Today, Hadoop is not only used for data processing, but also advanced analytics. A slightly ambiguous term, we’ve found organizations speaking of advanced analytics as a way of referring to the types of analytics that are challenging or impossible using tools that are optimized for relational analysis. These new challenges include social network analysis and smarter targeting by web and advertising companies, content optimization by a wide variety of publishers and network analysis at media and telecommunications companies. Our retail customers look at the effectiveness of loyalty programs well beyond the classic market basket analysis as customer engagement crosses online and real world interactions. Financial institutions are able to take a deeper look at complex fraud and risk among their customers and throughout their internal systems. Both entity analysis and analysis based on next generation sequencing have recently been made possible using native Hadoop libraries.

We’re excited by Hadoop’s success, emerging as a powerful platform for such a diverse number of applications across a wide variety of industries. Here at Cloudera we’re also honored to have been able to work closely with leading companies in each of these industries and help tackle the type of challenges that are driving new business and creating real value for our customers.

Filed under:

1 Response

Leave a comment


× seven = 56