- Search — Google figured out how to index all of the content on the internet so that you, the internet user, can find what you’re looking for.
- Social — Facebook, Twitter, LinkedIn and other social sites give your friends and other social connections a mechanism to push content you’re interested in to you, so you don’t have to search for it yourself.
So what does Gravity do? Its goal is to drive the third paradigm shift:
- Personal — Creating a web experience that is totally optimized based on your individual interests, behaviors, and preferences. Or, as Jim puts it, “showing you today what you’re going to search for tomorrow.”
Gravity collects and processes more than 10,000 data points every second, covering both user activity across the web and newly published content. All of the collected data is loaded into HDFS, where two Apache Hadoop processes run. The first is a real-time system that processes as many data points as it can as they arrive; about 99.99% of the traffic is processed correctly on this pass. The second is a batch process that runs every hour or two and catches the remaining 0.01% of data points that the real-time pass missed. Together, the two give the pipeline “eventual consistency”: results can briefly be incomplete, but every data point is eventually processed. Once the data is processed, it lands in Apache HBase, where it is serialized and can be accessed via Apache Hive.
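The split between a best-effort real-time pass and a periodic catch-up pass can be illustrated with a minimal, hypothetical Python sketch. It is not Gravity’s code: it assumes that every data point carries a unique `event_id`, that the full event log is stored durably (as it would be in HDFS), and that the batch job applies only the events the real-time pass did not record. The names `Event`, `realtime_pass`, and `batch_pass` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    event_id: int  # hypothetical unique ID used to track which events were applied
    user: str
    topic: str


def realtime_pass(events: Iterable[Event], applied: set, counts: Counter,
                  dropped: Callable[[Event], bool]) -> None:
    """Best-effort path: apply each event as it arrives; some may be missed."""
    for e in events:
        if dropped(e):
            continue  # simulates an event the real-time system failed to process
        counts[(e.user, e.topic)] += 1
        applied.add(e.event_id)


def batch_pass(event_log: Iterable[Event], applied: set, counts: Counter) -> int:
    """Periodic catch-up job: replay the stored log, applying only missed events."""
    recovered = 0
    for e in event_log:
        if e.event_id not in applied:
            counts[(e.user, e.topic)] += 1
            applied.add(e.event_id)
            recovered += 1
    return recovered


# Usage: 10,000 events, of which the real-time pass misses one (0.01%).
log = [Event(i, f"user{i % 3}", "sports" if i % 2 else "tech") for i in range(10_000)]
counts: Counter = Counter()
applied: set = set()
realtime_pass(log, applied, counts, dropped=lambda e: e.event_id == 42)
realtime_share = len(applied) / len(log)  # fraction handled on the fast path
recovered = batch_pass(log, applied, counts)  # fills in what the fast path missed
```

Keying on event IDs keeps the catch-up job idempotent: running it again does not double-count events that either pass has already applied. A production batch layer might instead recompute results from the full log; this sketch only fills the gaps.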
With several Scala engineers in house, the Gravity team decided in 2011 to use the Scala programming language instead of Java. Because Hadoop and HBase expose Java APIs rather than idiomatic Scala ones, the Gravity team wrote its own open source library, HPaste, which lets Scala engineers take advantage of Scala’s language features when working with HBase.
The results of this system?
- Higher click-through rates (CTR) — Gravity has compared the CTR of people who engage with its personalized content against the CTR of standard segmented or generic content, and found that personalized content delivers 300-400% higher CTR.
- Longer sessions — When personalized content is displayed on a web page, users stay on the page longer, which is a strong indication that they like the site more.
- More repeat visitors — Visitors who see personalized content on their first visit to a site return to that site more than 10X as often afterward as visitors who see static content shown to everyone. Gravity has observed this at scale across some of its largest customers.
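A figure like “300-400% higher” describes a relative increase: a rate that is 300% higher than a baseline is four times that baseline, and one that is 400% higher is five times it. The small Python helpers below make that arithmetic explicit. The 1% baseline CTR is a made-up value for illustration, not a number reported by Gravity.

```python
def relative_lift_percent(new_rate: float, baseline_rate: float) -> float:
    """Percent by which new_rate exceeds baseline_rate."""
    return (new_rate - baseline_rate) / baseline_rate * 100


def rate_after_lift(baseline_rate: float, lift_percent: float) -> float:
    """Rate that is lift_percent higher than baseline_rate."""
    return baseline_rate * (1 + lift_percent / 100)


baseline_ctr = 0.01  # hypothetical 1% CTR for generic content
low = rate_after_lift(baseline_ctr, 300)   # 300% higher -> 4x the baseline
high = rate_after_lift(baseline_ctr, 400)  # 400% higher -> 5x the baseline
```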
Want to learn more?
- Read the full case study.
- Watch Gravity’s Jim Benedetto explain its use case on video.
- Explore the HPaste project on GitHub.
- Learn more about Gravity.
Karina Babcock is Cloudera’s Customer Programs & Marketing Manager.