The HBaseCon 2014 “Case Studies” track surfaces some of the most interesting (and diverse) use cases in the HBase ecosystem — and in the world of NoSQL overall — today.
HBaseCon 2014 (May 5, 2014, in San Francisco) is not just about internals and best practices; it's also a place to explore use cases that you may not have even considered before.
HBaseCon is just a couple of short weeks away, so don't wait to register.
- “A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase”
Chris Huang and Scott Miao (Trend Micro)
Trend Micro collects a large volume of threat intelligence data for its clients, covering many different threat (web) entities. Most of these entities are observed together with relationships among them, such as malicious behaviors or interaction chains. So we built a graph model on HBase that stores all known threat entities and their relationships, allowing clients to query threat relationships starting from any given entity. This presentation covers the problems we are trying to solve, the design decisions we made and why, how we designed the graph model, and the graph computation tasks involved.
- “A Survey of HBase Application Archetypes”
Lars George and Jon Hsieh (Cloudera)
Today, there are hundreds of production HBase clusters running a multitude of applications and use cases. Many well-known implementations exercise opposite ends of HBase's capabilities, emphasizing either entity-centric schemas or event-based schemas. This talk presents these archetypes and others, based on a use-case survey of clusters conducted by Cloudera's development, product, and services teams. By analyzing data from the nearly 20,000 HBase cluster nodes Cloudera has under management, we'll categorize HBase users and their use cases into a few simple archetypes, describe workload patterns, and quantify the usage of advanced features.
- “Blackbird: Storing Billions of Rows a Couple of Milliseconds Away”
Ishan Chhabra, Shrijeet Paliwal & Abhijit Pol (Rocket Fuel)
Blackbird, Rocket Fuel's system built on top of HBase, makes billions of rich user profiles available for AI-based optimization under the tight latency requirements of real-time auctions. It relies on our novel collections API, a constrained yet useful append-only model that is sympathetic to HBase internals and lets us scale writes easily while keeping strict read-performance guarantees. In this talk, we describe the key abstractions Blackbird exposes, the utilities we built over time to support our use cases, and the hardware and software configuration (including HBase configs) that helps us meet our strict latency guarantees.
- “Content Identification using HBase” (20-minute session)
Daniel Nelson (Nielsen)
The motivation behind content identification is to determine what media people are consuming (TV shows, movies, or streaming content). Nielsen collects that data via its Fingerprints system, which generates large amounts of structured data stored in HBase. This presentation will review the options a developer has for querying HBase and retrieving hash data. Also covered is the use of wire protocols (Protocol Buffers) and how they can improve network efficiency and throughput, especially when combined with an HBase coprocessor.
- “Data Evolution in HBase”
Eric Czech and Alec Zopf (Next Big Sound)
Managing the evolution of data within HBase over time is not easy: Data resulting from Hadoop processing pipelines or otherwise placed in HBase is subject to the same kinds of oversights, bugs, and faulty assumptions inherent to the software that creates it. While the development of this software is often effectively managed through revision control systems, data itself is rarely modeled in a way that affords the same flexibility. In this session, we’ll talk about how to build a versioned, time-series data store using HBase that can provide significantly greater adaptability and performance than similar systems.
- “Digital Library Collection Management using HBase” (20-minute session)
Ron Buckley (OCLC)
OCLC has been working over the last year to move its massive repository to HBase. This talk will focus on the impetus behind the move, implementation details and technology choices we’ve made (key design, shredding PDFs and other digital objects into HBase, scaling), and the value-add that HBase brings to digital collection management.
- “HBase at Bloomberg: High Availability Needs for the Financial Industry” (20-minute session)
Sudarshan Kadambi and Matthew Hunt (Bloomberg LP)
Bloomberg is a financial data and analytics provider, so data management is core to what we do. The types of data we manage are tremendously diverse, and HBase is a natural fit for many of these datasets, both in terms of its data model and as a scalable, distributed database. This talk covers data and analytics use cases at Bloomberg and the operational challenges around high availability (HA). We'll explore the work currently being done under HBASE-10070, further extensions to it, and how this solution differs qualitatively from the way failover is handled in Apache Cassandra.
- “HBase Design Patterns @ Yahoo!” (20-minute session)
Francis Liu (Yahoo!)
HBase's introduction into the Yahoo! Grid has given our users new ways to process and store data. A year after it became available, usage has been varied: event processing for personalization, incremental processing for ingestion, time-based aggregations for analytics, and more. All of these were made possible by capabilities HBase offers beyond working directly with HDFS files. This talk will review some recurring HBase design patterns at Yahoo! and share what we have learned along the way.
- “Large-scale Web Apps @ Pinterest”
Varun Sharma (Pinterest)
Over the past year, HBase has become an integral component of Pinterest’s storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.
Thank you to our sponsors — Continuuity, Hortonworks, Intel, LSI, MapR, Salesforce.com, Splice Machine, WibiData (Gold); BrightRoll, Facebook, Pepperdata (Silver); ASF (Community); O’Reilly Media, The Hive, NoSQL Weekly (Media) — without which HBaseCon would be impossible!