The HBaseCon 2014 “Ecosystem” track offers a cross-section view of the most interesting projects emerging on top of, or alongside, HBase.
HBaseCon is just a few short weeks away, so don’t wait to register.
- “Cross-Site BigTable using HBase”
Jingcheng Du and Ramkrishna Vasudevan (Intel)
As HBase continues to expand in application and enterprise or government deployments, there is a growing demand for storing data across geographically distributed datacenters for improved availability and disaster recovery. The Cross-Site BigTable extends HBase to make it well-suited for such deployments, providing the ability to create and access HBase tables that are partitioned and asynchronously backed up over a number of distributed datacenters. This talk reveals how the Cross-Site BigTable manages data access over multiple datacenters and removes the datacenter itself as a single point of failure in geographically distributed HBase deployments.
- “Design Patterns for Building 360-degree Views with HBase and Kiji”
Jonathan Natkins (WibiData)
Many companies aspire to have 360-degree views of their data. Whether they’re concerned about customers, users, accounts, or more abstract things like sensors, organizations are focused on developing capabilities for analyzing all the data they have about these entities. This talk will introduce the concept of entity-centric storage, discuss what it means and what it enables for businesses, and show how to develop an entity-centric system using the open-source Kiji framework and HBase. It will also compare traditional methods of building a 360-degree view on a relational database with building against a distributed key-value store, and explain why HBase is a good choice for implementing an entity-centric system.
- “HBase Data Modeling and Access Patterns with Kite SDK”
Adam Warrington (Cloudera)
The Kite SDK is a set of libraries and tools focused on making it easier to build systems on top of the Hadoop ecosystem. HBase support has recently been added to the Kite SDK Data Module, which allows a developer to model and access data in HBase consistent with how they would model data in HDFS using Kite. This talk will focus on Kite’s HBase support by covering Kite basics and moving through the specifics of working with HBase as a data source. This feature overview will be supplemented by specifics of how that feature is being used in production applications at Cloudera.
- “OpenTSDB 2.0”
Chris Larsen (Limelight Networks) and Benoit Sigoure (Arista Networks)
The OpenTSDB community continues to grow, with users looking to store massive amounts of time-series data in a scalable manner. In this talk, we will discuss a number of use cases and best practices around naming schemas and HBase configuration. We will also review OpenTSDB 2.0’s new features, including the HTTP API, plugins, annotations, millisecond support, and metadata, as well as what’s next in the roadmap.
- “Presto + HBase: A Distributed SQL Query Execution Engine on Top of HBase”
Manukranth Kolloju (Facebook)
Presto, in use at Facebook, is a distributed SQL query engine optimized for ad hoc analysis at interactive speed. At Facebook scale, having ad hoc SQL query capabilities for high-volume NoSQL data stores has been a very valuable asset, and Presto enabled this by supporting connectors on top of HDFS and other data providers. To process the Presto SQL-based workload effectively, HBase needs to support a critical set of data access patterns over large data sets with high performance. This talk covers the improvements we’ve made to enhance scan performance and optimize the read path, as well as a number of other new features that help push down work from query execution to the database.
- “Tasmo: Building HBase Applications From Event Streams”
Pete Matern and Jonathan Colt (Jive Software)
Tasmo is a system that enables application development on top of event streams and HBase. Its functionality is similar to a materialized view in a relational database, where data is maintained at write time in the forms in which it is needed at read time for display and indexing. Tasmo is designed for significantly read-heavy applications that display the same underlying data in multiple forms, where repeatedly performing the required selects and joins at read time can be prohibitively expensive. In this talk, we’ll explore the features and roadmap for Tasmo.
- “Taming HBase with Apache Phoenix and SQL”
Eli Levine, James Taylor (Salesforce.com) & Maryann Xue (Intel)
HBase is the Turing machine of the Big Data world. It’s been scientifically proven that you can do *anything* with it. This is, of course, a blessing and a curse, as there are so many different ways to implement a solution. Apache Phoenix (incubating), the SQL engine over HBase, to the rescue. Come learn about the fundamentals of Phoenix and how it hides the complexities of HBase while giving you optimal performance, and hear about new features from our recent release, including updatable views that share the same physical HBase table and n-way equi-joins through a broadcast hash join mechanism. We’ll conclude with a discussion of our roadmap and plans to implement cost-based query optimization that dynamically adapts query execution based on your data sizes.
Interested yet? If not, next week, we’ll offer a preview of the “Case Studies” track.
Thank you to our sponsors — Continuuity, Hortonworks, Intel, LSI, MapR, Salesforce.com, Splice Machine, WibiData (Gold); BrightRoll, Facebook, Pepperdata (Silver); ASF (Community); O’Reilly Media, The Hive, NoSQL Weekly (Media) — without which HBaseCon would be impossible!