Zbigniew Baranowski is a database systems specialist and a member of a group which provides and supports central database and Hadoop-based services at CERN. This blog was originally released on CERN’s “Databases at CERN” blog, and is syndicated here with CERN’s permission.
This post presents a performance comparison of few popular data formats and storage engines available in the Apache Hadoop ecosystem: Apache Avro,
For the first time, this new study by Intel software engineers analyzes the performance impact of using Apache HBase on various modern storage technologies.
As more “fast” storage technologies (such as SSD and NVMe SSD) emerge, organizations with big data use cases want to make better use of them to achieve better throughput and latency. But to this point, there have been no detailed analyses published about the true significance of that performance boost,
For Cloudera, Apache HBase has grown into a stable, scalable, mature, and critical component of the Apache Hadoop stack.
HBase adds the ability to do low-latency random read/write across your big data. While it is a key piece of the Apache Hadoop ecosystem, HBase itself has an ecosystem of projects and products that use it as a storage engine for systems such as time series database (OpenTSDB), or SQL-style databases (Apache Phoenix,
Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.
The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,
Taking a thoughtful approach to data serialization can achieve significant performance improvements for HBase deployments.
The question of using tall versus wide tables in Apache HBase is a commonly discussed design pattern (see reference here and here). However, there are more considerations here than making that simple choice. Because HBase stores each column of a table as an independent row in the underlying HFiles, significant storage overhead can occur when storing small pieces of information.