Analytical and operational access patterns are very different and until now the Hadoop ecosystem has not had a single storage engine that could support both. As a result, engineers have been forced to implement complex architectures that stitch multiple systems together in order to provide these capabilities. On one hand immutable data on HDFS offers superior analytic performance, while mutable data in Apache HBase is best for operational workloads. Apache Kudu bridges this gap.
Kudu’s architecture is shaped towards the ability to provide very good analytical performance,
Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.
The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,
Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK.
Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store a transaction within a few seconds of a card purchase taking place. The system receives the bank’s retail customer card transactions and calculates the associated trend information aggregated by account holder and over a number of dimensions and taxonomies.