Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK.
Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store a transaction within a few seconds of a card purchase taking place. The system receives the bank’s retail customer card transactions and calculates the associated trend information aggregated by account holder and over a number of dimensions and taxonomies.
To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).
At its core, fraud detection is about detection whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,
Thrift client authentication and doAs impersonation, introduced in HBase 1.0, provides more flexibility for your HBase installation.
In the two-part blog series “How-to: Use the HBase Thrift Interface” (Part 1 and Part 2), Jesse Anderson explained the Thrift interface in detail, and demonstrated how to use it. He didn’t cover running Thrift in a secure Apache HBase cluster, however, because there was no difference in the client configuration with the HBase releases available at that time.
Apache Spark’s ability to support data quality checks via DataFrames is progressing rapidly. This post explains the state of the art and future possibilities.
Apache Hadoop and Apache Spark make Big Data accessible and usable so we can easily find value, but that data has to be correct, first. This post will focus on this problem and how to solve it with Apache Spark 1.3 and Apache Spark 1.4 using DataFrames.
The Strata + Hadoop World NYC 2015 (Sept. 29-Oct. 3) agenda was published in the last few days. Congratulations to all accepted presenters!
In this post, I just want to provide a concise digest of the tutorials and sessions that will involve Cloudera or Intel engineers and/or interesting use cases. There are many worthy sessions from which to choose, so we hope this list will influence your decisions about where to spend your time during the week!