Tag Archives: events

Designing Fraud-Detection Architecture That Works Like Your Brain Does

Categories: Flume HBase Kafka Spark Use Case

To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).

At its core, fraud detection is about detection whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,

Read more

Deploying Apache Kafka: A Practical FAQ

Categories: CDH Kafka

This post contains answers to common questions about deploying and configuring Apache Kafka as part of a Cloudera-powered enterprise data hub.

Cloudera added support for Apache Kafka, the open standard for streaming data, in February 2015 after its brief incubation period in Cloudera Labs. Apache Kafka now is an integrated part of CDH, manageable via Cloudera Manager, and we are witnessing rapid adoption of Kafka across our customer base.

Read more

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Categories: Data Ingestion Flume Hadoop HBase Kafka Spark

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment.

The Apache Hadoop ecosystem has become a preferred platform for enterprises seeking to process and understand large-scale data in real time. Technologies like Apache Kafka, Apache Flume, Apache Spark, Apache Storm, and Apache Samza are increasingly pushing the envelope on what is possible. It is often tempting to bucket large-scale streaming use cases together but in reality they tend to break down into a few different architectural patterns,

Read more

Sneak Preview: HBaseCon 2015 Use Cases Track

Categories: Community Events HBase

This year’s HBaseCon Use Cases track includes war stories about some of the world’s best examples of running Apache HBase in production.

As a final sneak preview leading up to the show next week, in this post, I’ll give you a window into the HBaseCon 2015’s (May 7 in San Francisco) Use Cases track.

hbasecon logo

Thanks, Program Committee!

  • “HBase @ Flipboard”

Read more

Using Apache Parquet at AppNexus

Categories: Guest Impala Parquet Performance

Thanks to Chen Song, Data Team Lead at AppNexus, for allowing us to republish the following post about his company’s use case for Apache Parquet (incubating at this writing), the open standard for columnar storage across the Apache Hadoop ecosystem.

At AppNexus, over 2MM log events are ingested into our data pipeline every second. Log records are sent from upstream systems in the form of Protobuf messages. Raw logs are compressed in Snappy when stored on HDFS.

Read more