Tag Archives: events

Cloudera Enterprise 5.5 is Now Generally Available

Categories: CDH Cloudera Manager

Cloudera Enterprise 5.5 (comprising CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4) has been released.

Cloudera is excited to bring you news of Cloudera Enterprise 5.5. Our persistent emphasis on quality is especially pronounced in this release, with more than 500 issues identified and triaged during its development.

A highlight of this release is the inclusion of Cloudera Navigator Optimizer (available in limited beta for select Cloudera Enterprise customers;

Read more

How-to: Build a Complex Event Processing App on Apache Spark and Drools

Categories: HBase How-to Kafka Spark Use Case

Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data.

Event processing involves tracking and analyzing streams of data from events to support better insight and decision making. With the recent explosion in data volume and diversity of data sources, this goal can be quite challenging for architects to achieve.

Complex event processing (CEP) is a type of event processing that combines data from multiple sources to identify patterns and complex relationships across various events.

Read more

How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code

Categories: HBase How-to Search Spark

Learn how to use OCR tools, Apache Spark, and other Apache Hadoop components to process PDF images at scale.

Optical character recognition (OCR) technologies have advanced significantly over the last 20 years. However, during that time, there has been little or no effort to marry OCR with distributed architectures such as Apache Hadoop to process large numbers of images in near-real time.

In this post, you will learn how to use standard open source tools along with Hadoop components such as Apache Spark,

Read more

Inside Santander’s Near Real-Time Data Ingest Architecture

Categories: Flume HBase Kafka

Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK.

Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store a transaction within a few seconds of a card purchase taking place. The system receives the bank’s retail customer card transactions and calculates the associated trend information aggregated by account holder and over a number of dimensions and taxonomies.

Read more