Category Archives: Use Case

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Categories: Data Ingestion Flume Hadoop Kafka Search Spark Use Case

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases,

Read More

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time

Categories: CDH Flume Impala Spark Use Case

This framework based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating) can detect and report on abnormal bad HTTP requests within seconds.                     

Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems or a delay in load times of third-party content such as advertisements.

Read More

How-to: Analyze Fantasy Sports with Apache Spark and SQL (Part 2: Data Exploration)

Categories: Hadoop Spark Use Case

Learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL. Covered in this installment: data exploration with Apache Impala (incubating) and Hue.

In Part 1 of this series, I introduced the topic of using fantasy sports analytics as an instructive use case for exploring the Apache Hadoop ecosystem. In that installment, we focused on data processing by taking a collection of data from Basketball-Reference.com and enriching it with z-scores and normalized z-scores to analyze the relative value of NBA players.

Read More

How-to: Analyze Fantasy Sports using Apache Spark and SQL

Categories: Hive How-to Impala Spark Use Case

As part of the drumbeat for Spark Summit West in San Francisco (June 6-8),  learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,

Read More

How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr

Categories: CDH Guest Search Use Case ZooKeeper

Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.

Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.

Read More