Author Archives: Jordan Volz and Stefan Salandy

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Categories: Data Ingestion, Flume, Hadoop, Kafka, Search, Spark, Use Case

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, let users ingest structured and semi-structured data without writing custom code. Unstructured data, however, is more challenging to ingest and typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases,
