Thanks to Holden Karau (@holdenkarau), Software Engineer at Alpine Data Labs (also a Spark contributor and book author), for providing the following post about her work on new base classes for testing Apache Spark programs.
Testing in the world of Apache Spark has often involved a lot of hand-rolled artisanal code, which frankly is a good way to ensure that developers write as few tests as possible. I’ve been doing some work with Spark Testing Base (also available on Spark Packages) to try and make testing Spark jobs as easy as “normal”
Apache Spark continues to be a major theme in the Strata + Hadoop World conference series; here are highlights at NYC next week.
Strata + Hadoop World NYC 2015 (Sept. 29-Oct. 1; if you haven’t registered yet, a 20% discount is still available) is a learning bonanza for many reasons, but this year the focus on Apache Spark and its growing importance in the Apache Hadoop ecosystem is notable.
Recent Impala testing demonstrates its scalability to a large number of concurrent users.
Impala, the open source MPP query engine designed for high-concurrency SQL over Apache Hadoop, has seen tremendous adoption across enterprises in industries such as financial services, telecom, healthcare, retail, gaming, government, and advertising. Impala has unlocked the ability to use business intelligence (BI) applications on Hadoop; these applications support critical business needs such as data discovery,
To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).
At its core, fraud detection is about detecting whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,
Strata + Hadoop World New York 2015 needs your developer demos! The proposal period closes on Aug. 14.
As everyone knows, Apache Hadoop’s overwhelming success is partly premised on decentralized innovation from all corners of the community (users, vendors, and academia), with everyone participating on a level playing field. And since 2011, Strata + Hadoop World has been a community and content hub of that ecosystem.
For the 2015 show in New York (Sept.