The 0.2.0 release of the spark-ts package includes includes a fleshed-out Java API, among other things.
The spark-ts library, which was initially developed by Cloudera’s Data Science team, enables analysis of datasets comprising millions of time series, each with millions of measurements. It runs atop Apache Spark, and exposes Scala, Java, and Python APIs. Check out this recent post for a closer look at the library and how to use it.
Time-series analysis is becoming mainstream across multiple data-rich industries. The new spark-ts library helps analysts and data scientists focus on business questions, not on building their own algorithms.
Have you ever wanted to build models over measurements coming in every second from sensors across the world? Dig into intra-day trading prices of millions of financial instruments? Compare hourly view statistics across every page on Wikipedia? To do any of these things, you’d need to do a large sequence of measurements over time.
In the conclusion to this series, learn how resource tuning, parallelism, and data representation affect Spark job performance.
In this post, we’ll finish what we started in “How to Tune Your Apache Spark Jobs (Part 1)”. I’ll try to cover pretty much everything you could care to know about making a Spark program run fast. In particular, you’ll learn about resource tuning, or configuring Spark to take advantage of everything the cluster has to offer.
Learn techniques for tuning your Apache Spark jobs for optimal efficiency.
When you write Apache Spark code and page through the public APIs, you come across words like transformation, action, and RDD. Understanding Spark at this level is vital for writing Spark programs. Similarly, when things start to fail, or when you venture into the web UI to try to understand why your application is taking so long,
Spark 1.0 reflects a lot of hard work from a very diverse community.
Cloudera’s latest platform release, CDH 5.1, includes Apache Spark 1.0, a milestone release for the Spark project that locks down APIs for Spark’s core functionality. The release reflects the work of hundreds of contributors (including our own Diana Carroll, Mark Grover, Ted Malaska, Colin McCabe, Sean Owen, Hari Shreedharan, Marcelo Vanzin, and me).