Livy, which streamlines Spark architecture for web/mobile apps, is the newest addition to Cloudera Labs.
The impact of Apache Spark on the Apache Hadoop ecosystem speaks for itself: virtually overnight, it was adopted as the default data processing engine and as a standard for powering advanced analytic applications. That is not to say the work is done, however, particularly in the areas of performance at scale and under multi-tenancy, developer productivity, and extensibility.
For example, architectural options for Spark-based applications have been limited because, among other things, remote applications cannot access Spark resources directly without cumbersome and repetitive client configuration. The result is a fairly poor developer experience and a longer path to production. (This lack of options also limits use cases by making integration with other systems more difficult.) For those reasons, we are happy to announce that the Livy project is joining Cloudera Labs. (Like other Labs projects, Livy is intended for development and testing purposes only, not for production deployments, and is not currently supported by Cloudera.)
Livy (Apache License) is a service that enables remote apps to easily interact with a Spark cluster over a REST API. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and SparkContext management, all via a simple REST interface or an RPC client library. Livy also simplifies interaction between Spark and application servers, streamlining the architecture needed for interactive web/mobile applications.
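To make the REST interaction concrete, here is a minimal Python sketch of running a code snippet in an interactive session. It is an illustration under assumptions, not the project's reference client: the server address (`http://localhost:8998`), the endpoint paths (`/sessions`, `/sessions/{id}/statements`), the JSON field names, and the state names are taken from the Livy REST API as commonly documented and should be checked against the docs for the version you install. The helper names (`build_request`, `call`, `run_snippet`) are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: address of the Livy server; adjust for your deployment.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload=None):
    """Build a JSON request: POST when a payload is given, otherwise GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        LIVY_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def call(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def new_session_request(kind="spark"):
    # Assumed endpoint: POST /sessions starts a session backed by a SparkContext.
    return build_request("/sessions", {"kind": kind})


def statement_request(session_id, code):
    # Assumed endpoint: POST /sessions/{id}/statements submits a code snippet.
    return build_request("/sessions/{}/statements".format(session_id), {"code": code})


def statement_result_request(session_id, statement_id):
    # Assumed endpoint: GET /sessions/{id}/statements/{id} fetches the result.
    return build_request("/sessions/{}/statements/{}".format(session_id, statement_id))


def run_snippet(code, poll_seconds=1.0, max_polls=120):
    """Create a session, wait until it is idle, run one snippet, return its output."""
    session = call(new_session_request())
    session_path = "/sessions/{}".format(session["id"])
    for _ in range(max_polls):
        state = call(build_request(session_path))["state"]
        if state == "idle":
            break
        if state in ("error", "dead"):
            raise RuntimeError("session ended in state " + state)
        time.sleep(poll_seconds)
    else:
        raise RuntimeError("session did not become idle in time")

    statement = call(statement_request(session["id"], code))
    for _ in range(max_polls):
        if statement["state"] == "available":
            return statement["output"]
        time.sleep(poll_seconds)
        statement = call(statement_result_request(session["id"], statement["id"]))
    raise RuntimeError("statement did not finish in time")
```

With a Livy server running, `run_snippet("sc.parallelize(1 to 100).count()")` would submit one Scala snippet and poll until its result is available. Polling covers the asynchronous retrieval case; a caller that blocks on `run_snippet` gets the synchronous behavior.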
With Livy, you can also:
- Use long-running SparkContexts for multiple Spark jobs, by multiple clients
- Manage multiple SparkContexts simultaneously, and run them on the cluster (YARN/Mesos) instead of the Livy Server for good fault tolerance and concurrency
- Submit jobs as precompiled jars, as snippets of code, or via the Java/Scala client API
- Ensure security via secure authenticated communication (work in progress)
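As an illustration of the precompiled-jar option in the list above, the following Python sketch builds a batch submission request. It assumes a `/batches` endpoint that accepts `file`, `className`, and `args` fields, as in the Livy REST API as commonly documented; verify this against the docs for your version. The server address, jar path, and class name are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: address of the Livy server; adjust for your deployment.
LIVY_URL = "http://localhost:8998"


def batch_payload(jar_path, class_name, args=()):
    """Describe a precompiled Spark application for batch submission."""
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = list(args)
    return payload


def batch_request(jar_path, class_name, args=()):
    """Build (but do not send) a POST request to the assumed /batches endpoint."""
    body = json.dumps(batch_payload(jar_path, class_name, args)).encode("utf-8")
    return urllib.request.Request(
        LIVY_URL + "/batches",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(jar_path, class_name, args=()):
    """Send the batch request and return the decoded JSON response."""
    with urllib.request.urlopen(batch_request(jar_path, class_name, args)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `submit_batch("hdfs:///user/demo/app.jar", "com.example.WordCount", ["input.txt"])` would ask the server to run that (hypothetical) application on the cluster, with no Spark client configuration needed on the calling machine.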
To use Livy, you need Spark installed on your server (1.4 or later; Scala 2.10 builds). To get started, download and install the packages from here, and then do the following:
- Export these variables:
    export SPARK_HOME=/usr/lib/spark
    export HADOOP_CONF_DIR=/etc/hadoop/conf
- Start the server:
As for configuration, as you can see above, Livy re-purposes your Spark config under SPARK_HOME by default. (It is strongly recommended to configure Spark to submit applications in YARN cluster mode, to ensure that user sessions have their resources properly accounted for in the YARN cluster and that the host running the Livy server doesn’t get overloaded when multiple user sessions are running.) Livy also uses a few of its own configuration files, which by default are located in the conf directory. (See the docs for details.)
To date, employees from Cloudera, Microsoft, and Intel have contributed to Livy, and the Livy community is looking for more/other contributors. We encourage you to try it out, and we welcome any and all feedback about your Livy experience via the Cloudera Labs discussion board!
Kostas Sakellis is an Engineering Manager at Cloudera.
Anand Iyer is a Director of Product Management at Cloudera.