Cascading, Spring, and Spark: Development Choices for CDH Users Expand

Categories: CDH Hadoop Kite SDK

In software development, there is no substitute for having choices. Freedom of choice among frameworks, APIs, and languages is a major driver of platform adoption in any successful ecosystem.

In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks:

  • Spark + CDH

    Cloudera has announced direct support for Apache Spark (incubating) with CDH. Spark, the in-memory data processing framework designed at UC Berkeley’s AMPLab, complements MapReduce for analytic workloads and runs on top of HDFS. It is well known among developers for its highly consumable APIs – particularly for Java, Scala, and Python. (This support comes through the Cloudera Connect: Innovators program, in which Databricks, the company commercializing Spark, is the first partner.) So, for those exploring Spark-based in-memory processing for certain workloads (more to come on that in future posts), the range of development options is rich.

  • Cascading + CDH

    Cloudera has certified Cascading 2.2, the popular open source Java-based framework for building data pipelines in Hadoop, with CDH4. Community-developed Cascading offshoots for JVM languages like Scala (Scalding), Clojure (Cascalog), and Groovy (cascading.groovy) should therefore work with CDH4 as well.

  • Spring + CDH

    Spring for Apache Hadoop, which is bundled inside the ubiquitous open source Spring IO framework for enterprise Java development, is also now certified for use with CDH4 – making CDH4/Hadoop development accessible to the massive, mainstream Spring community through a single, high-level API.

These new options are in addition to familiar ones like Apache Crunch, the Cloudera Development Kit (CDK), and Hadoop’s native APIs – and of course, the options are not mutually exclusive; which ones you combine depends on the use case involved (although developers do tend to stick with their favorite toys).

It’s clear: Making CDH your platform for Hadoop application development gives you the flexibility to choose the right framework/API for the job and for your skill set or personal proclivity. That’s what a platform should do.

Justin Kestelyn is Cloudera’s developer outreach director.