Author Archives: Brad Barker

How-to: Train Models in R and Python using Apache Spark MLlib and H2O

Categories: Data Science How-to Spark

Creating and training machine-learning models is more complex on distributed systems, but many frameworks exist to abstract away that complexity.

Proven open source projects now offer more options than ever for distributed analytics, and Python and R are becoming increasingly popular for this work. In this post, you’ll learn the options for setting up a simple read-eval-print loop (REPL) environment with Python and R within the Cloudera QuickStart VM, using the APIs of two of the most popular cluster computing frameworks: Apache Spark (with MLlib) and H2O (from the company of the same name).
