Aki Ariga, Author at Cloudera Blog

September 20, 2017 | Technical

How to Distribute your R code with sparklyr and Cloudera Data Science Workbench

sparklyr lets R users leverage the distributed computation power of Apache Spark without much additional learning. It acts as a backend for dplyr, so R users can write almost the same code for both local computation and distributed computation over Spark SQL. Since sparklyr v0.6, we can run […]

by Aki Ariga | 6 min read

May 15, 2017 | Technical

Create conda recipe to use C extended Python library on PySpark cluster with Cloudera Data Science Workbench

Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we showed how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In the Python world, data scientists often want to use Python libraries, such as XGBoost, which includes C/C++ […]

by Aki Ariga | 3 min read

Apache Spark Cloudera Data Science Workbench Cloudera Enterprise
