Wednesday, November 9th, 2011
Hadoop is an excellent environment for analyzing large data sets, but it lacks an easy-to-use graphical interface for building data pipelines and performing advanced analytics. RapidMiner is a powerful open-source tool for data analytics, but it is limited to running on a single machine. In this presentation, we will introduce Radoop, an extension to RapidMiner that lets users interact with a Hadoop cluster. Radoop combines the strengths of both projects and provides a user-friendly interface for editing and running ETL, analytics, and machine learning processes on Hadoop. We will also discuss lessons learned while integrating HDFS, Hive, and Mahout with RapidMiner.