When picking a storage option for an application it is common to pick a single storage option which has the most applicable features to your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that reason, there is a need for a solution that allows you to leverage the best features of multiple storage options.
Spark ML is one of the dominant frameworks for many major machine learning algorithms, such as the Alternating Least Squares (ALS) algorithm for recommendation systems, the Principal Component Analysis algorithm, and the Random Forest algorithm.
[Editor’s note: This article was originally published on the Hortonworks Community Connection, but reproduced here because CDSW is now available on both Cloudera and Hortonworks platforms.]
Using Deployed Models as a Function as a Service
Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models from Apache NiFi as part of flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html),
Cloudera Data Warehouse offers a powerful combination of flexibility and cost-savings. Using Cloudera Data Warehouse, you can transform and optimize your current traditional data warehouse by moving select workloads to your CDH cluster. This article shows you how to transform your current setup into a modern data warehouse by moving some initial data over to Impala on your CDH cluster.
To use the following data import scenario,
We have significantly improved Impala in CDH 5.15.0 to address some of the scalability bottlenecks in query execution. 64 concurrent streams of TPC-DS queries at 10TB scale in a 135-node cluster now run at 6x query throughput compared to previous releases. In addition to running faster, the query success rate also improved from 73% to 100%. Overall, Impala in CDH 5.15.0 provides massive improvements in throughput and reliability while reducing the resource usage significantly.