Testing Apache Kudu Applications on the JVM

Categories: Kudu, Testing

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test.

Transparent Hierarchical Storage Management with Apache Kudu and Impala

Categories: CDH, Impala, Kudu, Parquet

When choosing storage for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that reason, there is a need for a solution that lets you combine the best features of multiple storage options.

SMM 1.2 Released with Powerful New Alerting and Topic Lifecycle Management Features with Schema Registry Integration

Categories: Kafka, Tools

[Editor’s note: Now that the recent merger is complete, the Cloudera Engineering blog will expand to cover products, such as this, originally developed for the Hortonworks platform. Please stay tuned for future product announcements regarding availability of these products on the Cloudera platform.]

Since the release of Streams Messaging Manager (SMM) at the end of last summer, our customers have started to cure the Kafka Blindness within their organizations by using SMM to monitor their Kafka clusters and streaming microservices applications.

Using Native Math Libraries to Accelerate Spark Machine Learning Applications

Categories: AI and Machine Learning, CDH, Performance, Spark

[Editor’s note: The original version of this article was published as part of our Guru How-To series for Data Science. Be sure to also check out the series for Cloudera Data Warehouse.]

Spark ML is one of the dominant frameworks for many major machine learning algorithms, such as the Alternating Least Squares (ALS) algorithm for recommendation systems, the Principal Component Analysis algorithm, and the Random Forest algorithm.

Integrating Machine Learning Models into Your Big Data Pipelines in Real-Time With No Coding

Categories: AI and Machine Learning, CDH, Cloudera Data Science Workbench, How-to

[Editor’s note: This article was originally published on the Hortonworks Community Connection, but is reproduced here because CDSW is now available on both Cloudera and Hortonworks platforms.]

Using Deployed Models as a Function as a Service

Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models as part of NiFi flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html),
