With an ever-increasing number of IoT use cases on the CDH platform, securing these workloads is of paramount importance. This blog post describes how to consume data from Kafka in Spark, two critical components of IoT use cases, in a secure manner.
The Cloudera Distribution of Apache Kafka 2.0.0 (based on Apache Kafka 0.9.0) introduced a new Kafka consumer API that allowed consumers to read data from a secure Kafka cluster.
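As a rough illustration of what "reading from a secure Kafka cluster" involves on the Spark side, the sketch below builds the option map for Spark's Kafka source when the brokers use Kerberos (SASL), optionally over TLS. It assumes Spark Structured Streaming's Kafka source, which forwards any option prefixed with `kafka.` to the underlying Kafka consumer; the helper name `secure_kafka_options`, the broker address, topic, and truststore path are hypothetical placeholders, not values from the original post.

```python
from typing import Dict, Optional


def secure_kafka_options(
    bootstrap_servers: str,
    topics: str,
    use_tls: bool = True,
    truststore_location: Optional[str] = None,
    truststore_password: Optional[str] = None,
    kerberos_service_name: str = "kafka",
) -> Dict[str, str]:
    """Hypothetical helper: options for Spark's Kafka source on a Kerberized cluster.

    Keys prefixed with "kafka." are passed by Spark to the Kafka consumer;
    "subscribe" is a Spark source option naming the topic(s) to read.
    """
    # Kerberos authentication uses SASL; add TLS encryption when requested.
    protocol = "SASL_SSL" if use_tls else "SASL_PLAINTEXT"
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topics,
        "kafka.security.protocol": protocol,
        "kafka.sasl.kerberos.service.name": kerberos_service_name,
    }
    if use_tls:
        # A truststore is needed to verify the brokers' TLS certificates.
        if not truststore_location:
            raise ValueError("truststore_location is required when use_tls=True")
        options["kafka.ssl.truststore.location"] = truststore_location
        if truststore_password is not None:
            options["kafka.ssl.truststore.password"] = truststore_password
    return options
```

With a SparkSession named `spark`, the map would be applied as `spark.readStream.format("kafka").options(**opts).load()`. The Kerberos credentials themselves are typically supplied through a JAAS file passed to the driver and executors with `-Djava.security.auth.login.config`; that part depends on the cluster setup and is not shown here.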
Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we introduced how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In the Python world, data scientists often want to use libraries such as XGBoost that include C/C++ extensions. This post shows how to solve this problem by creating a conda recipe for a package with a C extension.
Self-service business intelligence and exploratory analytics continue to be a primary use case for Cloudera’s customers. Over the past year, we have made a number of significant advancements in Hue, the intelligent SQL editor, to provide a more powerful user experience for SQL developers and make them even more productive for those use cases.
The recent release of Cloudera 5.11 furthers this effort with new enhancements around embedded search and tagging for faster data discovery,
Last week, Cloudera announced the General Availability release of Cloudera Data Science Workbench. In this post, I’ll give a brief overview of its capabilities and architecture, along with a quick-start guide to connecting Cloudera Data Science Workbench to your existing CDH cluster in three simple steps.
At its core, Cloudera Data Science Workbench enables self-service data science for the enterprise. Data scientists can build, scale, and deploy data science and machine learning solutions in a fraction of the time,
Recently we worked with a customer that needed to run a very large number of models each day to satisfy internal and government-regulated risk requirements. Several thousand model executions had to be supported per hour, and total execution time was very important to this client. In the past, the customer had used thousands of servers to meet the demand. They needed to run many derivations of this model with different economic factors to satisfy their requirements.