System maintenance operations such as updating operating systems, and applying security patches or hotfixes are routine operations in any data center. DataNodes undergoing such maintenance operations can go offline for anywhere from a few minutes to several hours. By design, Apache Hadoop HDFS can handle DataNodes going down. However, any uncoordinated maintenance operations on several DataNodes at the same time could lead to temporary data availability issues. HDFS currently supports the following features for performing planned maintenance activity:
- Rolling Upgrade
- HDFS supports using Maintenance State (Starting with CDH 5.11)
The rolling upgrade process helps to upgrade the cluster software without taking the cluster offline.
With an ever-increasing number of IoT use cases on the CDH platform, security for such workloads is of paramount importance. This blog post describes how one can consume data from Kafka in Spark, two critical components for IoT use cases, in a secure manner.
The Cloudera Distribution of Apache Kafka 2.0.0 (based on Apache Kafka 0.9.0) introduced a new Kafka consumer API that allowed consumers to read data from a secure Kafka cluster.
Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we introduced how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In Python world, data scientists often want to use Python libraries, such as XGBoost, which includes C/C++ extension. This post shows how to solve this problem creating a conda recipe with C extension.
Cloudera Search (that is Apache Solr integrated with the Apache Hadoop eco-system) now supports (as of C5.9) a backup and disaster recovery capability for Solr collections.
In this post we will cover the basics of the backup and disaster recovery capability in Solr and hence in Cloudera Search. In the next post we will cover the design of the Solr snapshots functionality and its integration with the Hadoop ecosystem as well as public cloud platforms (e.g.
Self-service business intelligence and exploratory analytics continue to be a primary use case for Cloudera’s customers. Over the past year, we have made a number of significant advancements in Hue, the intelligent SQL editor, to provide a more powerful user experience for SQL developers and make them even more productive for those use cases.
The recent release of Cloudera 5.11 furthers this effort with new enhancements around embedded search and tagging for faster data discovery,