Category Archives: CDH

How-to: Fuzzy Name Indexing in Apache Hadoop with Rosette and Cloudera Search

Categories: CDH How-to Search

In this guide, learn how to use Cloudera Search with Basis Technology’s Rosette®  to perform fuzzy name searches in multiple languages and scripts.

Our thanks to Basis Technology team (Jeanne Le Garrec, Hannah MacKenzie-Margulies and Brian Sawyer) for supporting writing this how-to blog.

Cloudera Search, powered by Apache Solr brings full-text, interactive search, and scalable indexing to Apache Hadoop by marrying SolrCloud with HDFS, Apache HBase,

Read More

How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 1

Categories: CDH Cloud Hadoop How-to Ops and DevOps Platform Security & Cybersecurity

 

Learn how to use Cloudera Director, Microsoft Active Directory (AD DS, AD CS, AD DNS), SAMBA, and SSSD to deploy a secure EDH cluster for workloads in the public cloud.

Authenticating users in Apache Hadoop is the first line of security we recommend. Like most, if not all RDBMS, a user is provided with a username and a password to validate their identity. This is a requirement to access any data managed by those systems.

Read More

HDFS DataNode Scanners and Disk Checker Explained

Categories: CDH Hadoop HDFS

As many of us know, data in HDFS is stored in DataNodes, and HDFS can tolerate DataNode failures by replicating the same data to multiple DataNodes. But exactly what happens if some DataNodes’ disks are failing? This blog post explains how some of the background work is done on the DataNodes to help HDFS to manage its data across multiple DataNodes for fault tolerance. Particularly, we will explain block scanner, volume scanner,

Read More

Resource Management for Apache Impala (incubating)

Categories: CDH Cloudera Manager Hadoop Impala Ops and DevOps Use Case

Apache Impala (incubating) includes several features that allow you to restrict or allocate resources so as to maximize stability and performance for your Impala workloads. You can limit both CPU and memory resources used by Impala to manage and prioritize jobs on CDH clusters. This blog post describes the techniques a typical Impala deployment can use to manage its resources.

Static Service Pools

Static service pools isolate services from one another, so that a high load on one service has limited impact on other services.

Read More

What’s New in Cloudera Director 2.2?

Categories: CDH Cloud Cloudera Manager Hadoop

This new release adds support for Amazon EBS volumes and the ability to diagnose cluster bootstrap errors quickly.

Cloudera Director provides a simple, reliable, enterprise-grade way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud.

Cloudera Director makes it easier for customers to:

  • Deploy clusters in line with patterns native to cloud infrastructure
  • Use an interface to define in one place the desired cluster specification all the way down to the operating system
  • Repeatedly and programmatically instantiate these cluster definitions
  • Adapt to the dynamic nature of cloud infrastructure

Cloudera Director 2.2 provides additional mechanisms to get that initial cluster definition right and the ability to diagnose errors and iterate quickly.

Read More