Tag Archives: configuration

Protecting Hadoop Clusters From Malware Attacks

Categories: Altus, CDH, Platform Security & Cybersecurity

Two new strains of malware, XBash and DemonBot, are targeting Apache Hadoop servers for Bitcoin mining and DDoS purposes. This malware scans the internet so aggressively for Hadoop clusters that an infection can occur within minutes of an insecure cluster being placed on the open internet. This blog post describes the mechanism this malware uses and offers specific actions to protect your Hadoop-based clusters.

A History of Hadoop Malware

Roughly two years ago there was a spate of attacks against the open source database solution MongoDB,

Cloudera Altus on Microsoft Azure

Categories: Altus, Cloud

Cloudera Altus (launched in May 2017) is a platform-as-a-service (PaaS) offering that enables users to analyze and process data at scale in public cloud infrastructures. Altus was designed from the outset to support multiple clouds from the perspective of both back-end architecture and front-end workflows. With the announcement of Microsoft Azure support, Altus will be able to support data engineering workloads in Microsoft Azure, with the same Altus interfaces for API and CLI,

Latest Impala Cookbook

Categories: Impala

Over the past year (and through several releases), Apache Impala (incubating) has added numerous new features and performance enhancements that better enable high-performance SQL analytics over big data. It is therefore time for another update to the Impala cookbook, which contains best practices for these new features, updated guidelines, and more detailed examples.

Note: This cookbook does not yet capture best practices for the major new advancements available with the recent GA of Kudu.

Untangling Apache Hadoop YARN, Part 3: Scheduler Concepts

Categories: YARN

In Parts 1 and 2, we covered the basics of YARN resource allocation. In this installment, we’ll provide an overview of cluster scheduling and introduce the Fair Scheduler, one of the scheduler choices available in YARN.

A standalone computer can have several CPU cores, each running a single process at any given moment, yet as many as a few hundred processes may appear to run simultaneously. The scheduler is the part of the computer’s operating system that assigns a process to a CPU core to run for a short period of time.
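
As a rough illustration of how a few cores can serve many processes, here is a minimal sketch of time slicing. It assumes a simple round-robin policy, which is an illustrative choice and not the YARN Fair Scheduler; the process names, run times, core count, and slice length are all hypothetical.

```python
from collections import deque


def round_robin(burst_times, num_cores=2, time_slice=3):
    """Simulate time slicing and return process names in completion order.

    burst_times maps a (hypothetical) process name to the CPU time it needs.
    Round-robin is an assumed policy chosen only for illustration.
    """
    ready = deque(burst_times.items())  # (name, remaining time) pairs
    finished = []
    while ready:
        # Each round, give up to num_cores waiting processes one slice each.
        count = min(num_cores, len(ready))
        running = [ready.popleft() for _ in range(count)]
        for name, remaining in running:
            remaining -= time_slice
            if remaining > 0:
                ready.append((name, remaining))  # not done: back of the queue
            else:
                finished.append(name)
    return finished


order = round_robin({"editor": 4, "browser": 7, "backup": 2, "player": 5})
print(order)
```

Short processes finish early even though they started at the back of the queue, because no process holds a core for longer than one slice.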
