Tag Archives: HDFS

A Look at ADLS Performance – Throughput and Scalability

Categories: CDH Cloud Hadoop HDFS Performance

Overview

Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store that is designed for collecting, storing and analyzing large amounts of data, and is ideal for enterprise-grade applications. Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region. ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem,

Read more

implyr: R Interface for Apache Impala

Categories: CDH Data Science HBase HDFS Impala Kudu Tools

New R package implyr enables R users to query Impala using dplyr.

Apache Impala (incubating) enables low-latency interactive SQL queries on data stored in HDFS, Amazon S3, Apache Kudu, and Apache HBase. With the availability of the R package implyr on CRAN and GitHub, it’s now possible to query Impala from R using the popular package dplyr.

dplyr provides a grammar of data manipulation,

Read more

Introducing Apache HBase Medium Object Storage (MOB) compaction partition policies

Categories: HBase

Introduction

The Apache HBase Medium Object Storage (MOB) feature was introduced by HBASE-11339. This feature improves low-latency read and write access for moderately sized values (ideally from 100 KB to 10 MB based on our testing results), making it well-suited for storing documents, images, and other moderately sized objects [1]. The Apache HBase MOB feature achieves this improvement by separating IO paths for file references and MOB objects and by applying different compaction policies to MOBs, thus reducing the write amplification created by HBase’s compactions.

Read more

Cloudera Enterprise 5.5 is Now Generally Available

Categories: CDH Cloudera Manager

Cloudera Enterprise 5.5 (comprising CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4) has been released.

Cloudera is excited to bring you news of Cloudera Enterprise 5.5. Our persistent emphasis on quality is especially pronounced in this release, with more than 500 issues identified and triaged during its development.

A highlight of this release is the inclusion of Cloudera Navigator Optimizer (available in limited beta for select Cloudera Enterprise customers;

Read more