Apache HDFS Archives | Page 2 of 4

December 7, 2017 | Technical

Hadoop Delegation Tokens Explained

Apache Hadoop’s security was designed and implemented around 2009 and has been stabilizing since then. However, because this area is sparsely documented, problems can be hard to understand or debug when they arise. Delegation tokens were designed as an authentication method and are widely used across the Hadoop ecosystem. This blog post introduces the […]

by Cloudera | 14 min read

November 30, 2017 | Technical

Deploy Cloudera EDH Clusters Like a Boss Revamped – Part 1

We at Cloudera believe that all companies should have the power to leverage data for financial gain, to lower operational costs, and to avoid risk. We enable this by providing an enterprise-grade platform that allows customers to easily manage, store, process, and analyze all of their data, regardless of volume and variety. Cloudera’s Enterprise […]

by Benjamin Vera-Tudela, Brandon Freeman, Mladen Kovacevic | 9 min read

Apache Hadoop, Apache HDFS, Cloudera Enterprise

August 1, 2017 | Technical

Using Amazon S3 with Cloudera BDR

More of you are moving to public cloud services for backup and disaster recovery purposes, and Cloudera has been enhancing the capabilities of Cloudera Manager and CDH to help you do that. Specifically, Cloudera Backup and Disaster Recovery (BDR) now supports backup to and restore from Amazon S3 for Cloudera Enterprise customers. BDR lets you […]

by Cloudera | 6 min read

Apache HDFS, Apache Hive, Cloud, Cloudera Enterprise, Cloudera Manager

June 7, 2017 | Technical

Apache Solr Memory Tuning for Production

Configuring Apache Solr memory properly is critical for production system stability and performance. It can be hard to find the right balance between competing goals, and multiple factors, both implicit and explicit, need to be taken into consideration. This blog post covers some common memory-tuning tasks and guides you through the […]

by Cloudera | 7 min read

Apache HDFS, Cloudera Enterprise, Search

June 6, 2017 | Technical

Introducing Apache HBase Medium Object Storage (MOB) compaction partition policies

The Apache HBase Medium Object Storage (MOB) feature was introduced by HBASE-11339. This feature improves low-latency read and write access for moderately sized values (ideally from 100 KB to 10 MB, based on our testing results), making it well suited for storing documents, images, and other moderately sized objects [1]. The Apache HBase MOB feature achieves this improvement […]

by Cloudera | 5 min read

Apache Hadoop, Apache HBase, Apache HDFS

May 22, 2017 | Technical

HDFS Maintenance State

System maintenance operations, such as updating operating systems and applying security patches or hotfixes, are routine in any data center. DataNodes undergoing such maintenance can go offline for anywhere from a few minutes to several hours. By design, Apache Hadoop HDFS can handle DataNodes going down. However, any uncoordinated maintenance operations on […]

by Cloudera | 7 min read

Apache HDFS, Cloudera Enterprise

December 20, 2016 | Technical

HDFS DataNode Scanners and Disk Checker Explained

As many of us know, data in HDFS is stored in DataNodes, and HDFS can tolerate DataNode failures by replicating the same data to multiple DataNodes. But what exactly happens if some DataNodes’ disks are failing? This blog post explains some of the background work the DataNodes perform to help HDFS to […]

by Cloudera | 9 min read

Apache Hadoop, Apache HDFS, Cloudera Enterprise

October 18, 2016 | Technical

How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop

HDFS now includes (shipping in CDH 5.8.2 and later) a comprehensive storage capacity-management approach for moving data across nodes. In HDFS, the DataNode spreads data blocks across local filesystem directories, which can be specified using the dfs.datanode.data.dir property in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different […]

by Cloudera | 4 min read

Apache Hadoop, Apache HDFS, Cloudera Enterprise

February 18, 2016 | Technical

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: a columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in […]

by Cloudera, Todd Lipcon, Wes McKinney | 4 min read

Apache HDFS, Apache Impala, Apache Kudu, Data Science, Performance

December 15, 2015 | Business

DistCp Performance Improvements in Apache Hadoop

Recent improvements to Apache Hadoop’s native backup utility, now shipping in CDH, make the backup process much faster. DistCp is a popular tool in Apache Hadoop for periodically backing up data across and within clusters. (Each run of DistCp in the backup process is referred to as a backup cycle.) Its popularity has grown […]

by Cloudera, Yongjun Zhang | 7 min read

Apache Hadoop, Apache HDFS, Cloudera Enterprise, Performance