Tag Archives: Hadoop Distributed File System

How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop

Categories: Hadoop HDFS

One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data — we prefer to move the computation to the data whenever possible, rather than the other way around. Because of this, the Hadoop Distributed File System (HDFS) typically handles many “local reads” reads where the reader is on the same node as the data:

Initially, local reads in HDFS were handled the same way as remote reads: the client connected to the DataNode via a TCP socket and transferred the data via DataTransferProtocol.

Read more

How Rapleaf Works Smarter with Cloudera

Categories: CDH Cloudera Manager Use Case

Because raising the visibility of Apache Hadoop use cases is so important, in this post we bring you a re-posted story about how and why Rapleaf, a marketing data company based in San Francisco, uses Cloudera Enterprise (CDH and Cloudera Manager).

Founded in 2006, Rapleaf’s mission is to make it incredibly easy for marketers to access the data they need so they can personalize content for their customers.

Read more

Processing Rat Brain Neuronal Signals Using an Apache Hadoop Computing Cluster – Part I

Categories: General Guest Hadoop Hive Use Case

Introduction

In this three-part series of posts, we will share our experiences tackling a scientific computing challenge that may serve as a useful practical example for those readers considering Apache Hadoop and Apache Hive as an option to meet their growing technical and scientific computing needs. This first part describes some of the background behind our application and the advantages of Hadoop that make it an attractive framework in which to implement our solution. Part II dives into the technical details of the data we aimed to analyze and of our solution.

Read more

Apache HBase Write Path

Categories: CDH General HBase

Apache HBase is the Hadoop database, and is based on the Hadoop Distributed File System (HDFS). HBase makes it possible to randomly access and update data stored in HDFS, but files in HDFS can only be appended to and are immutable after they are created.  So you may ask, how does HBase provide low-latency reads and writes? In this blog post, we explain this by describing the write path of HBase —

Read more

NameNode Recovery Tools for the Hadoop Distributed File System

Categories: HDFS

Warning: The procedure described below can cause data loss. Contact Cloudera Support before attempting it.

Most system administrators have had to deal with a bad hard disk at some point. One moment, the hard disk is a mechanical marvel; the next, it is an expensive paperweight.

The HDFS (Hadoop Distributed File System) community has been steadily working to diminish the impact of disk failures on overall system availability. In this article,

Read more