Category Archives: Hadoop

Understanding HDFS Recovery Processes (Part 1)

Categories: Hadoop, HDFS

Having a good grasp of HDFS recovery processes is important when running or moving toward production-ready Apache Hadoop.

An important design requirement of HDFS is to ensure continuous and correct operations to support production deployments. One particularly complex area is ensuring correctness of writes to HDFS in the presence of network and node failures, where the lease recovery, block recovery, and pipeline recovery processes come into play. Understanding when and why these recovery processes are called,

Couchdoop: Couchbase Meets Apache Hadoop

Categories: Guest, Hadoop

Thanks to Călin-Andrei Burloiu, Big Data Engineer at antivirus company Avira, and Radu Pastia, Senior Software Developer in the Big Data Team at Orange, for the guest post below about the Couchdoop connector for bringing Couchbase data into Hadoop.

Couchdoop is a Couchbase connector for Apache Hadoop, developed by Avira on CDH, that allows for easy, parallel data transfer between Couchbase and Hadoop storage engines. It includes a command-line tool,

This Month in the Ecosystem (January 2015)

Categories: Community, Hadoop

Welcome to our 16th edition of “This Month in the Ecosystem,” a digest of highlights from January 2015 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly). 

You may have noticed that this report went on hiatus for December 2014 due to a lack of a critical mass of news (plus, we realize that most of you are out of the loop until mid-January).

Tutorials at Strata + Hadoop World San Jose: Architecture, Hadoop Ops, Interactive SQL-on-Hadoop

Categories: Events, Hadoop, Impala

Strata + Hadoop World San Jose 2015 (Feb. 17-20) is a focal point for learning about production-izing Hadoop.

Strata + Hadoop World sessions have always been indispensable for learning about Hadoop internals, use cases, and admin best practices. When you need to go deeper, however, tutorials are your ticket; deep dives are a necessity if you're running Hadoop in production or aspire to.

This year, tutorials span a range of topics that are central in today’s Hadoop conversation,

How-to: Deploy Apache Hadoop Clusters Like a Boss

Categories: Hadoop, Hardware, How-to

Learn how to set up a Hadoop cluster in a way that maximizes successful production-ization of Hadoop and minimizes ongoing, long-term adjustments.

Previously, we published some recommendations on selecting new hardware for Apache Hadoop deployments. That post covered some important ideas regarding cluster planning and deployment such as workload profiling and general recommendations for CPU, disk, and memory allocations. In this post, we’ll provide some best practices and guidelines for the next part of the implementation process: configuring the machines once they arrive.
