This new feature gives Hadoop admins the commonplace ability to replace failed DataNode drives without unscheduled downtime.
Hot swapping—the process of replacing system components without shutting down the system—is a common and important operation in modern, production-ready systems. Because disk failures are common in data centers, the ability to hot-swap hard drives is a supported feature in hardware and server operating systems such as Linux and Windows Server,
Apache Hadoop ecosystem, time to celebrate! The much-anticipated, significantly updated 4th edition of Tom White’s classic O’Reilly Media book, Hadoop: The Definitive Guide, is now available.
The Hadoop ecosystem has changed a lot since the 3rd edition. How are those changes reflected in the new edition?
The core of the book is about the core Apache Hadoop project, and since the 3rd edition,
Thanks to Big Data Solutions Architect Matthieu Lieber for allowing us to republish the post below.
A customer of mine wants to take advantage of both worlds: work with his existing Apache Avro data, with all of the advantages that it confers, but take advantage of the predicate push-down features that Parquet provides. How to reconcile the two?
For more information about combining these formats,
Having a good grasp of HDFS recovery processes is important when running or moving toward production-ready Apache Hadoop. In the conclusion to this two-part post, pipeline recovery is explained.
An important design requirement of HDFS is to ensure continuous and correct operations that support production deployments. For that reason, it’s important for operators to understand how HDFS recovery processes work. In Part 1 of this post, we looked at lease recovery and block recovery.
Having a good grasp of HDFS recovery processes is important when running or moving toward production-ready Apache Hadoop.
An important design requirement of HDFS is to ensure continuous and correct operations to support production deployments. One particularly complex area is ensuring correctness of writes to HDFS in the presence of network and node failures, where the lease recovery, block recovery, and pipeline recovery processes come into play. Understanding when and why these recovery processes are called,