Author Archives: Tom White

Apache Hadoop 2.0.3-alpha Released

Categories: General, Hadoop, HDFS, MapReduce, YARN

Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. Since the 2.0.2-alpha release in October last year, this release fixes over 500 issues across the Common, HDFS, MapReduce and YARN sub-projects. In addition to bug fixes and general improvements, the more noteworthy changes include:

  • HDFS High Availability (HA) can now use a Quorum Journal Manager (QJM) for sharing namenode edit logs (HDFS-3077).
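The QJM writes each batch of edits to a set of JournalNode daemons and treats the batch as committed once a majority of them acknowledge it. So with N JournalNodes the system keeps working through the failure of up to (N - 1) / 2 of them. The short Python sketch below only illustrates that majority arithmetic. The function names are hypothetical and are not part of Hadoop's API.

```python
def journal_quorum(num_journal_nodes: int) -> int:
    """Acknowledgements needed for a majority of JournalNodes (hypothetical helper)."""
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return num_journal_nodes // 2 + 1


def tolerated_failures(num_journal_nodes: int) -> int:
    """JournalNodes that can be down while a majority can still be formed."""
    return num_journal_nodes - journal_quorum(num_journal_nodes)


def write_committed(acks: int, num_journal_nodes: int) -> bool:
    """An edit batch counts as durable once a majority has acknowledged it."""
    return acks >= journal_quorum(num_journal_nodes)


if __name__ == "__main__":
    # Print (JournalNodes, quorum size, failures tolerated) for odd cluster sizes.
    for n in (3, 5, 7):
        print(n, journal_quorum(n), tolerated_failures(n))
```

The arithmetic also explains why JournalNodes are normally deployed in odd numbers. Four nodes need three acknowledgements and still tolerate only one failure, which is no better than three nodes.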

Read More

Apache Hadoop 2.0.2-alpha Released

Categories: Community, Hadoop, MapReduce

Earlier this month the Apache Hadoop PMC released Apache Hadoop 2.0.2-alpha, which fixes over 600 issues since 2.0.1-alpha, the previous release in the 2.0 series, which came out in July. This is a tremendous rate of development, and all contributors to the project should feel proud of it.

Some of the more noteworthy changes in this release include:

  • HDFS HA supports automatic failover using ZooKeeper (HDFS-3042).

Read More

Apache Hadoop 0.23.0 has been released

Categories: General, Hadoop

The Apache Hadoop PMC has voted to release Apache Hadoop 0.23.0. This release is significant because it is the first major release of Hadoop in over a year, and it incorporates many new features and improvements over the 0.20 release series. The biggest new features are HDFS federation and a new MapReduce framework. There is also a new build system (Maven), Kerberos HTTP SPNEGO support, and some significant performance improvements, which we’ll be covering in future posts.

Read More

Snappy and Hadoop

Categories: Community, General, Hadoop

Snappy is a compression library developed at Google, and, like many technologies that come from Google, it was designed to be fast. The trade-off is that its compression ratio is not as high as that of other compression libraries. From the Snappy homepage:

… compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger.

Read More

Hadoop: The Definitive Guide, Second Edition

Categories: General

The second edition of my book “Hadoop: The Definitive Guide”, published by O’Reilly, is now available. The first edition was launched at the Hadoop Summit in June 2009, and has gone on to sell well. Less than a year later I was asked to write the second edition. The Hadoop ecosystem has been growing fast (and continues to), and the bulk of the extra 100 pages in the second edition are devoted to three new projects: Hive,

Read More