Apache Hadoop Versions: Looking Ahead
A few months ago, my colleague Charles Zedlewski wrote a great piece explaining Apache Hadoop version numbering. The post can be summed up with the following diagram:
While Charles’s post does a great job of explaining the history of Apache Hadoop version numbering, it doesn’t help users understand where Hadoop version numbers are headed.
Hadoop has of late been frequently referred to as “an operating system for the cloud.” Without disputing the accuracy of this description, one thing is clear: if we’re going to compete for mindshare with entrenched operating systems, we’ll need to first get our version numbers up to par. For the first 5 or so years of Hadoop releases, the version number was strictly less than “1”. With the release of Hadoop 1.0 in late 2011, and the impending release of Hadoop 2.0, we’ve done a lot to catch up, but The Community must do more.
First: Version Backdating
In the spirit of options backdating, several past releases will be renumbered to higher version numbers. Starting with the Hadoop 0.16 line, through the venerable 0.20 line, the major version will be incremented, beginning at 1, and proceeding with the Fibonacci sequence. However, these releases’ minor versions will be kept the same, to decrease confusion. This will yield the following mapping of version numbers:
The releases previously known as “1.0.x” and “2.0.x” will likewise be increased, but this time will be explicitly set to the next two numbers in the Fibonacci sequence, and their minor versions will be reset to 0. So, 8.0 and 13.0, respectively.
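For readers who want to check the arithmetic, here is a minimal sketch of the renumbering rule described above. It assumes that each release line from 0.16 through 0.20 consumes one Fibonacci number and that the sequence starts 1, 1 (which is what lets “1.0” and “2.0” land on 8 and 13). The names `fibonacci`, `BACKDATED`, `RESET`, and `renumber` are made up for illustration and are not part of any Hadoop tooling.

```python
def fibonacci():
    """Yield the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ..."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


# Assumption: these are the release lines covered by the post, oldest first.
BACKDATED = ["0.16", "0.17", "0.18", "0.19", "0.20"]  # minor version is kept
RESET = ["1.0", "2.0"]                                # minor version is reset to 0


def renumber():
    """Map each old release line to its backdated version number."""
    fib = fibonacci()
    mapping = {}
    for old in BACKDATED:
        minor = old.split(".")[1]
        mapping[old] = f"{next(fib)}.{minor}"
    for old in RESET:
        mapping[old] = f"{next(fib)}.0"
    return mapping


if __name__ == "__main__":
    for old, new in renumber().items():
        print(f"{old} -> {new}")
```

Under these assumptions, 0.20 becomes 5.20, and the next two Fibonacci numbers are 8 and 13, matching the versions given above.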
In order to appropriately distinguish the importance of the two releases previously known as “1.0” and “2.0”, these versions will adopt a new 6-part numbering scheme, i.e. 8.0.0.0.0.0 and 13.0.0.0.0.0. The intention of the last three parts of this scheme is that they will never be anything but “0” – the extra zeros are used solely to identify the importance of the release. More zeros may be added in the future.
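The 6-part scheme can be sketched in a few lines. This is only an illustration of the rule as stated: three meaningful parts followed by three zeros that never change. The function name `important_version` and its `importance_zeros` parameter are hypothetical, the latter included only because more zeros may be added later.

```python
def important_version(major, minor=0, patch=0, importance_zeros=3):
    """Format a version as major.minor.patch followed by always-zero parts."""
    parts = [major, minor, patch] + [0] * importance_zeros
    return ".".join(str(p) for p in parts)


if __name__ == "__main__":
    print(important_version(8))    # the release formerly known as 1.0
    print(important_version(13))   # the release formerly known as 2.0
```

Raising `importance_zeros` is the only supported way to express additional importance.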
Note: Given the maturity of 8.x.y.0.0.0 (nee 1.0.0), this release line will henceforth only accept code modifications that are either spelling corrections, or the addition of single lines of documentation.
Second: Fractional Minor Versions
In order to further distinguish Hadoop from other “operating systems,” Hadoop releases will optionally, at the will of The Community, be able to append fractional version numbers. This should afford Hadoop versioning flexibility unrivaled in the software industry. Examples might include “Apache Hadoop 13 ½”, “Apache Hadoop 33 ⅓”, or “Apache Hadoop 21 C/d”.
Third (and Finally): Codenames
Other “operating systems” are often referred to not merely by version numbers, but by catchy codenames. Hadoop must be able to compete along these lines as well. Though not yet voted on by the community, the following codenames have been proposed:
Apache Hadoop 8.x.y.0.0.0 shall alternately be called “Lion”
Apache Hadoop 13.x.y.0.0.0 shall alternately be called “Vista”
Apache Hadoop 21.x.y.0.0.0 shall alternately be called “Hadoop Database 11g”
These new codenames should help Hadoop spread to the so far underserved retail big data processing market, and will undoubtedly help realize Doug’s original goal of “a cluster in every home”.
Cloudera Manager Version Compatibility Layer
In order to keep pace with what will surely be a proliferation of many different versions of Apache Hadoop running within each organization, Cloudera Manager will need to introduce a compatibility and versioning layer of its own. This product, being announced for the first time today, will support a single console from which an operator can monitor and manage all past, present, re-numbered, and future versions of Apache Hadoop. This product shall be called Cloudera Middle Manager.
[NOTE: This was an April Fools Day post.]