I am very pleased to announce the general availability of Cloudera’s Distribution including Apache Hadoop, version 3 (CDH3). We’ve been working on this release for more than a year: our initial beta release was on March 24, 2010, and we’ve made a number of enhancements to the software in the intervening months. This release is the culmination of that long process. It includes the hard work of the broad Apache Hadoop community and the entire team here at Cloudera.
We’ve done three things in this release that I’m particularly proud of.
First, we’ve produced what we believe the community and the industry need: A complete Hadoop-based stack for data storage and analysis.
Apache Hadoop is a tremendously powerful piece of technology. It provides a suite of data processing services that literally have no parallel (heh!) among commercial products in the industry. Out of the box, however, the project lacks key features that, in our two and a half years of working with customers, we’ve learned are necessary.
CDH3 adds those features by including complementary open source packages. Flume and Sqoop provide data loading and integration services. Apache Hive and Apache Pig offer high-level query language interfaces to your data. Apache HBase provides fast record-based fetch and storage services. Oozie delivers job scheduling and workflow management. Hue is an easy-to-use UI framework for applications. Apache ZooKeeper provides synchronization and coordination across the components. CDH3 integrates all of these into a single artifact: the right versions, with the right bug fixes and features, tested together and packaged for easy installation.
As a result, right out of the box, you can get useful work done on Hadoop.
Second, we are living up to a long-term strategic commitment here at Cloudera. CDH3 is one hundred percent open source. That’s been true for every version we’ve ever shipped, and will be true for all future versions, too. The community collaborates best and innovates fastest when we all share our work freely. Enterprise users have learned the hard way that infrastructure software from proprietary vendors is a problem: It gets steadily more expensive over time, and the customer is beholden to the vendor forever for access to data and for the right to run analytics. Open source software is good for us and for our customers. No user of CDH will ever be locked into a single vendor — not Cloudera, and not anyone else.
Third, CDH3 is the only Hadoop-based package available anywhere that has been deployed by thousands of enterprises.
New infrastructure always poses risks. How mature is the platform? How reliable is it, and how easy is it to install, deploy and manage? CDH3 is a safe choice. It lets you build on the success of Cloudera’s worldwide installed base. Our users run it on systems ranging from small clusters managing just a few terabytes all the way up to petabyte-class systems running on more than a thousand nodes. In markets as diverse as financial services, telecommunications, retail, consumer goods, online services and government, CDH3 is tackling real problems, on deadline, every day.
If you’d like more detail on what’s included in CDH3, we’d be glad to have you attend a webinar we’re hosting on April 21st at 11am Pacific time. Charles Zedlewski, Cloudera’s VP Product, will explain in more depth what’s in the package and how it works.
Or, of course, you can just go download the software.