CDH3 and Cloudera Enterprise

Today’s a big day for us at Cloudera. We’re announcing, as part of our activity at Hadoop Summit, two major new releases that we believe substantially advance Apache Hadoop for both the open source community and our enterprise customers.

First, we’re announcing a new release of Cloudera’s Distribution for Hadoop – CDH3 Beta 2. This release, built on more than a year and a half of extensive engagement with real customers in the market, is the most comprehensive, capable and usable package available. Of course the Apache Hadoop project is the heart of our distribution, but we’ve added eight other open source packages that provide critical infrastructure and tools that are required to use Hadoop effectively in production.

The additional packages include HBase, the popular distributed columnar storage system with fast read-write access to data managed by HDFS; Hive and Pig for query access to data stored in a Hadoop cluster; Apache ZooKeeper for distributed process coordination; and Sqoop for moving data between Hadoop and relational database systems. We’ve adopted Oozie, the outstanding workflow engine out of Yahoo!, and have made contributions of our own to adapt it for widespread use by general enterprise customers. We’ve also released – this is a big deal, and I’m really pleased to announce it – our continuous data loading system, Flume, and our Hadoop User Environment software (formerly Cloudera Desktop, and henceforth “Hue”) under the Apache Software License, version 2 (ASLv2).

Flume provides continuous, high-performance, reliable data loading and data flow monitoring from feeds, logging systems and other sources into a Hadoop cluster. Hue lets developers build attractive, easy-to-use Hadoop applications by providing a desktop-based user interface SDK. These two new open source projects have been under development at Cloudera, and in use by our customers, for more than a year. They’re now available to the community at large.

At the heart of the distribution is the 0.20 release of Apache Hadoop. As in previous versions of CDH, we’ve added important patches from the community and bug fixes critical for enterprise deployment. We’ve integrated the correct, stable versions of all of the projects, tested them together at scale and made the package easy to acquire, install, configure and run.

There’s no other distribution available that is as comprehensive, complete or usable as CDH3. We’re very proud to make the innovative work of the global Hadoop development community easy for everyone to use. The entire package is open source, distributed under ASLv2 and freely available for download, use and redistribution from Cloudera’s web site.

In addition to the release of CDH3 and our new open source projects, we’re announcing today the general availability of Cloudera Enterprise.

Cloudera Enterprise combines the open source CDH3 platform with critical monitoring, management and administrative tools that our enterprise customers have told us they need to put Hadoop into production. We’ve added dashboards for critical IT tasks, including monitoring cluster status and activity, keeping track of data flows into Hadoop in real time based on the services that Flume provides, and controlling access to data and resources by users and groups. We’ve integrated access controls with Active Directory and other LDAP implementations so that IT staff can control rights and identities in the same way as they do for other business platforms they use. Cloudera Enterprise is available by annual subscription and includes maintenance, updates and support.

In case it’s not yet clear: Cloudera is all in on Apache Hadoop. We believe that the mission-critical infrastructure at the base of business systems must, these days, be open source. Enterprises simply aren’t adopting new proprietary technologies at scale at the heart of their operations. Older proprietary companies have established markets and installed bases and will survive for a long time, but a sensible customer will refuse to be tied to a single new vendor for core IT nowadays. A smart entrepreneur wouldn’t use a decades-old playbook in creating a platform company today.

CDH3 reflects that conviction. We’re convinced that open source licensing is critical to drive widespread adoption of our comprehensive Hadoop-based distribution. We’re convinced that Cloudera, with the creator of Apache Hadoop, committers and contributors across the breadth of necessary projects, experienced support professionals and consultants, and a world-class technical and business team, is ideally positioned to drive enterprise adoption of the platform.

Cloudera Enterprise allows companies that rely on Hadoop to get up and running faster. It lets them meet more stringent SLAs, reduce administrative costs and eliminate risks by making their systems more transparent and easier to maintain. It codifies the lessons that our company and our people have learned since 2006, designing, building and running Hadoop in production. We have real customers across many vertical markets in production on clusters ranging from terabytes to petabytes, solving a broad range of important business problems. If you’d like some of that, by the way, our software, our services team and our support staff are at your disposal. Give us a call.

Of course, you shouldn’t simply take my word for it. Our announcements on Cloudera’s Distribution for Hadoop and Cloudera Enterprise today include quotes from some of our customers (eBay), from industry experts and from partners. We’ve built a tremendous partner network – hardware vendors, Hadoop analytics and tools companies, database vendors and others who are committed to CDH3 and our vision for this critical new business platform. We’re working hard to expand the ecosystem to help other companies integrate their products with Hadoop.

We’re pretty excited.
