Cloudera Developer Blog · CDH Posts
This is a great day for technical end-users – developers, admins, analysts, and data scientists alike. Starting now, Cloudera complements its traditional mailing lists with new, feature-rich community forums intended for users of Cloudera’s Platform for Big Data! (Log in using your existing credentials or click the link to register.)
Although mailing lists have long been a standard for user interaction, and will undoubtedly continue to be, they have flaws. For example, they lack structure or taxonomy, which makes consumption difficult. Search functionality is often less than stellar and users are unable to build reputations that span an appreciable period of time. For these reasons, although they’re easy to create and manage, mailing lists inherently limit access to knowledge and hence limit adoption.
Users of CDH, Cloudera’s Big Data platform, are solving big problems and building amazing solutions with Apache Hadoop. We at Cloudera are very proud of our customers’ accomplishments, and it’s time to showcase them. This year we’re thrilled to present the first annual Data Impact Awards, an awards program designed to recognize Hadoop innovators for their achievements in five categories:
The Data Warehousing Institute (TDWI) runs an annual Best Practices Awards program to recognize organizations for their achievements in business intelligence and data warehousing. A few months ago, I was introduced to Motorola Mobility’s VP of cloud platforms and services, Balaji Thiagarajan. After I learned about the company’s interesting Apache Hadoop use case and the success it has delivered, Balaji and I worked together to nominate Motorola Mobility for the TDWI Best Practices Award for Emerging Technologies and Methods. And to my delight, it won!
Chances are, you’ve heard of Motorola Mobility. It released the first commercial portable cell phone back in 1984, later dominated the mobile phone market with the super-thin RAZR, and today a large portion of the massive smartphone market runs on its Android operating system.
At Cloudera, we believe that Cloudera Manager is the best way to install, configure, manage, and monitor your Apache Hadoop stack. Of course, most users prefer not to take our word for it; they first want to know how Cloudera Manager works under the covers.
In this post, I’ll explain some of its inner workings.
The Vocabulary of Cloudera Manager
For those who are unfamiliar with it, Hue is a very popular, end-user-focused, fully open source Web UI designed for interaction with Apache Hadoop and its ecosystem components. Founded by Cloudera employees, Hue has been around for quite some time, but only in the last 12 months has it evolved into the great ramp-up and interaction tool it is today. It’s fair to say that Hue is the most popular open source GUI for the Hadoop ecosystem among beginners, as well as a valuable tool for seasoned Hadoop users (and users generally in an enterprise environment), and it is the only end-user tool that ships with Hadoop distributions today. In fact, Hue is even redistributed and marketed as part of other user-experience and ramp-up-on-Hadoop VMs in the market.
Just in time for Hadoop Summit 2013, the Apache Bigtop team is very pleased to announce the release of Bigtop 0.6.0: the very first release of a fully integrated Big Data management distribution built on Hadoop 2.0.5-alpha, currently the most advanced Hadoop 2.x release.
Bigtop, as many of you might already know, is a project aimed at creating a 100% open source and community-driven Big Data management distribution based on Apache Hadoop. (You can learn more about it by reading one of our previous blog posts on Apache Blogs.) Bigtop also plays an important role in CDH, which uses Bigtop’s packaging code. Cloudera takes pride in developing open source packaging code and contributing it back to the community.
Cloudera Impala has many exciting features, but one of the most impressive is the ability to analyze data in multiple formats, with no ETL needed, in HDFS and Apache HBase. Furthermore, you can use multiple frameworks, such as MapReduce and Impala, to analyze that same data. Consequently, Impala will often run side-by-side with MapReduce on the same physical hardware, with both supporting business-critical workloads. For such multi-tenant clusters, Impala and MapReduce both need to perform well despite potentially conflicting demands for cluster resources.
In this post, we’ll share our experiences configuring Impala and MapReduce for optimal multi-tenant performance. Our goal is to help users understand how to tune their multi-tenant clusters to meet production service level objectives (SLOs), and to contribute to the community some test methods and performance models that can be helpful beyond Cloudera.
Defining Realistic Test Scenarios
As you may know, Apache HBase has a vibrant community and gets a lot of contributions from developers worldwide. The collaborative development effort is so active, in fact, that a new point-release comes out about every six weeks (with the current stable branch being 0.94).
At Cloudera, we’re committed to ensuring that CDH, our open source distribution of Apache Hadoop and related projects (including HBase), ships with the results of this steady progress. Thus, CDH 4.2 was rebased on HBase 0.94.2, whereas its predecessor, CDH 4.1, was based on HBase 0.92.1. CDH 4.3 has moved one step further and is rebased on HBase 0.94.6.1.
Yesterday we announced the availability of Cloudera Manager 4.6. As part of this release, the Free Edition of Cloudera Manager (now a part of Cloudera Standard) has been enhanced significantly to include many features formerly only available with a subscription license:
Today is a big day: not only is Cloudera urging customers to “Unaccept the Status Quo” (the continued and accelerating spending on data warehousing, expensive data storage, and associated software licenses), but we have also announced that Cloudera Search has entered public beta. Now anyone who knows how to do a Google search can query data stored in Cloudera’s Platform for Big Data.
In this post, however, I’d like to explain the new, simpler product naming/packaging structure that will make adopting and deploying Cloudera more straightforward.