Cloudera Developer Blog · CDH Posts
As you may know, Apache HBase has a vibrant community and gets a lot of contributions from developers worldwide. The collaborative development effort is so active, in fact, that a new point-release comes out about every six weeks (with the current stable branch being 0.94).
At Cloudera, we’re committed to ensuring that CDH, our open source distribution of Apache Hadoop and related projects (including HBase), ships with the results of this steady progress. Thus, CDH 4.2 was rebased on 0.94.2, as compared to its predecessor CDH 4.1, which was based on 0.92.1. CDH 4.3 has moved one step further and is rebased on 0.94.6.1.
Apart from the rebase, CDH 4.3 also has some important bug fixes backported from later versions of 0.94 and trunk. Following are some of the important features and improvements that now ship in CDH 4.2/CDH 4.3:
Yesterday we announced the availability of Cloudera Manager 4.6. As part of this release, the Free Edition of Cloudera Manager (now a part of Cloudera Standard) has been enhanced significantly to include many features formerly only available with a subscription license:
Today is a big day: Cloudera is not only urging our customers to “Unaccept the Status Quo” (the continued and accelerating spending on data warehousing, expensive data storage, and associated software licenses), but we also announced that Cloudera Search has entered public beta. Now anyone who knows how to do a Google search can query data stored in Cloudera’s Platform for Big Data.
In this post, however, I’d like to explain the new, simpler product naming/packaging structure that will make adopting and deploying Cloudera more straightforward.
Introducing Cloudera Standard
From now on, in addition to CDH, our 100% open source distribution of Apache Hadoop and related projects that is always available to whoever wants to try it, we will offer customers two options that also include Cloudera Manager, our management automation software:
One of the unexpected pleasures of open source development is the way that technologies adapt and evolve for uses you never originally anticipated.
Seven years ago, Apache Hadoop sprang from a project based on Apache Lucene, aiming to solve a search problem: how to scalably store and index the internet. Today, it’s my pleasure to announce Cloudera Search, which uses Lucene (among other things) to make search solve a Hadoop problem: how to let non-technical users interactively explore and analyze data in Hadoop.
Cloudera Search is released to public beta, as of today. (See a demo here; get installation instructions here.) Powered by Apache Solr 4.3, Cloudera Search allows hundreds of users to search petabytes of Hadoop data interactively.
I’m pleased to announce that CDH 4.3 is released and available for download. This is the third quarterly update to our GA shipping CDH 4 line and the 17th significant release of our 100% open source Apache Hadoop distribution.
CDH 4.3 is primarily focused on maintenance. There are more than 400 bug fixes included in this release across the components of the CDH stack. This represents a great step forward in quality, security, and performance.
There are also a few new features in this release. One new feature is the ability of HDFS to rebalance within a datanode. This is a great (configurable) way to help prevent drive failure and maintain performance without having to run more disruptive cluster-wide rebalances. Hue has also received a number of new features, including a Pig editor and support for using the HDFS trash bin.
This week I’d like to highlight King.com, a European social gaming giant that recently claimed the throne for having the most daily active users (more than 66 million). King.com has methodically and successfully expanded its reach beyond mainstream social gaming to dominate the mobile gaming market — it offers a streamlined experience that allows gamers to pick up their gaming session from wherever they left off, in any game and on any device. King.com’s top games include “Candy Crush Saga” and “Bubble Saga”.
And — you guessed it — King.com runs on CDH.
With a business model that offers all games for free, King.com relies advertising and in-game products like boosters and extra lives to generate revenue. In other words, it has to be smart in every communication with customers in order to create value for both the gamer and the advertiser.
Have you ever wished you could upgrade to the latest CDH minor release with just a few mouse clicks, and even without taking any downtime on your cluster? Well, with Cloudera Manager 4.5 and its new “Parcel” feature, you can!
That release introduced many new features and capabilities related to parcels, and in this FAQ-oriented post, you will learn about most of them.
What are parcels?
Parcel is an alternative binary distribution format supported for the first time in Cloudera Manager 4.5. There are a few notable differences between parcels and traditional CDH rpm/deb packages:
Editor’s Note (Dec. 11, 2013): As of Dec. 2013, the Cloudera Development Kit is now known as the Kite SDK. Links below are updated accordingly.
At Cloudera, we have the privilege of helping thousands of developers learn Apache Hadoop, as well as build and deploy systems and applications on top of Hadoop. While we (and many of you) believe that platform is fast becoming a staple system in the data center, we’re also acutely aware of its complexities. In fact, this is the entire motivation behind Cloudera Manager: to make the Hadoop platform easy for operations staff to deploy and manage.
So, we’ve made Hadoop much easier to “consume” for admins and other operators — but what about for developers, whether working for ISVs, SIs, or users? Until now, they’ve largely been on their own.
This week, the Cloudera Sessions head to Washington, DC, and Columbus, Ohio, where attendees will hear from AOL, Explorys, and Skybox Imaging about the ways Apache Hadoop can be used to optimize digital content, to improve the delivery of healthcare, and to generate high-resolution images of the entire globe that provide value to retailers, farmers, government organizations and more.
I’d like to take this opportunity to shine a spotlight on Skybox Imaging, an innovative company that is putting Hadoop to work to help us see the world more clearly, literally.
Skybox’s vice president of ground software, Ollie Guinan, recently posted a guest blog to Cloudera.com to give readers a glimpse into their Hadoop use case, which I’d like to promote again here. I would encourage anyone in the DC area to meet Ollie (who is also a Champion of Big Data) in person at the Cloudera Sessions event in DC this Tuesday to learn more about Skybox and its fascinating use case.
On Monday April 29, Cloudera announced a strategic alliance with SAS. As the industry leader in business analytics software, SAS brings a formidable toolset to bear on the problem of extracting business value from large volumes of data.
Over the past few months, Cloudera has been hard at work along with the SAS team to integrate a number of SAS products with Apache Hadoop, delivering the ability for our customers to use these tools in their interaction with data on the Cloudera platform. In this post, we will delve into the major mechanisms that are available for connecting SAS to CDH, Cloudera’s 100% open-source distribution including Hadoop.
SAS/ACCESS to Hadoop
SAS/ACCESS provides the ability to access data sets stored in Hadoop in SAS natively. With SAS/Access to Hadoop: