Cloudera Blog · General Posts

Welcome, Tom!

We announced a leadership change at Cloudera today. Tom Reilly, formerly CEO at ArcSight, is joining us in my old role – CEO – and I am assuming two new posts: Chief Strategy Officer and Chairman of the Board of Directors.

When we started the company five years ago, almost no one had heard of Apache Hadoop. Big Data, to the extent the term was used at all, was strictly a consumer internet phenomenon. No other enterprise vendor believed the platform mattered.

We did, of course, and we set out to make that true. We’ve engaged closely with the open source community, worked hard to advance the state of the art in the platform, and crafted a business strategy that allows us to grow quickly and to build a great company for the long term.

The HBaseCon 2013 Afterglow

HBaseCon 2013 is in the books. Thanks to all our speakers, sponsors, and attendees! A great time was had by all.

For those of you who missed the show, session video and presentation slides (as well as photos) will be available via hbasecon.com in a few weeks. (To be notified, follow @cloudera or @ClouderaEng.) Although it’s not quite as good as being there with the rest of the community, you’ll still be able to benefit from the real-world experiences of Apache HBase users like Facebook, Box, Yahoo!, Salesforce.com, Pinterest, Twitter, Groupon, and more.

While you’re waiting for that, allow me to bring you just this single photo to capture the HBaseCon experience:

With New Product Packaging, Adopting the Platform for Big Data is Even Easier

Today is a big day: Cloudera is not only urging our customers to “Unaccept the Status Quo” (the continued and accelerating spending on data warehousing, expensive data storage, and associated software licenses), but we also announced that Cloudera Search has entered public beta. Now anyone who knows how to do a Google search can query data stored in Cloudera’s Platform for Big Data.

In this post, however, I’d like to explain the new, simpler product naming/packaging structure that will make adopting and deploying Cloudera more straightforward.

Introducing Cloudera Standard

From now on, in addition to CDH, our 100% open source distribution of Apache Hadoop and related projects that is always available to whoever wants to try it, we will offer customers two options that also include Cloudera Manager, our management automation software:

Cloudera Search: The Newest Hadoop Framework for CDH Users and Developers

One of the unexpected pleasures of open source development is the way that technologies adapt and evolve for uses you never originally anticipated.

Seven years ago, Apache Hadoop sprang from a project based on Apache Lucene, aiming to solve a search problem: how to scalably store and index the internet. Today, it’s my pleasure to announce Cloudera Search, which uses Lucene (among other things) to make search solve a Hadoop problem: how to let non-technical users interactively explore and analyze data in Hadoop.

Cloudera Search is released to public beta, as of today. (See a demo here; get installation instructions here.) Powered by Apache Solr 4.3, Cloudera Search allows hundreds of users to search petabytes of Hadoop data interactively.

HBaseCon 2013: "Internals" Track Preview

As we march toward HBaseCon 2013 (June 13 in San Francisco), it’s time to bring you a preview of the Internals track (see the Operations track preview here) — the track guaranteed to be of most interest to Apache HBase developers and other people tracking the progress of the code base.

This track, hosted by Salesforce.com’s Lars Hofhansl (also an HBase PMC Member and HBaseCon keynote speaker), focuses on the architecture, features, and development of HBase. You will learn about interesting features, best practices for using them in production/business-critical environments, and how development is done by the community.

CDH 4.3 is Released!

I’m pleased to announce that CDH 4.3 is released and available for download. This is the third quarterly update to our GA shipping CDH 4 line and the 17th significant release of our 100% open source Apache Hadoop distribution.

CDH 4.3 is primarily focused on maintenance. There are more than 400 bug fixes included in this release across the components of the CDH stack. This represents a great step forward in quality, security, and performance.

There are also a few new features in this release. One new feature is the ability of HDFS to rebalance within a datanode. This is a great (configurable) way to help prevent drive failure and maintain performance without having to run more disruptive cluster-wide rebalances. Hue has also received a number of new features, including a Pig editor and support for using the HDFS trash bin.

If It’s Tuesday, There Must Be a "Data Ride"

Mark your calendars, all you data cyclists!

I’m visiting Paris, London, and Edinburgh this June. When I travel I like to talk to locals. And, wherever I am, I like to bicycle. So, I thought I might combine these interests and host “data rides” in these three cities.

In each city I’ll name a time and a meeting point, and then ride the local roads for an hour or two with whoever shows up. Afterward, we might need some libations at a local pub. I might even get Cloudera to throw in some schwag.

Fresh and Hot: HBaseCon 2013 Schedule Finalized!

The schedule/agenda grid for HBaseCon 2013 (rapidly approaching: June 13 in San Francisco) is a thing of beauty.

If you have lacked the motivation to register until now, we think this session line-up will change your mind. We repeat: whether you’re an HBase committer or just getting started (or at any level in between), HBaseCon is simply an event that you can’t afford to miss – and with an entry fee of just $350, it’s also one you can easily afford.

Metrics2: The New Hotness for Apache HBase Metrics

The post below was originally published at blogs.apache.org/hbase. We re-publish it here for your convenience.

Apache HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help spot problems before they arise, debug on-going issues, evaluate new usage patterns, and provide insight into capacity planning.

Since October 2008, version 0.19.0 (HBASE-625), HBase has been using Apache Hadoop’s metrics system to export metrics to JMX, Ganglia, and other metrics sinks. As the code base grew, more and more metrics were added by different developers. New features got metrics. When users needed more data on issues, they added more metrics. These new metrics were not always consistently named, and some were not well documented.

Extending the Data Warehouse with Hadoop

“Are data warehouses becoming victims of their own success?”, Tony Baer asks in a recent blog post:
