Cloudera Engineering Blog · General Posts

Cloudera Search: The Newest Hadoop Framework for CDH Users and Developers

One of the unexpected pleasures of open source development is the way that technologies adapt and evolve for uses you never originally anticipated.

Seven years ago, Apache Hadoop sprang from a project based on Apache Lucene, aiming to solve a search problem: how to scalably store and index the internet. Today, it’s my pleasure to announce Cloudera Search, which uses Lucene (among other things) to make search solve a Hadoop problem: how to let non-technical users interactively explore and analyze data in Hadoop.

HBaseCon 2013: "Internals" Track Preview

As we march toward HBaseCon 2013 (June 13 in San Francisco), it’s time to bring you a preview of the Internals track (see the Operations track preview here) — the track guaranteed to be of most interest to Apache HBase developers and other people tracking the progress of the code base.

CDH 4.3 is Released!

I’m pleased to announce that CDH 4.3 is released and available for download. This is the third quarterly update to our GA shipping CDH 4 line and the 17th significant release of our 100% open source Apache Hadoop distribution.

CDH 4.3 is primarily focused on maintenance. There are more than 400 bug fixes included in this release across the components of the CDH stack. This represents a great step forward in quality, security, and performance.

If It’s Tuesday, There Must Be a "Data Ride"

Mark your calendars, all you data cyclists!

I’m visiting Paris, London, and Edinburgh this June. When I travel I like to talk to locals. And, wherever I am, I like to bicycle. So, I thought I might combine these interests and host “data rides” in these three cities.

Fresh and Hot: HBaseCon 2013 Schedule Finalized!

The schedule/agenda grid for HBaseCon 2013 (rapidly approaching: June 13 in San Francisco) is a thing of beauty.

Metrics2: The New Hotness for Apache HBase Metrics

The post below was originally published at blogs.apache.org/hbase. We re-publish it here for your convenience.

Apache HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help  spot problems before they arise, debug on-going issues, evaluate new usage patterns, and provide insight into capacity planning.

Extending the Data Warehouse with Hadoop

“Are data warehouses becoming victims of their own success?”, Tony Baer asks in a recent blog post:

Cloudera Development Kit (CDK): Hadoop Application Development Made Easier

Editor’s Note (Dec. 11, 2013): As of Dec. 2013, the Cloudera Development Kit is now known as the Kite SDK. Links below are updated accordingly.

At Cloudera, we have the privilege of helping thousands of developers learn Apache Hadoop, as well as build and deploy systems and applications on top of Hadoop. While we (and many of you) believe that platform is fast becoming a staple system in the data center, we’re also acutely aware of its complexities. In fact, this is the entire motivation behind Cloudera Manager: to make the Hadoop platform easy for operations staff to deploy and manage.

Where to Find Cloudera Tech Talks Through June 2013

It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to meetups and conferences near and far to do their part in the community to make Apache Hadoop a household word!

(Remember, we’re always ready to assist your meetup by providing speakers, sponsorships, and schwag.)

For a Limited Time: Live Impala Demo on EC2

As a follow-up to a previous post about the Impala demo he built during Data Hacking Day, Alan Gardner from Pythian has deployed the app for a limited time on Amazon EC2. We republish his original post below.

A little while ago I blogged about (and open sourced) a Cloudera Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance. [Note: instance live only for one week!] You can try the visualization; we’ve also opened up the Impala web interface, where you can see query profiles and performance numbers, and Hue (username and password are both ‘test’), where you can run your own queries on the dataset.

Deploying Impala on EC2

Newer Posts Older Posts