Cloudera Blog · Impala Posts
I am pleased to announce the release of Cloudera Impala Beta (version 0.4) and Cloudera Manager 4.1.3. Key enhancements in each release are:
Cloudera Impala Beta (version 0.4)
For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.
In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS made major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And the Beta release of Cloudera Impala, a real-time query engine for analytics across HDFS and Apache HBase data, offered a hint of the future of the Hadoop platform.
Let’s look at the highlights of the 2012 developments around projects supported by Cloudera.
Apache Hadoop Releases
(Update 2/6/2013 – Sorry, this event is sold out!)
With Strata Conference 2013 coming to town (Feb. 26-28, in Santa Clara, Calif.), we thought it would be a great opportunity to open our Palo Alto office’s doors for a pre-conference “Data Hacking Day” on Monday, Feb. 25!
Participants will use Cloudera Impala, the open-source, real-time query engine for Apache Hadoop, to hack on a rich public data set. After forming teams, you’ll compete to see whose project will earn enough votes to win the data-hacking trophy for the day. All members of the winning team will get free hard copies of Eric Sammer’s coveted O’Reilly book, Hadoop Operations.
In this installment of “Meet the Engineer”, meet Marcel Kornacker, the architect of the Cloudera Impala open-source real-time query engine for Apache Hadoop.
What do you do at Cloudera?
In this installment of “Meet the Engineer”, meet Nong Li, a software engineer working on the open-source Cloudera Impala real-time query engine.
What do you do at Cloudera?
I’ve been at Cloudera for a little over a year now, and the whole time I’ve been working on Cloudera Impala. On Impala, I spend most of my time improving the performance of the query execution engine: working on the I/O subsystem, JIT-compiling portions of query execution, and working on expression evaluation and other performance-centric components.
It’s been an exciting month and a half since the launch of the Cloudera Impala beta (Impala is the new open source distributed query engine for Apache Hadoop), and we thought it’d be a great time to provide an update about what’s next for the project, including our product roadmap, release schedule, and open-source plan.
First of all, we’d like to thank you for your enthusiasm and valuable beta feedback. We’re actively listening and have already fixed many of the bugs reported, captured feature requests for the roadmap, and updated the Cloudera Impala FAQ based on user input.
Our primary focus between now and general availability (GA) is making Impala enterprise-ready for your production Hadoop clusters. This means continued investments in product stability as well as product functionality, including:
At Cloudera, we take great pride in drinking our own champagne. That pride extends, in particular, to our support team.
Cloudera Manager, our end-to-end management platform for CDH (Cloudera’s open-source, enterprise-ready distribution of Apache Hadoop and related projects), has a feature that allows subscription customers to send a snapshot of their cluster to us. When these cluster snapshots come to us from customers, they end up in a CDH cluster at Cloudera where various forms of data processing and aggregation can be performed.
Today, the system provides real-time support via an application we call CSI. When a support employee looks at a ticket, they can use CSI to examine the customer’s latest snapshot and see cluster stats such as version information, number of nodes in service, which services are used, and so on. CSI also visualizes various aggregations and groupings (by version, for example), which helps us detect misconfigured clusters or issues introduced during upgrade or installation.
I am pleased to announce the release of Cloudera Impala Beta (version 0.3) and Cloudera Manager 4.1.2. Key enhancements in each release are:
Cloudera Impala Beta (version 0.3)
The beta release of Cloudera Impala, the first (and open source) real-time query engine for Apache Hadoop, has been out in the wild (in binary as well as VM forms) for over a month now, and users have had time to get up-close and hands-on. Consequently, we’re beginning to see some fascinating self-published observations and guides.
Here are just a few examples; you may know of more that we’ve missed:
Since the Cloudera Impala announcement a few weeks ago, we’ve been busy partnering up with Hadoop meetups around the country (and beyond) to bring Impala tech talks directly to the community. Here’s the list for the remainder of 2012 so far: