Cloudera, The Platform for Big Data


Today we’re proud to announce a new addition to the Apache Hadoop ecosystem: Cloudera Impala, a parallel SQL engine that runs natively on Hadoop storage. The salient points are:

  • Hive compatible
  • 10x the performance of Hive/MapReduce, on average
  • 100% open source, under the Apache License v2 – just like Hadoop
  • Tested to run on CDH4.1 or higher

A follow-up post provides more details about Impala and how it works. Here, I’d like to touch on a few related points.

Impala brings useful new capabilities to the platform in its own right. It enables interactive SQL on Hadoop data, whether stored in HDFS or in HBase, where previously there was only batch processing. It substantially improves the quality of experience for business intelligence users. It improves the economics of running ELT workloads on Hadoop clusters. But perhaps just as important as what Impala *brings* to the platform is what Impala *says* about the platform. It says that we are building on a fundamentally new and better architecture for data management. It is an architecture in which:

  • We can add new forms of computation to an elastic, economical, linearly scalable, secure and durable pool of storage.
  • You can work from a shared, open metadata model.
  • You can store, explore, process, analyze and serve data without having to bolt together several disparate systems and repeatedly copy data among them.

All of this is delivered as a completely open-source platform, where customers pay for results from technology, not technology itself.

At Cloudera, we believe there will continue to be strong demand for all sorts of data management products, such as data warehouses, XML databases and document stores, that each excel in their respective niche. But from where we stand, CDH represents the best possible point of departure for Big Data applications. We’ve come a long way from the filesystem and batch processing engine that make up Apache Hadoop. I think we can now see the outlines of a single platform for Big Data.

Impala is immediately available as a public beta. You can find links to the download, documentation and installation information here. We hope you try it out, give us your feedback and get involved in its evolution.