At Cloudera, we have the privilege of helping thousands of developers learn Apache Hadoop, as well as build and deploy systems and applications on top of Hadoop. While we (and many of you) believe that platform is fast becoming a staple system in the data center, we’re also acutely aware of its complexities. In fact, this is the entire motivation behind Cloudera Manager: to make the Hadoop platform easy for operations staff to deploy and manage.
So, we’ve made Hadoop much easier to “consume” for admins and other operators — but what about for developers, whether working for ISVs, SIs, or users? Until now, they’ve largely been on their own.
That’s why we’re really excited to announce the Cloudera Developer Kit (CDK), a new open source project designed to help developers get up and running to build applications on CDH, Cloudera’s open source distribution including Hadoop, faster and more easily than before. The CDK is a collection of libraries, tools, examples, and documentation engineered to simplify the most common tasks when working with the platform. Just like CDH, the CDK is 100% free, open source, and licensed under the same permissive Apache License v2, so you can use the code any way you choose in your existing commercial code base or open source project.
Our thanks to Ted Wasserman, product manager for Tableau, for the guest post below:
Many of our customers are turning to Apache Hadoop as they grapple with their big data challenges. Hadoop offers many benefits such as its scalability, economics, and versatility. Even so, adoption to date has largely centered on applications with “batch”-oriented workloads because of the latency imposed by the MapReduce framework. To increase Hadoop’s usefulness and adoption in the business intelligence space, where users need fast, interactive response times when they ask a question, a new approach was needed.
Cloudera Impala technology moves the ball forward for doing ad hoc visual analytics on Hadoop. In particular, we like Impala for several reasons:
This week, the Cloudera Sessions head to Washington, DC, and Columbus, Ohio, where attendees will hear from AOL, Explorys, and Skybox Imaging about the ways Apache Hadoop can be used to optimize digital content, to improve the delivery of healthcare, and to generate high-resolution images of the entire globe that provide value to retailers, farmers, government organizations and more.
I’d like to take this opportunity to shine a spotlight on Skybox Imaging, an innovative company that is putting Hadoop to work to help us see the world more clearly, literally.
Skybox’s vice president of ground software, Ollie Guinan, recently posted a guest blog to Cloudera.com to give readers a glimpse into their Hadoop use case, which I’d like to promote again here. I would encourage anyone in the DC area to meet Ollie (who is also a Champion of Big Data) in person at the Cloudera Sessions event in DC this Tuesday to learn more about Skybox and its fascinating use case.
Our thanks to Yves de Montcheuil, Vice President of Marketing for Talend, for the guest post below:
According to Wikipedia, the impala is a medium-sized African antelope; its name comes from the Zulu language meaning “gazelle”. Like elephants, it is found in savannas, and this may be the link with Hadoop. Impala is also the name of Cloudera’s SQL-on-Apache Hadoop project, launched in beta at Strata last October and just released in version 1.0.
SQL-on-Hadoop – wait a minute… isn’t that what Apache Hive is for? Well, yes and no. HiveQL certainly brings a set of SQL-like commands to Hadoop data. The big issue with Hive: it’s very slow. More precisely, it’s not interactive. Queries take a long time to be “parsed” and distributed across the cluster. Response times can reach a minute, which is highly impractical for interactive use. It works fine for batch use (response times actually don’t vary much based on the dataset size), but when users want to mine Hadoop data, perform interactive queries or drill-downs, profile data, etc. – they end up spending lots of time glaring at their screen (or fetching more coffee than they should).
Our thanks to Kevin Spurway, Senior Vice President of Marketing for MicroStrategy Inc., for the guest post below:
Squeezing insight from Big Data isn’t easy. It’s a delicate balance between scalability, performance, and cost effectiveness across an entire architecture, spanning everything from data storage to mobile app consumption. That’s why MicroStrategy and Cloudera have been working closely together from a technology standpoint. And, that’s why we’re proud to stand as a launch partner, certifying the integration between Cloudera’s new Impala project and our core MicroStrategy enterprise analytics platform.
We’ve been collaborating with Cloudera on Impala since its early stages, actively testing functionality, recommending enhancements, reviewing roadmaps, and sharing performance results. We’re especially enthusiastic because we see the launch of Impala as a giant step toward an era of highly cost-effective interactive analytics for Apache Hadoop-based Big Data, at speeds previously not possible.
This week represents quite a milestone for Cloudera and, at least we’d like to believe, the Hadoop ecosystem at large: the general availability release of Cloudera Impala. Since we launched the Impala beta program last fall, I’ve been fortunate enough to work with many of the 40+ early adopters who’ve been testing this near-real-time SQL-on-Hadoop engine in an effort to learn about their use cases and keep tabs on early experiences with the tool.
Customers running Impala today span a variety of industries, from large biotech company to online travel provider to digital advertiser to major financial institution, and each one has a unique use case for Impala. Stay tuned to learn more about their various use cases.
This week, I’d like to highlight Six3 Systems’ Wayne Wheeles (also a Champion of Big Data), who has been working with Impala to improve cyber security solutions, in particular the open source SherpaSurfing product.
On Monday April 29, Cloudera announced a strategic alliance with SAS. As the industry leader in business analytics software, SAS brings a formidable toolset to bear on the problem of extracting business value from large volumes of data.
Over the past few months, Cloudera has been hard at work along with the SAS team to integrate a number of SAS products with Apache Hadoop, delivering the ability for our customers to use these tools in their interaction with data on the Cloudera platform. In this post, we will delve into the major mechanisms that are available for connecting SAS to CDH, Cloudera’s 100% open-source distribution including Hadoop.
SAS/ACCESS to Hadoop
SAS/ACCESS lets you access data sets stored in Hadoop natively from within SAS. With SAS/ACCESS to Hadoop:
In October 2012, we introduced the Impala project, at that time the first known effort to bring a modern, open source, distributed SQL query engine to Apache Hadoop. Our release of source code and a beta implementation was met with widespread acclaim and later inspired similar efforts across the industry that now measure themselves against the Impala standard.
Today, we are proud to announce the first production drop of Impala (download here), which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.
Although the features and performance results described below are impressive, it’s important to note that they represent only a down payment toward the full promise of Impala. There is much more to come — and soon.
Features in Impala 1.0
It has been an exciting couple of days for new product announcements at Cloudera — exciting especially for me as the edges of the new platform for big data we have been talking about since Strata + Hadoop World 2012 come into focus.
Yesterday, Cloudera announced a strategic alliance with SAS. SAS is the industry leader in business analytics software, especially predictive analytics. Ninety percent of the Fortune 100 run SAS today. We have been working with SAS to make a number of its products work well with Cloudera, including SAS Access, SAS Visual Analytics, and SAS High Performance Analytics (HPA). SAS HPA is an excellent example of the future direction of Apache Hadoop as a data management platform:
We’re very happy to announce the 2.3 release of Hue, the open source Web UI that makes Apache Hadoop easier to use.
Hue 2.3 comes only two months after 2.2 but contains more than 100 improvements and fixes. In particular, two new apps were added (including an Apache Pig editor) and the query editors are now easier to use.
Here’s a video demoing the major changes: