Cloudera Blog · CDH Posts

Cloudera Development Kit (CDK): Hadoop Application Development Made Easier

At Cloudera, we have the privilege of helping thousands of developers learn Apache Hadoop, as well as build and deploy systems and applications on top of Hadoop. While we (and many of you) believe that platform is fast becoming a staple system in the data center, we’re also acutely aware of its complexities. In fact, this is the entire motivation behind Cloudera Manager: to make the Hadoop platform easy for operations staff to deploy and manage.

So, we’ve made Hadoop much easier to “consume” for admins and other operators — but what about for developers, whether working for ISVs, SIs, or users? Until now, they’ve largely been on their own.

That’s why we’re really excited to announce the Cloudera Development Kit (CDK), a new open source project designed to help developers get up and running on CDH, Cloudera’s open source distribution including Hadoop, faster and more easily than before. The CDK is a collection of libraries, tools, examples, and documentation engineered to simplify the most common tasks when working with the platform. Just like CDH, the CDK is 100% free, open source, and licensed under the same permissive Apache License v2, so you can use the code any way you choose in your existing commercial code base or open source project.

Customer Spotlight: Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System

This week, the Cloudera Sessions head to Washington, DC, and Columbus, Ohio, where attendees will hear from AOL, Explorys, and Skybox Imaging about the ways Apache Hadoop can be used to optimize digital content, to improve the delivery of healthcare, and to generate high-resolution images of the entire globe that provide value to retailers, farmers, government organizations and more.

I’d like to take this opportunity to shine a spotlight on Skybox Imaging, an innovative company that is putting Hadoop to work to help us see the world more clearly, literally.

Skybox’s vice president of ground software, Ollie Guinan, recently posted a guest blog to Cloudera.com to give readers a glimpse into their Hadoop use case, which I’d like to promote again here. I would encourage anyone in the DC area to meet Ollie (who is also a Champion of Big Data) in person at the Cloudera Sessions event in DC this Tuesday to learn more about Skybox and its fascinating use case.

How the SAS and Cloudera Platforms Work Together

On Monday April 29, Cloudera announced a strategic alliance with SAS. As the industry leader in business analytics software, SAS brings a formidable toolset to bear on the problem of extracting business value from large volumes of data.

Over the past few months, Cloudera has been hard at work along with the SAS team to integrate a number of SAS products with Apache Hadoop, delivering the ability for our customers to use these tools in their interaction with data on the Cloudera platform. In this post, we will delve into the major mechanisms that are available for connecting SAS to CDH, Cloudera’s 100% open-source distribution including Hadoop.

SAS/ACCESS to Hadoop

SAS/ACCESS lets you access data sets stored in Hadoop natively from within SAS. With SAS/ACCESS to Hadoop:

Customer Spotlight: Nokia’s Big Data Ecosystem Connects Cloudera, Teradata, Oracle, and Others

As Cloudera’s keeper of customer stories, it’s dawned on me that others might benefit from the information I’ve spent the past year collecting: the many use cases and deployment patterns for Hadoop amongst our customer base.

This week I’d like to highlight Nokia, a global company that we’re all familiar with as a large mobile phone provider, and whose Senior Director of Analytics – Amy O’Connor – will be speaking at tomorrow’s Cloudera Sessions event in Boston.

Fun fact: Nokia has been in business for more than 150 years, starting with the production of paper in the 1800s. When I first met Amy O’Connor in early 2012, she explained to me that Nokia has always been in the business of transforming resources into useful products — from paper and rubber over a century ago, to the electronics and mobile devices we’re familiar with today.

Cloudera Academic Partnership Program: Creating Hadoop Lovers in Universities Worldwide

Today Cloudera announced a new Cloudera Academic Partnership program, in which participating universities worldwide get access to curriculum, training, certification, and software. 

As noted in the press release, the global demand for people with Apache Hadoop and data science skills is dwarfing all supply. We consider it an important mission to help accredited universities meet that demand, by equipping them with the content and training they need to educate students in the Hadoop arts.

Furthermore, we know that many academic research labs need tools to help deploy, manage, and extend Hadoop clusters. For that reason, CAP members get free access to Cloudera Manager Enterprise Edition for 12 months to support data-intensive testing, development, and research.

How Persado Supports Persuasion Marketing Technology with Hive and Pig Training

This guest post comes from Alex Giamas, Senior Software Engineer on the data warehouse team at Persado, an ultra-hot persuasion marketing technology company with operations in Athens, Greece.

A World-Class EDW Requires a World-Class Hadoop Team

Persado is the global leader in persuasion marketing technology, a new category in digital marketing. Our revolutionary technology maps the genome of marketing language and generates the messages that work best for any customer and any product at any time. To assure the highest quality experience for both our clients and end-users, our engineering team collaborates with Ph.D. statisticians and data analysts to develop new ways to segment audiences, discover content, and deliver the most relevant and effective marketing messages in real time.

Given the challenge of creating a market based on ongoing data collection and massive query ability, the data warehouse organization ultimately plays the most important role in the persuasion marketing value chain, assuring a steady and unobstructed multidirectional flow of information. My team continuously ensures Persado’s infrastructure is aligned to the needs of our data scientists, including regularly generating KPI reports, managing data from heterogeneous sources, preparing customized analyses, and even implementing specific statistical algorithms in Java based on reference implementations in R.

It’s Only Rock and Roll

It’s only Rock and Roll, but I like it!
           – Mick Jagger

Copyright is having a tough time in the digital age. New copies of music, movies and software can be created at near zero cost. Some wonder whether it still makes sense to ever charge for content. Over the past century large industries have developed that sell content. These industries resist change. We consumers love our content, but don’t love paying for it. But would all the content we love still exist without payment for copyright?

One solution might be to replace sales of copyrighted material with services that provide access to the content. We could buy tickets to concerts, a service provided by musicians. We could enjoy streaming music services paid for by subscriptions or supported by advertising. Similarly, we could access software as a service in the cloud. Some companies, like Google, create proprietary software yet make money from it only through services, not by selling its copyright. That’s a great model, but is it the only way forward?

How-to: Use Vagrant to Set Up a Virtual Hadoop Cluster

This guest post comes to us from David Greco, CTO of Eligotech.

Vagrant is a very nice tool for programmatically managing many virtual machines (VMs) on a single physical machine. It natively supports VirtualBox and also provides plugins for VMware Fusion and Amazon EC2, supporting the management of VMs in those environments as well.

Vagrant provides a very easy-to-use, Ruby-based internal DSL that allows the user to define one or more virtual machines together with their configuration parameters. Furthermore, it offers different mechanisms for automatic provisioning: You can use Puppet, Chef, or shell scripts for automating software installation and configuration on the machines defined in the Vagrant configuration file.
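To make the DSL described above a little more concrete, here is a minimal sketch of what a multi-machine Vagrant configuration file might look like. It is not taken from the original post: the box name, machine names, IP addresses, memory size, and the inline provisioning command are all placeholder assumptions, and a real Hadoop setup would replace the shell one-liner with Puppet, Chef, or a fuller script.

```ruby
# Vagrantfile (illustrative sketch only).
# The box name, node names, IPs, and provisioning command are assumptions.
Vagrant.configure("2") do |config|
  # Hypothetical base box; substitute one you have added with `vagrant box add`.
  config.vm.box = "centos-6-x86_64"

  # One master and two workers on a private network (addresses are placeholders).
  nodes = {
    "master"  => "10.211.55.100",
    "worker1" => "10.211.55.101",
    "worker2" => "10.211.55.102"
  }

  nodes.each do |name, ip|
    # Each `define` block declares one virtual machine and its settings.
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip

      # Provider-specific tuning for VirtualBox.
      node.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--memory", "2048"]
      end

      # Shell provisioning; Puppet or Chef provisioners could be used instead.
      node.vm.provision "shell", inline: "echo provisioning #{name}"
    end
  end
end
```

With a file like this in the current directory, `vagrant up` would bring up all three machines, and `vagrant ssh master` would open a shell on the first one.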

We Honor the Champions of Big Data!

In the technology business, building a thriving and progressive user ecosystem around a platform is about as Mom-and-apple-pie as you can get. We all intuitively acknowledge that it’s one of the metrics for success.

Perhaps the most under-appreciated aspect of any platform ecosystem is the recognition that it is fundamentally built by real people. Without enthusiastic users of a platform engaging as evangelists on its behalf, the growth of the ecosystem around it will eventually slow to a crawl.

How-to: Create a CDH Cluster on Amazon EC2 via Cloudera Manager

Cloudera Manager 4.5 includes a new express installation wizard for Amazon Web Services (AWS) EC2. (This feature is also available in Cloudera Manager Free Edition.) Its goal is to let Cloudera Manager users provision CDH clusters and Cloudera Impala (the new open source distributed query engine for Apache Hadoop) on EC2 as easily as possible, which makes it currently the fastest way to provision a Cloudera Manager-managed cluster in EC2.

The new distinguishing feature is that Cloudera Manager can now launch and configure the instances for you, so you don’t have to worry about launching the instances, authorizing SSH keys, and configuring a firewall. All this can now be done from within Cloudera Manager! 

Since Cloudera Manager and the nodes running CDH use internal hostnames to communicate, the Cloudera Manager server must run on EC2 as well. In fact, the Cloud Express Wizard only appears when installing Cloudera Manager on EC2.
