Category Archives: Cloud

Hands-on Hive-on-Spark in the AWS Cloud

Categories: Cloud Community Hive Spark

Interested in Hive-on-Spark progress? This new AMI gives you a hands-on experience.

Nearly one year ago, the Apache Hadoop community began to embrace Apache Spark as a powerful batch-processing engine. Today, many organizations and projects are augmenting their Hadoop capabilities with Spark. As part of this shift, the Apache Hive community is working to add Spark as an execution engine for Hive. The Hive-on-Spark work is being tracked by HIVE-7292 which is one of the most popular JIRAs in the Hadoop ecosystem.

Read more

Inside Cloudera Director

Categories: Cloud Cloudera Manager Ops and DevOps

With Cloudera Director, cloud deployments of Apache Hadoop are now as enterprise-ready as on-premise ones. Here’s the technology behind it.

As part of the recent Cloudera Enterprise 5.2 release, we unveiled Cloudera Director, a new product that delivers enterprise-class, self-service interaction with Hadoop clusters in cloud environments. (Cloudera Director is free to download and use, but commercial support requires a Cloudera Enterprise subscription.) It provides a centralized administrative view for cloud deployments and lets end users provision and scale clusters themselves using automated,

Read more

How-to: Write Apache Hadoop Applications on OpenShift with Kite SDK

Categories: Cloud Hadoop How-to Kite SDK

The combination of OpenShift and Kite SDK turns out to be an effective one for developing and testing Apache Hadoop applications.

At Cloudera, our engineers develop a variety of applications on top of Hadoop to solve our own data needs (here and here). More recently, we’ve started to look at streamlining our development process by using a PaaS (Platform-as-a-Service) for some of these applications. Having single-click deployment and updates to consistent development environments lets us onboard new developers more quickly,

Read more

Cloudera Enterprise 5.2 is Released

Categories: CDH Cloud Hadoop

Cloudera Enterprise 5.2 contains new functionality for security, cloud deployments, and real-time architectures, and support for the latest open source component releases and partner technologies.

We’re pleased to announce the release of Cloudera Enterprise 5.2 (comprising CDH 5.2, Cloudera Manager 5.2, Cloudera Director 1.0, and Cloudera Navigator 2.1).

This release reflects our continuing investments in Cloudera Enterprise’s main focus areas, including security, integration with the partner ecosystem, and support for the latest innovations in the open source platform (including Impala 2.0,

Read more

Using Impala, Amazon EMR, and Tableau to Analyze and Visualize Data

Categories: Cloud General Guest

Our thanks to AWS Solutions Architect Rahul Bhartia for allowing us to republish his post below.

Apache Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Apache Pig, and Apache Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. This post shows you how to use Amazon Elastic MapReduce (Amazon EMR) to analyze a data set available on Amazon Simple Storage Service (Amazon S3) and then use Tableau with Impala to visualize the data.

Read more