Tag Archives: monitoring

Apache Hive on Apache Spark: The First Demo

Categories: Community Hive MapReduce Spark

The community effort to make Apache Spark an execution engine for Apache Hive is making solid progress.

Apache Spark is quickly becoming the programmatic successor to MapReduce for data processing on Apache Hadoop. Over the course of its short history, it has become one of the most popular projects in the Hadoop ecosystem, and is now supported by multiple industry vendors—ensuring its status as an emerging standard.

Two months ago Cloudera,

Read more

Inside Cloudera Director

Categories: Cloud Cloudera Manager Ops and DevOps

With Cloudera Director, cloud deployments of Apache Hadoop are now as enterprise-ready as on-premise ones. Here’s the technology behind it.

As part of the recent Cloudera Enterprise 5.2 release, we unveiled Cloudera Director, a new product that delivers enterprise-class, self-service interaction with Hadoop clusters in cloud environments. (Cloudera Director is free to download and use, but commercial support requires a Cloudera Enterprise subscription.) It provides a centralized administrative view for cloud deployments and lets end users provision and scale clusters themselves using automated,

Read more

How-to: Write Apache Hadoop Applications on OpenShift with Kite SDK

Categories: Cloud Hadoop How-to Kite SDK

The combination of OpenShift and Kite SDK turns out to be an effective one for developing and testing Apache Hadoop applications.

At Cloudera, our engineers develop a variety of applications on top of Hadoop to solve our own data needs (here and here). More recently, we’ve started to look at streamlining our development process by using a PaaS (Platform-as-a-Service) for some of these applications. Having single-click deployment and updates to consistent development environments lets us onboard new developers more quickly,

Read more

Secrets of Cloudera Support: Using OpenStack to Shorten Time-to-Resolution

Categories: Cloud Support

Automating the creation of short-lived clusters for testing purposes frees our support engineers to spend more time on customer issues.

The first step for any support engineer is often to replicate the customer’s environment in order to identify the problem or issue. Given the complexity of Cloudera customer environments, reproducing a specific issue is often quite difficult, as a customer’s problem might only surface in an environment with specific versions of Cloudera Enterprise (CDH + Cloudera Manager),

Read more

Apache Kafka for Beginners

Categories: Flume Kafka

When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration.

Apache Kafka is creating a lot of buzz these days. While LinkedIn, where Kafka was founded, is the most well known user, there are many companies successfully using this technology.

So now that the word is out, it seems the world wants to know: What does it do?

Read more