Cloudera Engineering Blog · CDH Posts

Cloudera Enterprise 5 Beta 2 is Available: More New Features and Components

Cloudera has released the Beta 2 version of Cloudera Enterprise 5 (comprises CDH 5.0.0 and Cloudera Manager 5.0.0). 

This release (download) contains a number of new features and component versions including the ones below:

Spark is Now Generally Available for Cloudera Enterprise

Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.

Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).

How Wajam Answers Business Questions Faster With Hadoop

Thanks to Xavier Clements of Wajam for allowing us to re-publish his blog post about Wajam’s Hadoop experiences below!

Wajam is a social search engine that gives you access to the knowledge of your friends. We gather your friends’ recommendations from Facebook, Twitter, and other social platforms and serve these back to you on supported sites like Google, eBay, TripAdvisor, and Wikipedia.

How-to: Create a Simple Hadoop Cluster with VirtualBox

Set up a CDH-based Hadoop cluster in less than an hour using VirtualBox and Cloudera Manager.

Thanks to Christian Javet for his permission to republish his blog post below!

Accumulo Comes to CDH

Apache Accumulo is now generally available on CDH 4.

Cloudera is pleased to announce the immediate availability of its first release of Accumulo packaged to run under CDH, our open source distribution of Apache Hadoop and related projects and the foundational infrastructure for Enterprise Data Hubs.

What’s New in Cloudera Manager 5?

Learn the new features and enhancements in Cloudera Manager 5, including support for YARN, management of third-party apps and frameworks, and more.

The response to the Oct. 2013 release of Cloudera Enterprise 5 Beta has been overwhelming, and Cloudera is busily working closely with several customers to incorporate their feedback.

Write MapReduce Jobs in Idiomatic Clojure with Parkour

Thanks to Marshall Bockrath-Vandegrift of advanced threat detection/malware company (and CDH user) Damballa for the following post about his Parkour project, which offers libraries for writing MapReduce jobs in Clojure. Parkour has been tested (but is not supported) on CDH 3 and CDH 4.

Clojure is Lisp-family functional programming language which targets the JVM. On the Damballa R&D team, Clojure has become the language of choice for implementing everything from web services to machine learning systems. One of Clojure’s key features for us is that it was designed from the start as an explicitly hosted language, building on rather than replacing the semantics of its underlying platform. Clojure’s mapping from language features to JVM implementation is frequently simpler and clearer even than Java’s.

Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications

Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spark in CDH. Look for more posts describing Spark internals and Spark + CDH use cases in the near future.

BinaryPig: Scalable Static Binary Analysis Over Hadoop

Our thanks to Telvis Calhoun, Zach Hanif, and Jason Trost of Endgame for the guest post below about their BinaryPig application for large-scale malware analysis on Apache Hadoop. Endgame uses data science to bring clarity to the digital domain, allowing its federal and commercial partners to sense, discover, and act in real time.

How-to: Use Cascading Pattern with R and CDH

Our thanks to Concurrent Inc. for the how-to below about using Cascading Pattern with CDH. Cloudera recently tested CDH 4.4 with the Cascading Compatibility Test Suite verifying compatibility with Cascading 2.2.

Cascading Pattern is a machine-learning project within the Cascading development framework used to build enterprise data workflows. Cascading provides an abstraction layer on top of Apache Hadoop and other computing topologies that allows enterprises to leverage existing skills and resources to build data processing applications on Hadoop, without the need for specialized Hadoop skills.

Newer Posts Older Posts