Cloudera Engineering Blog · Distribution Posts

CDH4.1 Now Released!

Update time!  As a reminder, Cloudera releases major versions of CDH, our 100% open source distribution of Apache Hadoop and related projects, annually and then updates to CDH every three months.  Updates primarily comprise bug fixes but we will also add enhancements.  We only include fixes or enhancements in updates that maintain compatibility, improve system stability and still allow customers and users to skip updates as they see fit.

We’re pleased to announce the availability of CDH4.1.  We’ve seen excellent adoption of CDH4.0 since it went GA at the end of June and a number of exciting use cases have moved to production.  CDH4.1 is an update that has a number of fixes but also a number of useful enhancements.  Among them:

Apache Flume Development Status Update

Apache Flume is a scalable, reliable, fault-tolerant, distributed system designed to collect, transfer, and store massive amounts of event data into HDFS. Apache Flume recently graduated from the Apache Incubator as a Top Level Project at Apache. Flume is designed to send data over multiple hops from the initial source(s) to the final destination(s). Click here for details of the basic architecture of Flume. In this article, we will discuss in detail some new components in Flume 1.x (also known as Flume NG), which is currently on the trunk branch, techniques and components that can be be used to route the data, configuration validation, and finally support for serializing events.

In the past several months, contributors have been busy adding several new sources, sinks and channels to Flume. Flume now supports Syslog as a source, where sources have been added to support Syslog over TCP and UDP.

Apache Bigtop 0.3.0 (incubating) has been released

Apache Bigtop 0.3.0 (incubating) is now available. This is the first fully integrated, community-driven, 100% Apache Big Data management distribution based on Apache Hadoop 1.0. In addition to a major change in the Hadoop version, all of the Hadoop ecosystem components have been upgraded to the latest stable versions and thoroughly tested:

Introducing CDH4

I’m pleased to inform our users and customers that Cloudera has released its 4th version of Cloudera’s Distribution Including Apache Hadoop (CDH) into beta today. This release combines the input from our enterprise customers, partners and users with the hard work of Cloudera engineering and the larger Apache open source community to create what we believe is a compelling advance for this widely adopted platform.

There are a great many improvements and new capabilities in CDH4 compared to CDH3. Here is a high level list of what’s available for you to test in this first beta release:

Apache Flume – Architecture of Flume NG

This blog was originally posted on the Apache Blog: https://blogs.apache.org/flume/entry/flume_ng_architecture

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Flume is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/flume. Flume NG is work related to new major revision of Flume and is the subject of this post.

Hadoop World 2011: A Glimpse into Development

The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools, performance, bringing the stack together and testing the stack. Sessions in this track are for developers of all levels who want to learn more about upcoming features and enhancements, new tools, advanced techniques and best practices.

Preview of Development Track Sessions

Evolution of Hadoop Ecosystem: AOL Advertising Experience

Pero works on research and development in new technologies for online advertising at Aol Advertising R&D in Palo Alto. Over the past 4 years he has been the Chief Architect of R&D distributed ecosystem comprising more than thousand nodes in multiple data centers. He also led large-scale contextual analysis, segmentation and machine learning efforts at AOL, Yahoo and Cadence Design Systems and published patents and research papers in these areas.

A critical premise for success of online advertising networks is to successfully collect, organize, analyze and use large volumes of data for decision making. Given the nature of their online orientation and dynamics, it is critical that these processes be automated to the largest extent possible.

Cloudera Service and Configuration Manager Express Edition Screencast (Director’s Cut)

Philip Zeyliger is a software engineer at Cloudera and started the SCM
project.

Two weeks ago, at Hadoop Summit, we released our Service and Configuration Manager (SCM) Express. It’s a dramatically simpler and faster way to get started with Cloudera’s Distribution including Apache Hadoop (CDH). In a previous blog post, we talked in some detail about SCM Express and what it can do for you.

SCM Express: Now Anyone Can Experience the Power of Apache Hadoop

Phil Langdale is a software engineer at Cloudera and the technical lead for Cloudera’s SCM Express product.

What is SCM Express?

The Only Full Lifecycle Management for Apache Hadoop: Introducing Cloudera Enterprise 3.5 and SCM Express

Drew O’Brien is a product marketing manager at Cloudera

We’re excited to share the news about the immediate availability of Cloudera Enterprise 3.5 and SCM Express, which we announced this week in tandem with our presence at Hadoop Summit. These products represent a major advance in Cloudera’s mission to drive massive enterprise adoption of 100% open source Apache Hadoop. We now make it easier and more convenient than ever before for companies to run and manage Apache Hadoop clusters throughout their entire operational lifecycle.

Older Posts