Cloudera Engineering Blog · Community Posts

Introducing Cloudera Labs: An Open Look into Cloudera Engineering R&D

Cloudera Labs contains ecosystem innovations that one day may bring developers more functionality or productivity in CDH.

Since its inception, one of the defining characteristics of Apache Hadoop has been its ability to evolve, reinvent itself, and thrive at the same time. For example, two years ago, nobody could have predicted that the formative MapReduce engine, one of the cornerstones of “original” Hadoop, would be marginalized or even replaced. Yet today, that appears to be happening via Apache Spark, with Hadoop growing stronger as a result. Similarly, we’ve seen other relatively new components, like Impala, Apache Parquet (incubating), and Apache Sentry (also incubating), become widely adopted in relatively short order.

This Month in the Ecosystem (September 2014)

Welcome to our 13th edition of “This Month in the Ecosystem,” a digest of highlights from September 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

Community Meetups during Strata + Hadoop World 2014

The meetup opportunities during the conference week are more expansive than ever — spanning Impala, Spark, HBase, Kafka, and more.

Strata + Hadoop World 2014 is a kaleidoscope of experiences for attendees, and those experiences aren’t contained within the conference center’s walls. For example, the meetups that occur during the conference week (which is concurrent with NYC DataWeek) form a virtual track for developers, and with Strata + Hadoop World being bigger than ever, so is the scope of that track.

This Month in the Ecosystem (August 2014)

Welcome to our 12th (first annual!) edition of “This Month in the Ecosystem,” a digest of highlights from August 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

Running CDH 5 on GlusterFS 3.3

The following post was written by Jay Vyas (@jayunit100) and originally published in the Gluster.org Community.

I have recently spent some time getting Cloudera’s CDH 5 distribution of Apache Hadoop to work on GlusterFS 3.3 using Distributed Replicated 2 Volumes. This is possible because Apache Hadoop has a pluggable filesystem architecture that allows the computational components within the CDH 5 distribution to be configured to use filesystems other than HDFS. In this case, one can configure CDH 5 to use the Hadoop FileSystem plugin for GlusterFS (glusterfs-hadoop), which allows it to run on GlusterFS 3.3. I’ve provided a diagram below that illustrates the CDH 5 core processes and how they interact with GlusterFS.
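Hadoop chooses a FileSystem implementation by URI scheme: a path such as `hdfs://...` is handled by whatever class the configuration key `fs.hdfs.impl` names, and a path with no scheme falls back to the scheme of `fs.defaultFS`. The Java sketch below imitates that lookup with a plain map instead of Hadoop's own `Configuration` and `FileSystem` classes, so it runs without Hadoop on the classpath. The class `FsSchemeResolver` is hypothetical, and the class name `org.apache.hadoop.fs.glusterfs.GlusterFileSystem` is an assumption about the glusterfs-hadoop plugin rather than something stated in this post.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of Hadoop's "fs.<scheme>.impl" lookup convention.
// A plain map stands in for Hadoop's Configuration object.
public class FsSchemeResolver {
    private final Map<String, String> conf = new HashMap<>();

    public void set(String key, String value) {
        conf.put(key, value);
    }

    // Returns the implementation class name registered for the URI's scheme.
    public String resolveImplClass(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            // Scheme-less paths use the scheme of the default filesystem.
            String defaultFs = conf.getOrDefault("fs.defaultFS", "file:///");
            scheme = URI.create(defaultFs).getScheme();
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem registered for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        FsSchemeResolver resolver = new FsSchemeResolver();
        resolver.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // Assumed plugin class name for glusterfs-hadoop.
        resolver.set("fs.glusterfs.impl", "org.apache.hadoop.fs.glusterfs.GlusterFileSystem");
        // Making GlusterFS the default routes scheme-less paths to the plugin.
        resolver.set("fs.defaultFS", "glusterfs:///");

        System.out.println(resolver.resolveImplClass(URI.create("/user/example/input")));
        System.out.println(resolver.resolveImplClass(URI.create("hdfs://namenode:8020/tmp")));
    }
}
```

Because the mapping lives in configuration rather than in code, components that only ever ask for "the filesystem for this path" can be pointed at a different storage layer without being recompiled, which is the property the post relies on.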

Progress Report: Cloudera Community Forums After One Year

Cloudera Community forums are proving their value as an important contributor to a rich user experience.

It’s been almost exactly one year since the debut of the Cloudera Community forums. In addition to doing the birthday shout-out, I thought it would be interesting to bring you up to date on adoption and usage patterns.

This Month in the Ecosystem (June 2014)

Welcome to our 10th edition of “This Month in the Ecosystem,” a digest of highlights from June 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

Pretty busy for early summer:

Apache Hive on Apache Spark: Motivations and Design Principles

Two of the most vibrant communities in the Apache Hadoop ecosystem are now working together to bring users a Hive-on-Spark option that combines the best elements of both.

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only execution engine in the Hadoop ecosystem, and Hive queries could only run on MapReduce. But today, alternative execution engines to MapReduce are available — such as Apache Spark and Apache Tez (incubating).

Where to Find Cloudera Tech Talks (Through September 2014)

Find Cloudera tech talks in Texas, Oregon, Washington DC, Illinois, Georgia, Japan, and across the SF Bay Area during the next calendar quarter.

Below is our regularly scheduled quarterly update on where to find tech talks by Cloudera employees, this time for the third calendar quarter of 2014 (July through September; traditionally the least active quarter of the year). Note that this list will be continually curated during the period, so complete logistical information may not be available yet. And remember, many of these talks are in “free” venues (no cost of entry).

This Month in the Ecosystem (April 2014)

Welcome to our eighth edition of “This Month in the Ecosystem,” a digest of highlights from April 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

More good news!
