Cloudera Developer Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.
Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.
The HBaseCon 2014 “Case Studies” track surfaces some of the most interesting (and diverse) use cases in the HBase ecosystem — and in the world of NoSQL overall — today.
The HBaseCon 2014 (May 5, 2014 in San Francisco) is not just about internals and best practices — it’s also a place to explore use cases that you not have even considered before.
Get started with Apache Hadoop and use-case examples online in just seconds.
Today, we announced Cloudera Live, a new online service for developers and analysts (currently in public beta) that makes it easy to learn, explore, and try out CDH, Cloudera’s open source software distribution containing Apache Hadoop and related projects. No downloads, no installations, no waiting — just point-and-play!
Our thanks to Prashant Sharma and Matei Zaharia of Databricks for their permission to re-publish the post below about future Java 8 support in Apache Spark. Spark is now generally available inside CDH 5.
One of Apache Spark‘s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala and Python, but its Java API was verbose due to the lack of function expressions. With the addition of lambda expressions in Java 8, we’ve updated Spark’s API to transparently support these expressions, while staying compatible with old versions of Java. This new support will be available in Spark 1.0.
A Few Examples
Meet Stuart Horsman, among the first to earn the CCP: Data Scientist distinction.
Big Data success requires professionals who can prove their mastery with the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
(Editor’s note – this post has been updated to reflect CDH 5.1/Spark 1.0)
Improved scheduling capabilities via Oozie in CDH 5 makes for far fewer headaches.
One of the best new Apache Oozie features in CDH 5, Cloudera’s software distribution, is the ability to use
cron-like syntax for coordinator frequencies. Previously, the frequencies had to be at fixed intervals (every hour or every two days, for example) – making scheduling anything more complicated (such as every hour from 9am to 5pm on weekdays or the second-to-last day of every month) complex and difficult.
The community has voted to release Apache Hadoop 2.4.0.
Hadoop 2.4.0 includes myriad improvements to HDFS and MapReduce, including (but not limited to):
The HBaseCon 2014 “Ecosystem” track offers a cross-section view of the most interesting projects emerging on top of, or alongside, HBase.
Our thanks to Amar Parkash, a Software Developer at Goibibo, a leading travel portal in India, for the enthusiastic support of Hue you’ll read below.
At Goibibo, we use Hue in our production environment. I came across Hue while looking for a near real-time log search tool and got to know about Cloudera Search and the interface provided by Hue. I tried it on my machine and was really impressed by the UI it provides for Apache Hive, Apache Pig, HDFS, job browser, and basically everything in the Big Data domain. We immediately deployed Hue in production, and that has been one of the best decisions we have ever made for our data platform at Goibibo.