Cloudera recently announced the general availability of CDH4.1, an update to our open-source, enterprise-ready distribution of Apache Hadoop and related projects. Among various components, Apache Mahout is a relatively recent addition to CDH (first added to CDH3u2 in 2011), but is already attracting increasing interest out in the field.
Mahout started as a sub-project of Apache Lucene to provide machine-learning libraries in the area of clustering and classification. It later evolved into a top-level Apache project with much broader coverage of machine-learning techniques (clustering,
[Updated Nov. 26, 2012: Sorry, this event has reached capacity and is now closed.]
Please join us in New York on Nov. 29, 2012, for a unique opportunity to hear from industry icons Jeff Hammerbacher (@hackingdata), Amr Awadallah (@awadallah) and Josh Wills (@josh_wills) as they discuss their approach to Data Science and how it transformed business for companies like Facebook, Yahoo! and Google. You will also hear more about Cloudera Enterprise: The Platform for Big Data powered by Cloudera Impala,
Last week at Strata + Hadoop World 2012, we announced a new data science training and certification program. I am very excited to have been part of the team that put the program together, and I would like to answer some of the most frequently asked questions about the course and the certification that we will be offering.
Why is Cloudera offering data science training?
The primary bottleneck on the success of Hadoop is the number of people who are capable of using it effectively to solve business problems.
We at Cloudera are tremendously excited by the power of data to effect large-scale change in the healthcare industry. Many of the projects that our data science team worked on in the past year originated as data-intensive problems in healthcare, such as analyzing adverse drug events and constructing case-control studies. Last summer, we announced that our Chief Scientist Jeff Hammerbacher would be collaborating with the Mt. Sinai School of Medicine to leverage large-scale data analysis with Apache Hadoop for the treatment and prevention of disease.
You may have noticed that Harvard Business Review is calling data science “the sexiest job of the 21st century.” So our answer to the question is: Hot. Definitely hot.
If you need an explanation, watch the “Definition of a Data Scientist” talk embedded below from Cloudera data science director Josh Wills, which was hosted by Cloudera partner Lilien LLC recently in Portland, Ore. The key take-away is,