Markov Chain Monte Carlo methods are another example of useful statistical computation for Big Data that is capably enabled by Apache Spark.
During my internship at Cloudera, I have been working on integrating PyMC with Apache Spark. PyMC is an open source Python package that allows users to easily apply Bayesian machine learning methods to their data, while Spark is a new, general framework for distributed computing on Hadoop.
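For readers new to MCMC, here is a minimal sketch of the underlying idea: a random-walk Metropolis sampler written in plain Python. It does not use PyMC or Spark and is not the integration described in this post. The names `log_target` and `metropolis`, the standard normal target, the step size, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    # Hypothetical target: unnormalized log-density of a standard normal.
    return -0.5 * x * x


def metropolis(log_density, n_samples, step=2.0, start=0.0, seed=0):
    """Random-walk Metropolis sampler (illustrative sketch only).

    Returns the list of samples and the fraction of proposals accepted.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = start
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric proposal: uniform step around the current position.
        proposal = x + rng.uniform(-step, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
            accepted += 1
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    draws, rate = metropolis(log_target, 20000, seed=42)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"acceptance rate: {rate:.2f}, mean: {mean:.3f}, variance: {var:.3f}")
```

Each chain here is sequential, but independent chains with different seeds can run side by side, which is one plausible reason a distributed framework is a natural fit for this kind of workload.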
Welcome to our 11th edition of “This Month in the Ecosystem,” a digest of highlights from July 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).
- An early release of the new O’Reilly Media book, Hadoop Application Architectures, became available. This one is sure to become standard bookshelf material. (Look for signed copies at Strata + Hadoop World!)
- Continuuity introduced Tephra,
Get started with Apache Hadoop and use-case examples online in just seconds.
Today, we announced the Cloudera Live Read-Only Demo, a new online service for developers and analysts (currently in public beta) that makes it easy to learn, explore, and try out CDH, Cloudera’s open source software distribution containing Apache Hadoop and related projects. No downloads, no installations, no waiting — just point-and-play!
Try the Cloudera Live Read-Only Demo
The Cloudera Live Read-Only Demo is a live CDH 5 cluster with a Hue interface (based on Hue 3.5.0,
This quick demo illustrates how easy it is to implement role-based access control in Impala using Sentry.
Apache Sentry (incubating) is the Apache Hadoop ecosystem tool for role-based access control (RBAC). In this how-to, I will demonstrate how to implement Sentry for RBAC in Impala. I feel this introduction is best motivated by a use case.
Data warehouse optimization is one of the most common Hadoop use cases.
Sure, Spark is fast, but it also gives developers a positive experience they won’t soon forget.
Apache Spark is well known today for its performance benefits over MapReduce, as well as its versatility. However, another important benefit – the elegance of the development experience – gets less mainstream attention.
In this post, you’ll learn about just a few of the features in Spark that make development a pure pleasure.