Cloudera Developer Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
The internals of Oozie’s ShareLib have changed recently (reflected in CDH 5.0.0). Here’s what you need to know.
In a previous blog post about one year ago, I explained how to use the Apache Oozie ShareLib in CDH 4. Since that time, things have changed about the ShareLib in CDH 5 (particularly directory structure), so some of the previous information is now obsolete. (These changes went upstream under OOZIE-1619.)
More good news!
Thanks to recent work upstream, YARN is now a highly available service. This post explains its architecture and configuration details.
YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. With planned (upgrades) or unplanned (node crashes) events, this central service, and YARN itself, could become unavailable.
HBaseCon 2014 is in the books. Thanks to all attendees, speakers, and sponsors!
The new Python client for Impala will bring smiles to Pythonistas!
As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of my Python workflow are generally considered hassles; so it’s annoying that my main options for working with Impala are to write shell scripts, use the Impala shell, and/or transfer query results by reading/writing local files to disk.
Thanks to Jonathan Natkins of WibiData for the post below about how his company extended Cloudera Manager to manage Kiji. Learn more about Kiji and the organizations using it to build real-time HBase applications at Kiji Sessions, happening on May 6, 2014, the day after HBaseCon.
As a partner of Cloudera, WibiData sees Cloudera Manager’s new extensibility framework as one of the most exciting parts of Cloudera Enterprise 5. Cloudera Manager 5.0.0 provides the single-pane view that Apache Hadoop administrators and operators want to effectively manage a cluster of machines. Additionally, Cloudera Manager now offers tight integration for partners to plug into the CDH ecosystem, which benefits Cloudera as well as WibiData.
More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.
Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!
Thanks to Alexander Rubin of Percona for allowing us to re-publish the post below!
Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data to Cloudera Impala (columnar format), and run reporting on top of that. For the examples below, I will use the “ontime flight performance” data from my previous post.
In this installment of “Meet the Engineer”, our subject is Andrei Savu!
What do you do at Cloudera?
Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.
Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.