Cloudera Engineering Blog · CDH Posts
Our thanks to Telvis Calhoun, Zach Hanif, and Jason Trost of Endgame for the guest post below about their BinaryPig application for large-scale malware analysis on Apache Hadoop. Endgame uses data science to bring clarity to the digital domain, allowing its federal and commercial partners to sense, discover, and act in real time.
Our thanks to Concurrent Inc. for the how-to below about using Cascading Pattern with CDH. Cloudera recently tested CDH 4.4 with the Cascading Compatibility Test Suite verifying compatibility with Cascading 2.2.
Cascading Pattern is a machine-learning project within the Cascading development framework used to build enterprise data workflows. Cascading provides an abstraction layer on top of Apache Hadoop and other computing topologies that allows enterprises to leverage existing skills and resources to build data processing applications on Hadoop, without the need for specialized Hadoop skills.
In software development, there is no substitute for having choices. Furthermore, freedom of choice – between frameworks, APIs, and languages — is a major fuel source for platform adoption across any successful ecosystem.
In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks:
We are pleased to announce the beta release of Cloudera Enterprise 5 (CDH 5 and Cloudera Manager 5). This release has both Cloudera Impala and Cloudera Search integrated into CDH. It also includes many new features and updated component versions including the ones below:
The following guest post is provided by Artur Barseghyan, a web developer currently employed by Goldmund, Wyldebeast & Wunderliebe in The Netherlands.
Python is my personal (and primary) programming language of choice and also happens to be the primary programming language at my company. So, when starting to work with a new technology, I prefer to use a clean and easy (Pythonic!) API.
In December 2012, we described how an internal application built on CDH called Cloudera Support Interface (CSI), which drastically improves Cloudera’s ability to optimally support our customers, is a unique and instructive use case for Apache Hadoop. In this post, we’ll follow up by describing two new differentiating CSI capabilities that have made Cloudera Support yet more responsive for customers:
After three months of public beta, and months of private beta before that, Cloudera Search is now generally available. At this milestone, Cloudera has contributed its innovations and IP around the integration of Apache Solr and Apache Lucene with CDH back to the respective upstream projects. The GA of Cloudera Search also signifies the completion of a vast amount of hardening, integration, simplification, and packaging work.
Features of Cloudera Search 1.0 include:
Cloudera and Cisco are announcing a joint solution today, the Cisco Validated Design (CVD) for Cloudera Enterprise reference architecture.
What’s our reasoning behind this new reference architecture?
As announced last Sunday (Aug. 25) on the project mailing list, Apache Hadoop 2.1.0 is the first beta release for Hadoop 2. (See the Release Notes for full list of new features and fixes.) Our congratulations to the Hadoop community for reaching this important milestone in the ongoing adoption of the core Hadoop platform!
With the release of this new beta, and the follow-on GA release on the horizon, we expect to see more customers exploring Hadoop 2 for production use cases. In fact, the upcoming CDH5 beta will be based on the Hadoop 2 GA release, delivering features that we’ve thoroughly tested against enterprise requirements, including (but not limited to):
Apache HBase supports three primary client APIs that developers can use to bind applications with HBase: the Java API, the REST API, and the Thrift API. Therefore, as developers build apps against HBase, it’s very important for them to be aware of the compatibility guidelines with respect to CDH.
This blog post will describe the efforts that go into protecting the experience of a developer using the Java API. Through its testing work, Cloudera allows developers to write code and sleep well at night, knowing that their code will remain compatible through supported upgrade paths.