Cloudera Developer Blog · Pig Posts
The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools, performance, bringing the stack together and testing the stack. Sessions in this track are for developers of all levels who want to learn more about upcoming features and enhancements, new tools, advanced techniques and best practices.
This is a guest post contributed by Dmitriy Ryaboy (@squarecog) and was originally published in his blog on December 19th. We thought the information was valuable enough that it was worth reposting to spread the word even further.
Our vision for Hadoop World is a conference where both newcomers and experienced Hadoop users can learn and be part of the growing Hadoop community.
We are also offering training sessions for newcomers and experienced Hadoop users alike. Whether you are looking for an Introduction to Hadoop, Hadoop Certification, or you want to learn more about related Hadoop projects we have the training you are looking for.
With the recent release of CDH3b2, many users are more interested than ever to try out Cloudera’s Distribution for Hadoop (CDH). One of the questions we often hear is, “what does it take to migrate?”.
If you’re not familiar with CDH3b2, here’s what you need to know.
Announcing Two New Training Classes from Cloudera: Introduction to HBase and Analyzing Data with Hive and Pig
Cloudera is pleased to announce two new training courses: a one-day Introduction to HBase and a two-day session on Analyzing Data with Hive and Pig. These join a recently-expanded two-day Hadoop for Administrators course and our popular three-day Hadoop for Developers offering, any of which can be combined to provide extensive, customized training for your organization. Please contact email@example.com for more information regarding on-site training, or visit www.cloudera.com/hadoop-training to view our public course schedule.
Cloudera’s HBase course discusses use-cases for HBase, and covers the HBase architecture, schema modeling, access patterns, and performance considerations. During hands-on exercises, students write code to access HBase from Java applications, and use the HBase shell to manipulate data. Introduction to HBase also covers deployment and advanced features.
Hadoop has emerged as an indispensable component of any data-intensive enterprise infrastructure. In many ways, working with large datasets on a distributed computing platform (powered by commodity hardware or cloud infrastructure) has never been easier. But because customers are running clusters consisting of hundreds or thousands of nodes, and are processing massive quantities of data from production systems every hour, the logistics of efficient platform utilization can quickly become overwhelming.
To deal with this challenge, the Yahoo! engineering team created Oozie – the Hadoop workflow engine. We are pleased to provide Oozie with Cloudera’s distribution for Hadoop starting with the beta-2 release.
Why create a new workflow system?
CDH3 beta 2 includes Apache Pig 0.7.0, the latest and greatest version of the popular dataflow programming environment for Hadoop. In this post I’ll review some of the bigger changes that went into Pig 0.7.0, describe the motivations behind these changes, and explain how they affect users. Readers in search of a canonical list of changes in this new version of Pig should consult the Pig 0.7.0 Release Notes as well as the list of backward incompatible changes.
The biggest change to appear in Pig 0.7.0 is the complete redesign of the LoadFunc and StoreFunc interfaces. The Load-Store interfaces were first introduced in version 0.1.0 and have remained largely unchanged up to this point. Pig uses a concrete instance of the LoadFunc interface to read Pig records from the underlying storage layer, and similarly uses an instance of the StoreFunc interface when it needs to write a record. Pig provides different LoadFunc and StoreFunc implementations in order to support different storage formats, and since this is a public interface users may provide their own implementations as well.
We’re proud to announce that Cloudera’s Distribution for Hadoop Version 2 (CDH2) is officially released.
We’ve come a long way to get to a production quality release. At the beginning of September we announced the first beta of CDH2. After 6 months of additional testing we announced a release candidate. The release candidate spent over a month hardening in Cloudera’s internal QA process and on a wide variety of customer clusters. CDH2 is now stable and ready for use – we are pleased to recommend it to all our production users.
In September 2009, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same “soak time” as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable. It’s a long road of feedback, bug fixes, QA and testing to move from testing to stable. As someone who tracks the maturity of a testing build throughout its life cycle, I’m pleased to say we’ve put a lot of polish into this release.