Cloudera Developer Blog · Hadoop Posts

Developer Happy Hour with Cloudera: Building Hadoop 2 Applications

Join us at Cloudera’s San Francisco office on Feb. 20 for tech talks, T-shirts, and adult refreshments!

As an extension of the DeveloperWeek Conf & Festival 2014 experience in San Francisco next month, join us at Cloudera’s San Francisco office for a Developer Happy Hour (beer + tech talks), focusing on Apache Hadoop 2 application development. Anyone (conference attendee or not) is welcome, but RSVP now because seats (and “Data is the New Bacon” T-shirts) are limited!

The Hadoop FAQ for Oracle DBAs

Oracle DBAs, get answers to many of your most common questions about getting started with Hadoop.

As a former Oracle DBA, I get a lot of questions (most welcome!) from current DBAs in the Oracle ecosystem who are interested in Apache Hadoop. Here are a few of the more frequently asked questions, along with my most common replies.

Top 10 Blog Posts of 2013

From Python, to ZooKeeper, to Impala, to Parquet, blog readers in 2013 were interested in a variety of topics.

Clouderans and guest authors from across the ecosystem (LinkedIn, Netflix, Concurrent, Etsy, Stripe, Databricks, Oracle, Tableau, Alteryx, Talend, Twitter, Dell, SFDC, Endgame, MicroStrategy, Hazy Research, Wibidata, StackIQ, ZoomData, Damballa, Mu Sigma) published prolifically on the Cloudera Developer blog in 2013, with more than 250 new posts, averaging roughly one per business day.

This Month in the Ecosystem (November 2013)

Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from November 2013 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the holidays upon us, the news in November was sparse. Even so, the ecosystem never stops churning!

Managing Multiple Resources in Hadoop 2 with YARN

An overview of some of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling in the Fair Scheduler to node-level enforcement.

As Apache Hadoop becomes ubiquitous, it is becoming more common for users to run diverse sets of workloads on Hadoop, and these jobs are more likely to have different resource profiles. For example, a MapReduce distcp job or Cloudera Impala query that does a simple scan on a large table may be heavily disk-bound and require little memory. By contrast, an Apache Spark (incubating) job executing an iterative machine-learning algorithm with complex updates may wish to store the entire dataset in memory and use spurts of CPU to perform complex computation on it.

Things For Which We Are Thankful

Some things for which we are thankful, the 2013 edition (not listed in order):

1. The entire Apache Hadoop community for its constant and hard work to Make the Platform Better,

Approaches to Backup and Disaster Recovery in HBase

Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios.

With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily back up and restore potentially petabytes of data, HBase and the Apache Hadoop ecosystem provide many built-in mechanisms to accomplish just that.

Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications

Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spark in CDH. Look for more posts describing Spark internals and Spark + CDH use cases in the near future.

BinaryPig: Scalable Static Binary Analysis Over Hadoop

Our thanks to Telvis Calhoun, Zach Hanif, and Jason Trost of Endgame for the guest post below about their BinaryPig application for large-scale malware analysis on Apache Hadoop. Endgame uses data science to bring clarity to the digital domain, allowing its federal and commercial partners to sense, discover, and act in real time.

This Month in the Ecosystem (October 2013)

Welcome to our fourth edition of “This Month in the Ecosystem,” a digest of highlights from October 2013 (never intended to be comprehensive; for completeness, see Hadoop Weekly).

For sheer excitement, that month set a high bar for future months to meet:
