Cloudera Developer Blog · Hadoop Posts
The new Cloudera Developer Newsletter makes its debut in January 2014.
Developers and data scientists, we realize you’re special – as are operators and analysts, in their own particular ways.
Join us at Cloudera’s San Francisco office on Feb. 20 for tech talks, T-shirts, and adult refreshments!
As an extension of the DeveloperWeek Conf & Festival 2014 experience in San Francisco next month, join us at Cloudera’s San Francisco office for a Developer Happy Hour (beer + tech talks) focusing on Apache Hadoop 2 application development. Anyone (DeveloperWeek attendees or not) is free to attend, but RSVP now because seats (and “Data is the New Bacon” T-shirts) are limited!
Oracle DBAs, get answers to many of your most common questions about getting started with Hadoop.
As a former Oracle DBA, I get a lot of questions (most welcome!) from current DBAs in the Oracle ecosystem who are interested in Apache Hadoop. Here are a few of the more frequently asked questions, along with my most common replies.
From Python, to ZooKeeper, to Impala, to Parquet, blog readers in 2013 were interested in a variety of topics.
Clouderans and guest authors from across the ecosystem (LinkedIn, Netflix, Concurrent, Etsy, Stripe, Databricks, Oracle, Tableau, Alteryx, Talend, Twitter, Dell, SFDC, Endgame, MicroStrategy, Hazy Research, Wibidata, StackIQ, ZoomData, Damballa, Mu Sigma) published prolifically on the Cloudera Developer blog in 2013, with more than 250 new posts — averaging roughly one per business day.
Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from November 2013 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).
With the holidays upon us, the news in November was sparse. Even so, the ecosystem never stops churning!
An overview of some of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling in the Fair Scheduler to node-level enforcement.
As Apache Hadoop becomes ubiquitous, it is becoming more common for users to run diverse sets of workloads on Hadoop, and these jobs are more likely to have different resource profiles. For example, a MapReduce distcp job or Cloudera Impala query that does a simple scan on a large table may be heavily disk-bound and require little memory. Or, an Apache Spark (incubating) job executing an iterative machine-learning algorithm with complex updates may wish to store the entire dataset in memory and use spurts of CPU to perform complex computation on it.
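To illustrate the kind of multi-resource scheduling described above, here is a minimal sketch of a Fair Scheduler allocation file (fair-scheduler.xml) that expresses both memory and CPU (vcore) requirements per queue and selects the Dominant Resource Fairness (DRF) policy; the queue names and numbers are hypothetical examples, not values from the post.

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: hypothetical two-queue setup scheduling over
     both memory and vcores via Dominant Resource Fairness (DRF). -->
<allocations>
  <!-- DRF makes fairness decisions across multiple resource types. -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>

  <!-- A memory-hungry queue, e.g. for iterative in-memory jobs. -->
  <queue name="analytics">
    <minResources>40960 mb, 8 vcores</minResources>
    <maxResources>163840 mb, 32 vcores</maxResources>
    <weight>2.0</weight>
  </queue>

  <!-- A disk-bound queue, e.g. for distcp-style scan jobs that
       need many containers but little memory each. -->
  <queue name="batch">
    <minResources>10240 mb, 16 vcores</minResources>
    <weight>1.0</weight>
  </queue>
</allocations>
```

With a file like this, a scan-heavy job in "batch" and an in-memory job in "analytics" are balanced by each job's dominant resource rather than by memory alone.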
Some things for which we are thankful, the 2013 edition (not listed in order):
1. The entire Apache Hadoop community for its constant and hard work to Make the Platform Better,
Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios.
With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily back up and restore potentially petabytes of data, HBase and the Apache Hadoop ecosystem provide many built-in mechanisms to accomplish just that.
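One of the built-in mechanisms alluded to above is HBase snapshots. As a sketch (these commands require a running HBase cluster, and the table, snapshot, and cluster names below are hypothetical placeholders):

```shell
# Take an online snapshot of a table without interrupting writes.
echo "snapshot 'my_table', 'my_table_snap_20131215'" | hbase shell

# Ship the snapshot's files to a backup cluster's HDFS with a
# MapReduce copy job, for off-cluster disaster recovery.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap_20131215 \
  -copy-to hdfs://backup-cluster:8020/hbase \
  -mappers 16
```

On the backup cluster, the snapshot can later be materialized with the shell's `clone_snapshot` or `restore_snapshot` commands.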
Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spark in CDH. Look for more posts describing Spark internals and Spark + CDH use cases in the near future.
Our thanks to Telvis Calhoun, Zach Hanif, and Jason Trost of Endgame for the guest post below about their BinaryPig application for large-scale malware analysis on Apache Hadoop. Endgame uses data science to bring clarity to the digital domain, allowing its federal and commercial partners to sense, discover, and act in real time.