Category Archives: Hadoop

5 Pitfalls of Benchmarking Big Data Systems

Categories: Hadoop, Performance

Benchmarking Big Data systems is nontrivial. Avoid these traps!

Here at Cloudera, we know how hard it is to get reliable performance benchmarking results. Benchmarking matters because one of the defining characteristics of Big Data systems is the ability to process large datasets faster. “How large” and “how fast” drive technology choices, purchasing decisions, and cluster operations. Even with the best intentions, performance benchmarking is fraught with pitfalls—easy to get numbers,

For Apache Hadoop, The POODLE Attack Has Lost Its Bite

Categories: CDH, Cloudera Manager, Hadoop, Platform Security & Cybersecurity

A significant vulnerability affecting the entire Apache Hadoop ecosystem has now been patched. What was involved?

By now, you may have heard about the POODLE (Padding Oracle On Downgraded Legacy Encryption) attack on TLS (Transport Layer Security). This attack combines a cryptographic flaw in the obsolete SSLv3 protocol with the ability of an attacker to downgrade TLS connections to use that protocol. The result is that an active attacker on the same network as the victim can potentially decrypt parts of an otherwise encrypted channel.
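The standard defense against POODLE is to stop offering SSLv3 at all, so a forced downgrade has no vulnerable protocol to fall back to. The minimal Java sketch below shows that general idea, assuming a service built on the standard JSSE `SSLEngine` API. It is not necessarily how the Hadoop patches were implemented, and the class and method names (`DisableSslv3`, `withoutSslv3`) are hypothetical.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class DisableSslv3 {

    // Hypothetical helper: returns a copy of the given protocol names with
    // "SSLv3" filtered out, so a forced downgrade cannot negotiate the
    // protocol that POODLE exploits.
    static String[] withoutSslv3(String[] protocols) {
        return Arrays.stream(protocols)
                .filter(p -> !"SSLv3".equals(p))
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws Exception {
        // Default JSSE context; a real server would load its own key material.
        SSLContext ctx = SSLContext.getDefault();
        SSLEngine engine = ctx.createSSLEngine();

        // Keep whatever protocols the engine already enables, minus SSLv3.
        engine.setEnabledProtocols(withoutSslv3(engine.getEnabledProtocols()));

        System.out.println("Enabled protocols: "
                + Arrays.toString(engine.getEnabledProtocols()));
    }
}
```

Filtering the existing list, rather than hard-coding one, preserves any newer TLS versions the JVM already enables. Recent JDKs disable SSLv3 by default, so on those the printed list may be unchanged.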

Apache Hadoop 2.6 is Released

Categories: Community, Hadoop

The Apache Hadoop community has voted to release Hadoop 2.6. Congrats to all contributors!

This new release contains a variety of improvements, particularly in the storage layer and in YARN. We’re especially excited about the encryption-at-rest feature in HDFS!

Hadoop Common


  • Heterogeneous Storage Tiers –

The Story of the Cloudera Engineering Hackathon (2014 Edition)

Categories: Cloudera Life, Community, Hadoop

Cloudera’s culture is premised on innovation and teamwork, and there’s no better example of them in action than our internal hackathon.

Cloudera Engineering doubled down on its “hackathon” tradition last week. Thanks to the HQ building upgrade since the 2013 edition (just look at all that space!), this year’s edition took an around-the-clock approach.

This year, Cloudera software engineers had 24 straight hours to conceive, build, and present their hacks to a panel of celebrity judges.

NoSQL in a Hadoop World

Categories: Hadoop, HBase, Impala

The number of powerful data query tools in the Apache Hadoop ecosystem can be confusing, but understanding a few simple things about your needs usually makes the choice easy. 

Ah, the good old days. I vividly recall that in 2007 I was faced with storing 1 billion XML documents and making them accessible as well as searchable. On a shoestring budget, I had few choices: build something on my own (it was all the rage back then, and still is),