Category Archives: MapReduce

Learn How To Hadoop from Tom White in Dr. Dobb’s

Categories: Books Hadoop MapReduce

It’s always a great thing for everybody when the experts are willing and eager to share.

So it’s with special pleasure that I can point you toward a new three-part series by Cloudera’s own Tom White (@tom_e_white), to be published in Dr. Dobb’s. The magazine has long been one of the publications of record in the mainstream developer world, and many early programmers learned basics like BASIC from its pages.

Cloudera ML: New Open Source Libraries and Tools for Data Scientists

Categories: Community Data Science General Mahout MapReduce Tools

Editor’s note (12/19/2013): Cloudera ML has been merged into the Oryx project. The information below is still valid, though.

Last month, Apache Crunch became the fifth project (along with Sqoop, Flume, Bigtop, and MRUnit) to go from Cloudera’s GitHub repository through the Apache Incubator and on to graduate as a top-level project within the Apache Software Foundation. As the founder of the project and a newly minted Apache VP,

How Apache Hadoop Helps Scan the Internet for Security Risks

Categories: Guest MapReduce Security Use Case

The following guest post comes from Alejandro Caceres, president and CTO of Hyperion Gray LLC – a small research and development shop focusing on open-source software for cyber security.

Imagine this: You’re an informed citizen, active in local politics, and you decide you want to support your favorite local political candidate. You go to his or her new website and make a donation, providing your bank account information, name,

Apache Hadoop 2.0.3-alpha Released

Categories: General Hadoop HDFS MapReduce YARN

Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. This release fixes over 500 issues (covering the Common, HDFS, MapReduce, and YARN sub-projects) since the 2.0.2-alpha release in October last year. In addition to bug fixes and general improvements, the more noteworthy changes include:

  • HDFS High Availability (HA) can now use a Quorum Journal Manager (QJM) for sharing namenode edit logs (HDFS-3077).
