Author Archives: Doug Cutting

Cloudera Search: The Newest Hadoop Framework for CDH Users and Developers

Categories: CDH, General, Hadoop, Search

One of the unexpected pleasures of open source development is the way that technologies adapt and evolve for uses you never originally anticipated.

Seven years ago, Apache Hadoop sprang from a project based on Apache Lucene, aiming to solve a search problem: how to scalably store and index the internet. Today, it’s my pleasure to announce Cloudera Search, which uses Lucene (among other things) to make search solve a Hadoop problem: how to let non-technical users interactively explore and analyze data in Hadoop.


It’s Only Rock and Roll

Categories: CDH, Community

It’s only Rock and Roll, but I like it!
           – Mick Jagger

Copyright is having a tough time in the digital age. New copies of music, movies and software can be created at near-zero cost. Some wonder whether it still makes sense to ever charge for content. Over the past century, large industries have developed that sell content. These industries resist change. We consumers love our content, but don’t love paying for it.


Seven Thoughts on Hadoop’s Seventh Birthday

Categories: Community, Hadoop

On this special April 1 – the seven-year anniversary of the Apache Hadoop project’s first release – Hadoop founder Doug Cutting (also Cloudera’s chief architect and the Apache Software Foundation chair) offers seven thoughts on Hadoop:

  1. Open source accelerates adoption.

    If Hadoop had been created as proprietary software, it would not have spread as rapidly. We’ve seen incredible growth in the use of Hadoop.


Data Interoperability with Apache Avro

Categories: Avro

The ecosystem around Apache Hadoop has grown at a tremendous rate. Folks can now use many different pieces of software to process their large data sets, and most choose to use several of these components. Data collected by Flume might be analyzed by Pig and Hive scripts. Data imported with Sqoop might be processed by a MapReduce program. To facilitate these and other scenarios, data produced by each component must be readily consumed by other components.
