On this special April 1 – the seven-year anniversary of the Apache Hadoop project’s first release – Hadoop founder Doug Cutting (also Cloudera’s chief architect and the Apache Software Foundation chair) offers seven thoughts on Hadoop:
- Open source accelerates adoption.
If Hadoop had been created as proprietary software it would not have spread as rapidly. We’ve seen incredible growth in the use of Hadoop. Partly that’s because it’s useful. But many would have been cautious to make a vendor-controlled platform part of their infrastructure, useful or not.
- Apache builds collaborative communities.
The Hadoop ecosystem has hundreds of developers working for tens of organizations. Competitors productively collaborate on a daily basis, improving the software we all share. The Apache Software Foundation gives us the methodology that enables this. (Thanks, Apache!)
- The timing is right.
Folks flock to Hadoop not just because it is open-source and works, but also because it fills a need. Moore’s law provides us with a bounty of affordable hardware. This has led to computing devices spreading through our world. Cars and tractors have computers. Phones, and cash registers and more have become computers. Data flows through each of these. Hadoop gives us tools to save and analyze more of this data, improving our understanding of the world.
- Random names are good names.
When we started out, I had no idea what Hadoop would become, so I proposed a name for it that didn’t have any connotation. The project has grown, giving that name meaning. Not everyone may pronounce the word “Hadoop” the same, but we all know what it is. A whimsical name also helps remind us to have fun.
- People love a story.
People love to hear me tell the story of my son’s toy elephant. They love to see that toy. The story has become the mythological prehistory of Hadoop. I guess every movement needs its origin story!
- Hadoop is a phase transition in computing.
Hadoop’s developers did not invent distributed computing, nor is Hadoop its most advanced form, but Hadoop has brought distributed computing to the mainstream. Hadoop gets thousands of unreliable computers to work together reliably. As the number of computers grows, we no longer think about them individually but instead as parts of a whole. This is a fundamentally different way of using computers that’s rapidly becoming commonplace.
- The sky’s the limit.
Hadoop was originally created to help build search engines. It’s still used for that, but its uses have grown far beyond that. Its core features have advanced tremendously, but even more dramatic is the range of systems being built on top of Hadoop. From machine learning to real-time queries, Hadoop is becoming a great platform for nearly any task folks imagine involving large amounts of data. The trends that gave rise to Hadoop continue, and Hadoop is evolving and growing to meet new challenges. We are still in the early days of this revolution.
Here’s to the next seven years!