Apache Hadoop Developer Training Helps Query Massive Telecom Data


This guest post is provided by Rohit Menon, Product Support and Development Specialist at Subex.

I am a software developer in Denver and have been working with C#, Java, and Ruby on Rails for the past six years. Writing code is a big part of my life, so I constantly keep an eye out for new advances, developments, and opportunities in the field, particularly those that promise to have a significant impact on software engineering and the industries that rely on it. 

In my current role working on revenue assurance products in the telecom space for Subex, I have regularly heard from customers that their data is growing at tremendous rates and becoming increasingly difficult to process, often forcing them to portion out data into smaller, more manageable subsets. The more I heard about this problem, the more I realized that the current approach is not a solution, but an opportunity, since companies could clearly benefit from more affordable and flexible ways to store data. Better query capability on larger data sets at any given time also seemed key to deriving the rich, valuable information that helps drive business. Ultimately, I was hoping to find a platform on which my customers could process all their data whenever they needed to. As I delved into this Big Data problem of managing and analyzing data at mega-scale, it did not take long before I discovered Apache Hadoop.

Mission: Hands-On Hadoop

My initial reading about Hadoop on the various blogs and forums had me convinced that it is easily one of the best tools out there for handling and processing large volumes of data. At first, I thought I’d be able to learn Hadoop on my own by reading Hadoop: The Definitive Guide and the Hadoop Tutorial from Yahoo! However, after only a few days of reading, it became clear that I would benefit greatly from direct interaction with Hadoop experts, supervised experimentation, and interaction with practical examples of Hadoop challenges from the field. 

Almost all of my research into Hadoop developer training led me to Cloudera University. As I learned more, I was really impressed by Cloudera’s contributions to the Hadoop community, its early entry into the Hadoop support and services space, its unmatched experience serving a recognizable customer base, and the enthusiasm for and popularity of CDH as the world’s leading Hadoop distribution. And all that was even before I realized Cloudera offers public training classes conveniently located in Denver! Moreover, given my aspirations to become a full-time Big Data professional, I knew Cloudera Certified Developer for Apache Hadoop (CCDH) status would be an important next step. After reviewing some of Cloudera’s online tutorial videos, enrollment in Cloudera’s next available public training class felt like the obvious choice.

Training as a Catalyst

Cloudera Developer Training for Apache Hadoop is a four-day course designed to give software developers and engineers a complete understanding of HDFS and MapReduce, the other tools that make up the Apache Hadoop ecosystem, and the skills to develop applications on Hadoop right out of the gate. My instructor not only had real expertise in Hadoop and the curriculum, but was able to draw on both his years as an engineer and his time as a dedicated training specialist to give helpful guidance, answer all my questions, and instill a sense that I was getting the best training experience possible.

The course gave me insight into why Hadoop was built and the problems it addresses. (It should be noted that Hadoop’s inventor and the Chairman of the Apache Software Foundation, Doug Cutting, is also Cloudera’s Chief Architect.) Importantly, the classes went beyond the basic curriculum to look at real-world Hadoop cases, which provided a compelling context for the labs and hands-on exercises. Completion of the Cloudera Developer Training and Cloudera Certification not only guarantees entry into the Big Data domain, but also assures a level of expertise and fluency that is nearly impossible to gain anywhere else, particularly in such a short amount of time. I truly felt prepared to use these tools to start working on Big Data analytics.

Productizing with Help from Hive

Based on the knowledge I gained from Cloudera Developer Training, I plan to use Hadoop to address the Big Data problems so many of my customers face, drawing on my training to implement proposed programs that place Hadoop at the core of key data management and analytics products in development. One of the great things about CDH is that, even when a developer hasn’t yet learned to run MapReduce jobs, I will still be able to work with analysts to set up and query Hive tables using a SQL-like language, which immediately breaks down the bottleneck previously faced when trying to access large quantities of data. For users interested in writing custom code, Pig, the scripting framework for Hadoop, becomes the ideal choice and allows data mining that the analysts with whom I work could previously only imagine. I should note that Cloudera also offers Training for Apache Hive and Pig.
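To make the Hive point concrete, here is a minimal sketch in plain Python (not Hadoop code) of the map, shuffle, and reduce steps behind a simple aggregation over call records. The record layout, the subscriber IDs, and the function names are all hypothetical and chosen only for illustration. In Hive, an analyst could express roughly the same thing in one statement, something like SELECT subscriber_id, SUM(duration) FROM calls GROUP BY subscriber_id, where the table and column names are again assumptions.

```python
from collections import defaultdict

# Hypothetical call detail records: (subscriber_id, call_duration_seconds).
RECORDS = [
    ("sub-001", 120),
    ("sub-002", 45),
    ("sub-001", 300),
    ("sub-003", 60),
    ("sub-002", 15),
]


def map_phase(records):
    """Emit one (key, value) pair per record, as a mapper would."""
    for subscriber_id, duration in records:
        yield subscriber_id, duration


def shuffle(pairs):
    """Group values by key, mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def total_duration_by_subscriber(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    totals = total_duration_by_subscriber(RECORDS)
    for subscriber, total in sorted(totals.items()):
        print(subscriber, total)
```

On a real cluster the records would live in HDFS and the phases would run in parallel across many machines; the sketch only shows the shape of the computation that a SQL-like query spares an analyst from writing by hand.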

It’s clear to me that Hadoop training has not only helped me identify solutions for the most interesting problems I face at work from day to day, but has also put me on the learning path to achieving even loftier goals, not least of which is completing additional training towards gaining my Cloudera Data Scientist Certification.