How Syncsort Leverages Training to Optimize Hadoop Scalability


This guest post is provided by Dave Nahmias, Pre-Sales and Partner Solutions Engineer at Syncsort, with an introduction by Patty Crowell, Director of Global Education Services at Syncsort.

Introduction: Training is Key

Apache Hadoop is extremely important to maximizing the value Syncsort’s technology delivers to our customers. That value promise starts with a solid foundation of knowledge and skills among key technical staff across the company.

We chose Cloudera University’s private training option to ensure Syncsort’s cross-functional team of engineering, support, services, and technical sales professionals had the expertise to optimize our data products for the end-user. Because the members of our team had different levels of prior Hadoop experience, the private class enabled us to freely share information and ask tough questions, resulting in a high level of engagement throughout the course.

Everyone benefited tremendously from the experience and attention of Cloudera’s instructor, since the private training was tailored to our particular needs and set up to promote collaboration throughout the learning process. Lessons ranged from core Hadoop knowledge for the less experienced members of the team to a tight focus on specific roles and skills for those who had been using Hadoop all along, all supported by relevant and challenging labs. Moreover, the opportunity to host the training at our own location and on our own schedule made Cloudera training a highly convenient solution.

Read on for the technical insights gained by a Syncsort engineer.

Why I Attended Cloudera’s Training

I am the technical interface to Syncsort’s partners. I have been working with large data environments for the past 20 years and specifically with Hadoop for the last year and a half. As the first line of defense for a software vendor in the Hadoop marketplace, it is critical that I understand the inner workings of the technology. Armed with the proper level of product knowledge, I can give partners and customers a guided tour through the menagerie of animal names associated with the Hadoop ecosystem, helping them to better understand the solution components and to build successful applications. Ultimately, it helps me evangelize the benefits of Big Data solutions. Attending Cloudera’s Developer Training was a no-brainer, since Cloudera is well recognized in the industry as a primary source of Hadoop knowledge and insight.

What Syncsort Brings to the Hadoop Party

Syncsort has been a thought leader in high-efficiency sort solutions on mainframes and open systems for more than 30 years. Syncsort’s unique product architecture enabled us to contribute changes to the core Hadoop code that allow any sort algorithm (including Syncsort’s) to be called in place of Hadoop’s native sort. This interface also provides a vehicle for Syncsort to integrate our native ETL engine directly into the MapReduce architecture, bringing the graphical development environment, efficiencies, and throughput of native execution across the mainframe, open systems, and Hadoop environments. We extend Hadoop’s flexibility and ease of use, essentially leveling the playing field for any of these environments.
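To illustrate the general idea of a pluggable sort, here is a minimal Python sketch. It is not Hadoop's actual interface or Syncsort's implementation; the names `builtin_sorter`, `merge_sorter`, and `reduce_counts` are hypothetical and exist only for this example. The point is simply that the step between map and reduce can accept any sort routine that honors the same contract.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]
Sorter = Callable[[List[Record]], List[Record]]


def builtin_sorter(records: List[Record]) -> List[Record]:
    """Stand-in for a framework's default sort: order records by key."""
    return sorted(records, key=lambda rec: rec[0])


def merge_sorter(records: List[Record]) -> List[Record]:
    """Hypothetical plug-in: a hand-written stable merge sort on the key."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    left = merge_sorter(records[:mid])
    right = merge_sorter(records[mid:])
    merged: List[Record] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two tails is empty; append whatever remains.
    return merged + left[i:] + right[j:]


def reduce_counts(map_output: List[Record],
                  sorter: Sorter = builtin_sorter) -> Dict[str, int]:
    """Sort map output with the chosen sorter, then sum the values per key."""
    ordered = sorter(map_output)
    return {key: sum(value for _, value in group)
            for key, group in groupby(ordered, key=lambda rec: rec[0])}


map_output = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
default_result = reduce_counts(map_output)
plugin_result = reduce_counts(map_output, sorter=merge_sorter)
```

Because both sorters satisfy the same contract (records in, records ordered by key out), the reduce step produces the same totals whichever one is plugged in.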

What I Learned

I had acquired most of my prior Hadoop knowledge by reading and experimenting. Having a knowledgeable instructor validate and fine-tune my understanding at a deeper level as well as articulate real-world use cases from his personal experience was incredibly valuable. Although there are plenty of blogs and articles attempting to build the Hadoop knowledge base, I have found they tend to misrepresent how Hadoop actually works, so the full Cloudera training course proved to be both the fastest and the most reliable way to clarify cloudy areas, get first-hand experience, and transition from conceptualization to development with Hadoop.

The details about how Hadoop actually reads and writes data were particularly interesting to me. I knew Hadoop stored multiple replicas of data blocks, but I didn’t previously have any in-depth knowledge of the exact algorithms used or how they relate to scheduling and to data integrity through the checksums Hadoop maintains. It was also very helpful to understand how the available scheduling techniques differ, and the course clarified how map and reduce tasks are scheduled across nodes to minimize network traffic.
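The relationship between replicas and checksums can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the 512-byte chunk size, the use of `zlib.crc32`, and the node names are assumptions made for the example, and the real client and DataNode logic is considerably more involved.

```python
import zlib
from typing import Dict, List, Optional

CHUNK_SIZE = 512  # assumed number of bytes covered by each checksum


def chunk_checksums(block: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Compute one CRC per fixed-size chunk of a block."""
    return [zlib.crc32(block[i:i + chunk_size])
            for i in range(0, len(block), chunk_size)]


def first_valid_replica(replicas: Dict[str, bytes],
                        expected: List[int]) -> Optional[str]:
    """Return the first node whose replica matches the stored checksums.

    A corrupt replica is skipped and the next one is tried, which is the
    basic reason keeping several copies helps with data integrity.
    """
    for node, data in replicas.items():
        if chunk_checksums(data) == expected:
            return node
    return None


block = b"x" * 1300                      # spans three 512-byte chunks
expected = chunk_checksums(block)        # checksums recorded at write time
corrupt = block[:600] + b"?" + block[601:]

# Hypothetical node names; the first replica has one flipped byte.
replicas = {"node1": corrupt, "node2": block, "node3": block}
chosen = first_valid_replica(replicas, expected)
```

Here the reader rejects the copy on `node1` because its second chunk no longer matches the recorded checksum, and falls back to the intact copy on `node2`.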

Additionally, in my day-to-day responsibilities, I can leverage what I learned about the way Hadoop clients are configured, the interrelationship of the XML configuration files, and how those files are organized to define the architecture across the cluster. The lessons about how the schedulers work and the impact they might have on various workloads, such as benchmarking, will also be of immediate value. Finally, I will benefit from the lessons on the Hadoop client architecture, which gave me a better understanding of how Flume and Sqoop work.
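As a rough picture of how layered XML configuration files interact, the Python sketch below parses two files in the `<configuration>`/`<property>` layout Hadoop uses and lets later layers override earlier ones. The helper names, the sample values, and the `namenode.example.com` host are hypothetical, and the sketch ignores details such as properties marked final.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_properties(xml_text: str) -> Dict[str, str]:
    """Read name/value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def effective_config(*layers: str) -> Dict[str, str]:
    """Merge configuration layers; later layers override earlier ones."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(parse_properties(layer))
    return merged


# Illustrative defaults, standing in for shipped default settings.
DEFAULTS = """<configuration>
  <property><name>fs.defaultFS</name><value>file:///</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

# Illustrative site file that points clients at a (hypothetical) NameNode.
SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""

config = effective_config(DEFAULTS, SITE)
```

The site layer replaces `fs.defaultFS`, while `dfs.replication` falls through from the defaults, which is the kind of interplay between files that is worth understanding when configuring a client.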

What the Syncsort Team Learned

My colleagues agree the training enabled us to be much more productive when working with Hadoop and to better visualize the possibilities created by Hadoop. Specifically, it helped us understand how Syncsort might leverage the Hadoop architecture in conjunction with our DMExpress software to help our customers extend their usage of Hadoop. It will also help us position Syncsort to contribute to accelerating Hadoop acceptance by providing well-architected Big Data ETL and intra-platform data movement solutions. Even at a fraction of those expected outcomes, we will have realized a measurable return on investment from Cloudera training.

From a professional standpoint, the knowledge we gained also prepares us for Cloudera Certification, a necessity for anyone who needs to provide tangible evidence of Hadoop expertise in the field. Last, but certainly not least, we believe the training improved our collective ability to intelligently communicate with our partners and customers about Hadoop’s functionality, a skill that should never be underestimated.

Employing the Expanded Skill Set

Data in mainframes and open systems will continue to be important, and integrating that data with data stored in HDFS will be a challenge for our customers that I believe Syncsort is now well-positioned to address. Thanks to Cloudera training, I can use my increased Hadoop knowledge to provide even better service to my partners and customers and provide more value with a technology that will grow increasingly important to my company’s business.