Meet the Instructor: Bruce Martin
In this installment of “Meet the Instructor”, our interview subject is Bruce Martin.
What is your role at Cloudera?
What do you enjoy most about training and/or curriculum development?
I enjoy teaching the concepts of Apache Hadoop and helping students get some practical experience using the technology. I very much enjoy leading organized discussions that allow students to learn from my experiences and the experiences of others. I’ve been architecting, designing, and building advanced software systems for almost 30 years. I worked on distributed, replicated filesystems in the 1980s, distributed objects in the 1990s, and application frameworks in the 2000s. When I first learned about Hadoop, I was impressed by how well it solved today’s Big Data business problems by creatively combining the algorithms and frameworks my colleagues and I had worked on in the past.
In return for sharing my insights from the field, I get to learn from students about a lot of different verticals—healthcare, education, finance, energy, regulatory agencies, advertising, insurance, banking—their Big Data challenges, and the opportunities presented by Hadoop, the enterprise data hub, and data science.
Describe an interesting application you’ve seen or heard about for Hadoop.
For me, the most interesting Hadoop applications all have a machine-learning component to them. I am thrilled by the idea that an application can learn from data in order to predict future events and use those predictions to better our lives. All of the collected experiences of my career in technology, from the toil and joy of writing code to studying computer science to eventually teaching thousands of students in the classroom, feel truly worth it when I see folks building machine learning applications that enhance outcomes in healthcare and education, protect us from fraud, support efficient use of energy, and so on.
A common example of machine learning is an email spam filter, which learns how to effectively identify and separate out junk email based on examples of spam. Over time, the filter automatically and independently improves by observing the characteristics of more and more emails categorized as spam or not spam by users. Although the program doesn’t necessarily read or understand emails, it is able to use the data that make up emails to predict which qualify as spam and which do not.
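The learning process described above can be made concrete with a toy sketch. The snippet below is not any particular production filter; it is a minimal naive Bayes classifier (a common textbook approach to spam filtering) trained on a handful of made-up example messages, with Laplace smoothing so unseen words don’t zero out a score:

```python
from collections import Counter
from math import log

def train(examples):
    """Count word frequencies per class from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the class with the higher log-probability score
    (Laplace smoothing handles words never seen in training)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, float("-inf")
    for label in ("spam", "ham"):
        n = sum(counts[label].values())
        score = log(totals[label] / sum(totals.values()))  # class prior
        for word in text.lower().split():
            score += log((counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical labeled examples standing in for user-flagged mail.
examples = [
    ("win money now", "spam"),
    ("cheap prizes win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch plans this week", "ham"),
]
counts, totals = train(examples)
print(classify("win big prizes", counts, totals))        # → spam
print(classify("monday lunch meeting", counts, totals))  # → ham
```

As more messages are labeled and folded back into the training set, the counts shift and the filter’s predictions improve, which is exactly the “automatically and independently improves” behavior described above.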
What advice would you give to an engineer, system administrator, or analyst who wants to learn more about Big Data?
For those who are completely new to Hadoop, first focus on the concepts. The model of parallel, distributed functional computing is fundamentally different from the sequential computing models most of us learned in school. Learn the concepts before worrying about the many, many details of Hadoop and its ecosystem.
After mastering the concepts, get your hands dirty. If you are a developer, data analyst, or data scientist, find a Big Data set (there are many publicly available data sets if you don’t have access to your own) and build an application. If you are an administrator, build a cluster. Cloudera Express is free to download, as is the QuickStart VM, which makes it easy to spin up a cluster for experimentation. Cloudera even offers a free trial of the Cloudera Enterprise Data Hub Edition, which provides the full suite of state-of-the-art tools used in production environments.
Our Cloudera University courses are structured to maximize learning-by-doing while teaching you the best practices drawn from real-world engagements with customers using Hadoop across many different environments and use cases. They present concepts, include lots of hands-on labs, and give students the opportunity to benefit from the experiences of the instructor and fellow students.
How did you become involved in technical training and Hadoop?
Prior to joining Cloudera, I was the product architect for Student Success Products at SunGard Higher Education (now Ellucian). Most recently, I worked on a product that used machine learning to identify students at risk of failing or dropping out of a course in a university. The product builds a model from historical data about students who previously took a course. The data include student demographic profiles, academic preparation, performance, effort, and final grades. The model then uses the data to predict the success or risk of currently enrolled students.
However, the product didn’t operate on Big Data sets. The amount of available data for students taking a particular course is not that large by today’s standards. But the product was envisioned to evolve to operate on data from students in an entire department, or a school, or a university, or a university system, or a state, or even an entire country. Those data sets were definitely going to be BIG! As the architect, I was required to prepare our systems, products, and teams for the inevitable future of massive and multi-structured data. I learned about the emerging Big Data technologies and experimented with how we would apply them in education. I became fascinated with Hadoop and its ecosystem, and I jumped when I had the opportunity to teach at Cloudera.
What’s one interesting fact or story about you that a training participant would be surprised to learn?
There really are too many to list, but here’s a bunch of teasers:
- According to a priest in the Azores, I am a descendant of Prince Henry the Navigator.
- I’ve been speaking Spanish most of my life, and I live part-time in Mexico.
- Although most of my classes are taught in English, I’ve conducted Spanish-language courses and workshops in Mexico, Spain, and Chile.
- As an undergraduate at University of California, Berkeley, I often encountered a graduate student who pretty much lived in the computer room. Turns out it was Bill Joy (eventual co-founder of Sun Microsystems), who was building Berkeley UNIX at the time.
- In graduate school at University of California, San Diego, back in the 1980s, I built replicated filesystems and experimented with voting algorithms.
- I am the author of five CORBA Services specifications at the Object Management Group.
- I used to hold the job title of Director, Advanced Concepts – not just concepts, but advanced ones!