Meet the Data Scientist: Alan Paulsen

Meet Alan Paulsen, among the first to earn the CCP: Data Scientist distinction.

Big Data success requires professionals who can prove their mastery of the tools and techniques of the Apache Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.

As part of this blog series, we’ll introduce the proud few who have earned the CCP: Data Scientist distinction. Featured today is CCP-04, Alan Paulsen. You can start on your own journey to data science and CCP:DS with Cloudera’s new Data Science Challenge on Detecting Anomalies in Medicare Claims.

What’s your current role?

I am the Senior Hadoop Specialist at Wargaming, creators of the massively popular World of Tanks (78 million players) and World of Warplanes. I’m part of the Global Business Intelligence team located in Austin, Texas. My responsibilities include architecting and developing Hadoop/Big Data solutions and handing them off to our talented groups of data scientists and analysts located in Texas, Belarus, and around the world.

Prior to taking CCP:DS, what was your experience with Big Data, Hadoop, and data science?

I come from a more traditional data engineer and data warehousing background, having learned both Ralph Kimball’s and Bill Inmon’s methodologies. I’ve always gravitated more towards software development, so Hadoop was a natural path for me to take when I encountered it a few years ago. It’s great to be able to dive into the code and see how things work.

Cloudera’s open-source distribution made it easy to get started with the Hadoop ecosystem, and I began learning while getting my hands dirty with large data sets. The openness and ease-of-use of CDH with Cloudera Manager also made it easy to go down the rabbit hole and, invariably, dive really deep into the myriad applications of Hadoop and its related projects. 

As far as data science goes, I’ve had to wear many hats in the past, but mathematics and statistics have always been weak points for me. When I saw the Cloudera Certified Professional: Data Scientist program, I knew this was just the thing to get me out of the Danger Zone, help me face my challenges head-on, and take my skills to the next level.

What’s most interesting about data science, and what made you want to become a data scientist?

I love data, engineering, and exploration! What better way to combine all of this than with data science? Hadoop can almost be like a magic hat at times, and being able to use that power to effect real change with my work is very gratifying. This extends beyond my professional work and into my personal life as well. Data is everywhere, and we now have the tools available to make sense of it all.

Here at Wargaming, we are performing cutting-edge analysis on user telemetry data to enhance enjoyment and increase player satisfaction. The volume, variety, and velocity of information we see are very exciting and present the company with a unique opportunity to both learn from our customers and convert those insights into better experiences for our users.

How did you prepare for the Data Science Essentials exam and CCP:DS? What advice would you give to aspiring data scientists?

As I mentioned earlier, my biggest area of improvement was mathematics, so I broke out the statistics and linear algebra textbooks. I also reviewed Cloudera’s handy CCP:DS study guide and spent plenty of time at the library with the recommended reading. Mahout in Action was a great book to help bridge the gap between machine learning and Hadoop – it’s also co-authored by Sean Owen, who is the co-author of the current Data Science Challenge. After the exam, I found and took the Coursera course on machine learning, which was absolutely fantastic. I have also heard good things about Cloudera’s Data Analyst Training and Introduction to Data Science: Building Recommender Systems.

Regarding the lab, my advice is that you make sure to allocate your time appropriately. I started late and wound up working six weekends in a row. Commit to doing the lab and start it right from the beginning.

Since becoming a CCP:DS in November 2013, what has changed in your career and/or in your life?

I started a new position in December 2013 after earning a place in the first class of CCP:DS. However, the biggest change has been the way I approach problems now that I have a better understanding of data science principles. I think there is a big learning curve in getting started, but once I got over the initial hump, it became much easier to advance my understanding and skillset.

Another exciting aspect of my entrance into this community is that I am also in a much better position to expand and scale the work of other data scientists into Hadoop and Big Data.

Why should aspiring data scientists consider taking CCP:DS?

Honestly, it’s fun! Yes, you can learn a lot from reading, tutorials, and case studies. However, getting your hands dirty is truly the best way to learn. Without the Data Science Challenge, it would have been tough getting involved in a data science project, especially coupled with Hadoop. Cloudera provides the data set, environment, problem set (not too rigid), and interaction as a mock customer. 

The industry recognition that comes with being a CCP:DS is just icing on the cake!