As Halloween night quickly approaches, there is only one question on every kid’s mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science, which enables quick and intuitive analysis of data across multiple sources. Let’s use Cloudera Machine Learning (CML), the world’s first hybrid data cloud machine learning tooling, to take a deep dive into the world of candy analytics and answer the tough question on everyone’s mind: how do we win Halloween?
So many factors go into obtaining the best possible candy portfolio. First of all, it’s about maximizing the number of doors knocked, which requires a densely populated location. However, that isn’t an option for every trick-or-treater. For example, I grew up in rural Montana, where trick-or-treating required a car and snowshoes to get to each home (okay, not snowshoes, but definitely snow boots). If you find yourself in this situation, I highly recommend tracking the average candy output per home each year. For example, if the Rogers have handed out king-size candy bars every year, it might be worth the extra 10-minute drive.
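If you do start keeping records, a few lines of Python in a notebook can turn them into a ranking of homes worth the drive. The sketch below assumes a hypothetical haul log of (year, household, pieces received) tuples; the households and counts are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical haul log: (year, household, pieces of candy received).
# These records are invented for illustration only.
haul_log = [
    (2019, "Rogers", 4),
    (2020, "Rogers", 5),
    (2021, "Rogers", 4),
    (2019, "Smith", 1),
    (2020, "Smith", 2),
    (2021, "Smith", 1),
]


def average_haul_per_home(log):
    """Return {household: average pieces per year}, most generous homes first."""
    pieces_by_home = defaultdict(list)
    for _year, home, pieces in log:
        pieces_by_home[home].append(pieces)
    averages = {home: mean(counts) for home, counts in pieces_by_home.items()}
    # Sort by average haul, highest first.
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))


print(average_haul_per_home(haul_log))
```

Averaging across years smooths out one-off good or bad nights, which is what you want when deciding whether a long drive is worth it.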
So far we’ve talked about quantity, but quality is just as important. This variable is largely out of your control and can depend on the region you live in. I recently found out that some companies actually track candy sales by state each year. CandyStore.com is one of them (on a side note, check out their website if you have a hankering for rare sweets). This year they published a blog post with the results of their annual data mining, including the top 3 candies purchased in each state and the quantity of each purchased, in pounds.
Some of the top-purchased candies are wild. Take my home state of Montana, which purchased over 28 thousand pounds of Dubble Bubble Gum. You read that right: Dubble Bubble Gum, the rock-hard, 4-chews-with-flavor gum that everyone yearns for. Other states are closer to what you’d expect; California knows that no one can resist a classic like the Reese’s Peanut Butter Cup.
This got me thinking, though: based on this data, there is likely a difference in taste between the people buying the candy and the people actually eating it. Is there an easy way to identify these candy market imbalances? Luckily, when CML isn’t solving the world’s most ambitious predictive challenges for enterprise businesses, it’s the perfect tool for this kind of agile, ad-hoc data science discovery. To answer our candy questions, I’ll spin up JupyterLab natively in CML, which gives me immediate access to scalable compute and secure, granular data in just a few clicks. Let’s get started.
How to avoid the bad candy
If we want to find the states that bought “bad candies,” we need some way to quantify consumer taste preferences for various sweets. Enter The Ultimate Halloween Candy Power Ranking from FiveThirtyEight, which contains survey results from over 269,000 randomly generated candy matchups (i.e., do you like candy A or candy B better?). The end result was a win percentage for each of 86 mainstream candies.
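To make “win percentage” concrete, the sketch below computes one from head-to-head results: the number of matchups a candy won divided by the number it appeared in. The matchup list is hypothetical, and this is only a simplified illustration of the idea; FiveThirtyEight’s actual methodology may differ.

```python
from collections import Counter

# Hypothetical survey results as (winner, loser) pairs.
matchups = [
    ("Reese's Peanut Butter Cup", "Candy Corn"),
    ("Reese's Peanut Butter Cup", "Dubble Bubble"),
    ("Candy Corn", "Dubble Bubble"),
    ("Dubble Bubble", "Candy Corn"),
]


def win_percentages(results):
    """Return {candy: percentage of its matchups that it won}."""
    wins = Counter()
    appearances = Counter()
    for winner, loser in results:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Counter returns 0 for candies that never won, so every candy gets a value.
    return {candy: 100 * wins[candy] / appearances[candy] for candy in appearances}


print(win_percentages(matchups))
```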
Now, if we merge these two data sets by candy name, we can build a visualization that highlights the top-purchased candy in each state and how well liked that candy is. The darker a state is, the more disliked the top candy purchased in that state is. When you hover over a state (or tap it on your phone), the first number shown is the win percentage for that state’s top candy; you’ll also see the candy’s name and the amount of it purchased in 2021, according to CandyStore.com.
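The merge itself is a join on candy name. One wrinkle is that names aren’t always spelled the same way across sources (this post alone uses both “Dubble Bubble Gum” and “Dubble Bubble”), so the sketch below matches on a normalized name. It uses only the standard library; `normalize` is a hypothetical matching rule, and the data covers just the three states and two win percentages quoted in this post. In pandas, the equivalent step would be `DataFrame.merge(..., on=..., how="inner")` on a normalized name column.

```python
def normalize(name):
    """Hypothetical matching rule: trim, lowercase, and drop a trailing ' gum'."""
    key = name.strip().lower()
    if key.endswith(" gum"):
        key = key[: -len(" gum")]
    return key


def merge_top_candy_with_preference(top_candy, win_pct):
    """Inner-join {state: top candy} with {candy: win %} on normalized names.

    Returns {state: (candy name as purchased, win %)}. States whose top candy
    has no match in the ranking are left out.
    """
    lookup = {normalize(candy): pct for candy, pct in win_pct.items()}
    merged = {}
    for state, candy in top_candy.items():
        key = normalize(candy)
        if key in lookup:
            merged[state] = (candy, lookup[key])
    return merged


# Top-purchased candy per state, named as in this post (subset of states).
top_candy_by_state = {
    "Montana": "Dubble Bubble Gum",
    "New Mexico": "Candy Corn",
    "North Dakota": "Candy Corn",
}

# FiveThirtyEight win percentages quoted in this post.
win_pct_by_candy = {
    "Dubble Bubble": 27.0,
    "Candy Corn": 38.0,
}

merged = merge_top_candy_with_preference(top_candy_by_state, win_pct_by_candy)
print(merged)
```

Because this is an inner join, any state whose top candy isn’t in the ranking silently drops out, so it’s worth checking how many states survive the merge before drawing the map.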
A few things stick out to me. To no one’s surprise, Montana’s choice of Dubble Bubble is bound to be regretted. FiveThirtyEight puts the win percentage for Dubble Bubble at 27%, so Montana takes the prize for the worst top-purchased candy. Not far behind is every state that bought more Candy Corn than anything else. Yes, I’m looking at you, New Mexico and North Dakota. Candy Corn’s win percentage is only 38%. So, if you’re a fan of Candy Corn or Dubble Bubble (aka if you have numb taste buds), you now know where to travel this holiday to find a surplus of your favorite widely disliked candy.
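Once the data is merged, crowning the worst top purchase is just a sort on win percentage. A minimal sketch, using the three states and win percentages quoted above; breaking ties alphabetically by state name is an arbitrary choice.

```python
# State -> (top-purchased candy, FiveThirtyEight win %), figures from this post.
merged = {
    "Montana": ("Dubble Bubble", 27.0),
    "New Mexico": ("Candy Corn", 38.0),
    "North Dakota": ("Candy Corn", 38.0),
}


def rank_worst_first(state_prefs):
    """Sort states by the win % of their top candy, least liked first."""
    # Sort key: (win %, state name) so ties are ordered alphabetically.
    return sorted(state_prefs.items(), key=lambda item: (item[1][1], item[0]))


ranking = rank_worst_first(merged)
worst_state, (worst_candy, worst_pct) = ranking[0]
print(f"Worst top purchase: {worst_state} ({worst_candy}, {worst_pct:.0f}% win rate)")
# prints "Worst top purchase: Montana (Dubble Bubble, 27% win rate)"
```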
Analyses like these aren’t earth-shattering, but not every analysis needs to be. What every analysis should be, though, is easy to do. Cloudera provides a variety of tools in the Cloudera Data Platform (CDP) that make it easy to work with your data. If you want to give a tool like CML a try and run your own candy analysis, head over to the CDP Test Drive and take the platform out for a spin.