u/zonination's "Got ticked off about skittles posts, so I decided to make a proper analysis for /r/dataisbeautiful [OC]"

The subreddit r/dataisbeautiful was inundated by folks creating color distributions for bags of candy. And because 1) it is Reddit and 2) stats nerds take joy in silly things, candy graphing got out of hand. See below:

https://www.reddit.com/r/dataisbeautiful/comments/5bojxl/oc_the_data_suggests_that_certain_colors_are_not/

https://www.reddit.com/r/dataisbeautiful/comments/5bmo3a/color_distribution_of_one_more_partysized_bag_of/
https://www.reddit.com/r/dataisbeautiful/comments/5cmemr/a_pie_chart_of_mm_colors_from_a_single_500g_bag_oc/

And because it is Reddit, and, to be fair, because the data were statistically unreliable, other posters would claim that this data WASN'T beautiful because it came from a small sample and didn't generalize. One bag of Skittles, they claimed, didn't tell you a lot about the underlying population of Skittles.

Until Redditor zonination came along, bought 35 enormous bags of Skittles, and meticulously documented the color distribution in each bag. He used R and created multiple data visualizations. See below. Here is the Reddit post, and here is his Imgur gallery with visualizations and a narrative describing his findings. (Y'all, I know Reddit has a bad reputation at times, but the discussion in this posting is hilarious if you are a stats nerd. Check it out.)

He explained his data with a heat map...
http://imgur.com/gallery/uy3MN
And a stacked bar chart that really illustrates outlier bags 15 and 16. Imagine if you mistakenly tried to generalize from one of those bags?
http://imgur.com/gallery/uy3MN
And he presents the increasingly popular violin plot.
http://imgur.com/gallery/uy3MN


AND...he shared his data and R code with the world.

How to use in class:
-Discuss proper sample sizes required in order to generalize to a population. I think rogue bags 15 and 16 are especially effective at demonstrating sampling error.
-Your students understand the concept of Skittles. Therefore, they will be able to understand the nuances of these different kinds of data visualizations.
-Buy your students some Skittles and have them replicate the analysis.
-Data and code are available to play around with.
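
If you want a quick classroom demo of the sample size point before anyone opens a real bag, here is a minimal simulation sketch. It is written in Python rather than the R that zonination used, and it does not use his data. The bag size of 60 candies and the assumption that all five colors are equally likely are made up for illustration; only the number of bags (35) comes from the post.

```python
import random
from collections import Counter

COLORS = ["red", "orange", "yellow", "green", "purple"]
CANDIES_PER_BAG = 60   # hypothetical bag size, not taken from the real data
N_BAGS = 35            # same number of bags as in the Reddit post
TRUE_SHARE = 1 / len(COLORS)  # assumption: every color is equally likely

def simulate_bag(rng, n=CANDIES_PER_BAG):
    """Draw one simulated bag and return its color counts."""
    return Counter(rng.choice(COLORS) for _ in range(n))

def proportions(counts):
    """Convert color counts into proportions of the total."""
    total = sum(counts.values())
    return {color: counts[color] / total for color in COLORS}

rng = random.Random(2016)  # fixed seed so the demo is repeatable
bags = [simulate_bag(rng) for _ in range(N_BAGS)]

one_bag = proportions(bags[0])              # what a single-bag post would show
pooled = proportions(sum(bags, Counter()))  # all 35 bags combined

# Largest gap between an observed color share and the assumed true share.
worst_single = max(abs(p - TRUE_SHARE) for p in one_bag.values())
worst_pooled = max(abs(p - TRUE_SHARE) for p in pooled.values())

print("Largest error from one bag: ", round(worst_single, 3))
print("Largest error from 35 bags:", round(worst_pooled, 3))
```

Have students change the seed or the bag size and watch how much more the single-bag numbers bounce around than the pooled ones.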

Jess Slide Time (added 10/2/18):

Here are some slides I threw together in order to use this as a conceptual example of the sampling distribution of the sample mean:
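
For anyone who would rather show the sampling distribution of the sample mean with a simulation, here is a rough sketch in Python. Everything in it is hypothetical: the "population" is 10,000 simulated bags of 60 candies, each candy red with probability 0.2. None of these numbers come from zonination's data. The idea is just that the spread of the sample means should shrink roughly like sigma divided by the square root of the number of bags sampled.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the demo is repeatable

# Hypothetical population: the number of red candies in each of 10,000 bags.
# Bag size (60) and the chance of red (0.2) are assumptions for illustration.
population = [sum(rng.random() < 0.2 for _ in range(60)) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

def sample_means(n_bags, n_repeats=2_000):
    """Repeatedly sample n_bags bags and record the mean red count each time."""
    return [
        statistics.mean(rng.sample(population, n_bags))
        for _ in range(n_repeats)
    ]

# Standard deviation of the sample means for each sample size.
spread = {n: statistics.stdev(sample_means(n)) for n in (1, 5, 35)}

for n, observed in spread.items():
    expected = sigma / n ** 0.5  # what theory predicts for the standard error
    print(f"n={n:>2}  observed spread={observed:.2f}  sigma/sqrt(n)={expected:.2f}")
```

Sampling one bag at a time gives means that are as variable as the bags themselves, while sampling 35 bags at a time gives means that cluster tightly around the population mean.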






Comments

  1. Your idea to use from M&M' is very interesting. I enjoy from seeing your blog's pictures. Thank you very much.

