
u/zonination's "Got ticked off about skittles posts, so I decided to make a proper analysis for /r/dataisbeautiful [OC]"

The subreddit r/dataisbeautiful was inundated by folks creating color distributions for bags of candy. And because 1) it is Reddit and 2) stats nerds take joy in silly things, candy graphing got out of hand. See below:

https://www.reddit.com/r/dataisbeautiful/comments/5bojxl/oc_the_data_suggests_that_certain_colors_are_not/

https://www.reddit.com/r/dataisbeautiful/comments/5bmo3a/color_distribution_of_one_more_partysized_bag_of/
https://www.reddit.com/r/dataisbeautiful/comments/5cmemr/a_pie_chart_of_mm_colors_from_a_single_500g_bag_oc/

And because it is Reddit, and because, to be fair, the charts were statistically unreliable, other posters claimed that this data WASN'T beautiful: each chart came from a small sample and didn't generalize. One bag of Skittles, they claimed, didn't tell you a lot about the underlying population of Skittles.

Until Redditor zonination came along, bought 35 enormous bags of Skittles, and meticulously documented the color distribution in each bag. He used R and created multiple data visualizations. See below. Here is the Reddit post, and here is his Imgur gallery with visualizations and a narrative describing his findings. (Y'all, I know Reddit has a bad reputation at times, but the discussion in this posting is hilarious if you are a stats nerd. Check it out.)

He explained his data with a heat map...
http://imgur.com/gallery/uy3MN
And a stacked bar chart that really illustrates outlier bags 15 and 16. Imagine if you mistakenly tried to generalize from one of those bags!
http://imgur.com/gallery/uy3MN
And he presents the increasingly popular violin plot.
http://imgur.com/gallery/uy3MN


AND...he shared his data and R code with the world.

How to use in class:
-Discuss the sample sizes required in order to generalize to a population. I think rogue bags 15 and 16 are especially effective at demonstrating sampling error.
-Your students understand the concept of Skittles. Therefore, they will be able to understand the nuances of these different kinds of data visualizations.
-Buy your students some Skittles and have them replicate the analysis.
-Data and code are available to play around with.
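
If you want a quick way to show students why one bag is shaky and 35 bags are not, here is a minimal simulation sketch in Python. This is not zonination's R code. It assumes, purely for illustration, that every bag holds 60 candies and that all five colors are equally likely (both numbers are my assumptions, not his data). It compares how much the estimated share of red Skittles bounces around when you count one bag versus when you average over 35 bags.

```python
import random
import statistics

COLORS = ["red", "orange", "yellow", "green", "purple"]

# Assumptions for illustration only (not taken from the Reddit data):
CANDIES_PER_BAG = 60   # hypothetical bag size
N_REPEATS = 500        # how many times we repeat each "study"


def simulate_bag(n_candies, rng):
    """Fill one bag, assuming each candy is equally likely to be any color."""
    counts = {color: 0 for color in COLORS}
    for _ in range(n_candies):
        counts[rng.choice(COLORS)] += 1
    return counts


def red_share(bag):
    """Proportion of red candies in a single bag."""
    return bag["red"] / sum(bag.values())


def mean_red_share(n_bags, rng):
    """Average red share across n_bags simulated bags (one 'study')."""
    shares = [red_share(simulate_bag(CANDIES_PER_BAG, rng)) for _ in range(n_bags)]
    return statistics.mean(shares)


def spread_of_estimates(n_bags, rng):
    """Standard deviation of the estimate across many repeated studies."""
    estimates = [mean_red_share(n_bags, rng) for _ in range(N_REPEATS)]
    return statistics.stdev(estimates)


rng = random.Random(2016)  # fixed seed so the class sees the same numbers
spread_one_bag = spread_of_estimates(1, rng)
spread_35_bags = spread_of_estimates(35, rng)

print(f"Spread of red share, 1 bag per study:   {spread_one_bag:.4f}")
print(f"Spread of red share, 35 bags per study: {spread_35_bags:.4f}")
```

The second number should come out much smaller than the first, which is the sampling distribution idea in a nutshell: averages over more bags cluster more tightly around the true proportion. Students can change `CANDIES_PER_BAG` or the number of bags and watch the spread change.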

Jess Slide Time (added 10/2/18):

Here are some slides I threw together in order to use this as a conceptual example of the sampling distribution of the sample mean:






