Monday, December 26, 2016

u/zonination's "Got ticked off about skittles posts, so I decided to make a proper analysis for /r/dataisbeautiful [OC]"

The subreddit r/dataisbeautiful was inundated by folks creating color distributions for bags of candy. And because 1) it is Reddit and 2) stats nerds take joy in silly things, candy graphing got out of hand. See below:

https://www.reddit.com/r/dataisbeautiful/comments/5bojxl/oc_the_data_suggests_that_certain_colors_are_not/

https://www.reddit.com/r/dataisbeautiful/comments/5bmo3a/color_distribution_of_one_more_partysized_bag_of/
https://www.reddit.com/r/dataisbeautiful/comments/5cmemr/a_pie_chart_of_mm_colors_from_a_single_500g_bag_oc/

And because it is Reddit (and, to be fair, because the data really were statistically unreliable), other posters claimed that this data WASN'T beautiful because it came from a small sample and didn't generalize. One bag of Skittles, they claimed, doesn't tell you a lot about the underlying population of Skittles.

Until Redditor zonination came along, bought 35 enormous bags of Skittles, and went to work meticulously documenting the color distribution in each bag. And he used R. And he created multiple data visualizations. See below. Here is the Reddit post, and here is his Imgur gallery with visualizations and a narrative describing his findings. (Y'all, I know Reddit has a bad reputation at times, but the discussion in this posting is hilarious if you are a stats nerd. Check it out.)

He explained his data with a heat map...
http://imgur.com/gallery/uy3MN
And a stacked bar chart that really illustrates outlier bags 15 and 16. Imagine if you mistakenly tried to generalize from one of those bags?
http://imgur.com/gallery/uy3MN
And he presents the increasingly popular violin plot.
http://imgur.com/gallery/uy3MN


...as well as the perpetual favorite, a bar chart with error bars.
http://imgur.com/gallery/uy3MN




AND...he shared his data and R code with the world.

How to use in class:
-Discuss proper sample sizes required in order to generalize to a population. I think rogue bags 15 and 16 are especially effective at demonstrating sampling error.
-Your students understand the concept of Skittles. Therefore, they will be able to understand the nuances of these different kinds of data visualizations.
-Buy your students some Skittles and replicate.
-Data and code available to play around with.
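
If you want students to see sampling error before they open a single bag, here is a minimal simulation sketch. It is not zonination's R code. It assumes, purely for illustration, that all five colors are equally likely and that each bag holds 60 candies; both numbers are my placeholders, not figures from his data.

```python
import random
from collections import Counter

COLORS = ["red", "orange", "yellow", "green", "purple"]
CANDIES_PER_BAG = 60   # hypothetical bag size
N_BAGS = 35            # number of bags in the Reddit analysis


def simulate_bag(n_candies, rng):
    """Draw one bag, assuming every color is equally likely (an assumption)."""
    return Counter(rng.choice(COLORS) for _ in range(n_candies))


def proportions(counts):
    """Convert color counts into color proportions."""
    total = sum(counts.values())
    return {color: counts[color] / total for color in COLORS}


def max_deviation(props):
    """Largest gap between an observed proportion and the assumed 0.20."""
    return max(abs(p - 1 / len(COLORS)) for p in props.values())


rng = random.Random(2016)  # fixed seed so the class sees the same bags
bags = [simulate_bag(CANDIES_PER_BAG, rng) for _ in range(N_BAGS)]

# Compare each bag on its own against all bags pooled together.
single_bag_devs = [max_deviation(proportions(bag)) for bag in bags]
pooled = sum(bags, Counter())
pooled_dev = max_deviation(proportions(pooled))

print(f"worst single bag is off by up to {max(single_bag_devs):.3f}")
print(f"all bags pooled are off by up to {pooled_dev:.3f}")
```

Even though every simulated bag comes from the same population, individual bags can look quite lopsided, while the pooled sample sits much closer to the assumed proportions.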

Monday, December 19, 2016

Kevin McIntyre's Open Stats Lab

Dr. Kevin McIntyre from Trinity University has created the Open Stats Lab. OSL provides users with research articles, data sets, and worksheets for studies that illustrate statistical tests commonly taught in Introduction to Statistics.

Topics covered, illustrated beautifully by Natalie Perez

All of his examples come from Open Science Framework-compliant publications from Psychological Science. McIntyre presents the OSF data (in SPSS but .CSV files are available), the original research article, AND a worksheet to go along with each article.

Layout for each article/data set/activity. This article demonstrates one-way ANOVA.

I know, right? I think it can be difficult to find 1) research an undergraduate can follow that 2) contains simple data analyses. And here, McIntyre presents it all. This project was funded by a grant from APS.

Wednesday, December 14, 2016

A wintery mix of holiday data.




Property of @JenSacco54


http://www.huffingtonpost.com/entry/mariah-carey-christmas_us_561f989be4b0c5a1ce621a69
A wintery example of why range is a crap measure of variability
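
To make the point about range concrete, here is a small sketch with made-up numbers (they are not taken from the linked article). It compares the range with the interquartile range before and after a single extreme value is added.

```python
import statistics


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range, using the default method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical scores; the last list adds one extreme observation.
typical = [48, 50, 51, 49, 52, 50, 47, 53]
with_outlier = typical + [95]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{label}: range = {value_range(data)}, IQR = {iqr(data)}")
```

One odd observation changes the range dramatically while the interquartile range barely moves, which is why the range on its own says little about how spread out most of the data are.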


http://qz.com/859303/americas-most-common-christmas-related-injuries-in-charts/


Monday, December 12, 2016

Wilson's "Find Out What Your British Name Would Be"

Students love personalized, interactive stuff. This website from Chris Wilson over at Time allows your American students to enter their name and receive their British statistical doppelganger name in return. Or vice versa.


And by statistical doppelganger, I mean that the author sorted through name popularity databases in the UK and America. He then used a Least Squared Error model in order to find strong linear relationships for popularity over time between names.

How to use in class:
Linear relationship
LSE
Trends over time
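
Here is one way the least squared error idea could be sketched for class. This is my guess at the general approach, not Wilson's actual code: the popularity numbers and name labels below are invented, and the real analysis may scale or align the data differently.

```python
def squared_error(series_a, series_b):
    """Sum of squared differences between two equally long series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same time points")
    return sum((a - b) ** 2 for a, b in zip(series_a, series_b))


def closest_name(target_series, candidates):
    """Return the candidate whose popularity curve has the smallest squared error."""
    return min(candidates, key=lambda name: squared_error(target_series, candidates[name]))


# Hypothetical popularity (births per 1,000) at five time points.
american_name = [1.0, 4.0, 9.0, 5.0, 2.0]
british_names = {
    "British name A": [0.8, 3.5, 8.0, 5.5, 1.5],   # rises and falls like the target
    "British name B": [2.0, 1.0, 1.5, 4.0, 9.0],   # recent surge
    "British name C": [9.0, 5.0, 2.0, 0.5, 0.1],   # steady decline
}

match = closest_name(american_name, british_names)
print(match)
```

The name with the smallest total squared gap is the one whose popularity rose and fell on the most similar schedule, which is the sense in which it is a statistical doppelganger.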

Monday, December 5, 2016

Aschwanden's "You Can’t Trust What You Read About Nutrition"


FiveThirtyEight provides lots of beautiful pictures of spurious correlations found by their own in-house study.
At the heart of this article are the limitations of a major tool used in nutritional research, the Food Frequency Questionnaire (FFQ). The author does a mini-study, enlisting the help of several co-workers and fivethirtyeight.com readers. They track their own food for a week and reflect on how difficult it is to properly estimate and recall food (perhaps a mini-experiment you could do with your own students?).

And she shares the spurious correlations she found in her own mini-research:



Aschwanden also discusses how much noise and lack of consensus there is in real, published nutritional research (a good argument for why we need replication!):

http://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/

How to use in class:
-Shortcomings of survey research, especially survey research that relies on accurate memories
-Spurious correlations (and p-values!)
-Correlation does not equal causation
-Why replication is necessary
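
To show students how easily spurious correlations turn up, here is a minimal simulation sketch. It is not the FiveThirtyEight analysis: every variable is random noise, the sample size and number of questionnaire items are placeholders I picked, and the p-values use the Fisher z approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(538)   # fixed seed for a repeatable demo
n_people = 50              # hypothetical number of respondents
n_foods = 200              # hypothetical number of questionnaire items

# The outcome and every "food" are pure noise, so any link is spurious.
outcome = [rng.gauss(0, 1) for _ in range(n_people)]
foods = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_foods)]

p_values = [approx_p_value(pearson_r(food, outcome), n_people) for food in foods]
false_hits = sum(p < 0.05 for p in p_values)
print(f"{false_hits} of {n_foods} noise variables reach p < .05")
```

With a .05 cutoff, roughly one in twenty unrelated variables should look "significant" by chance, so a long questionnaire all but guarantees a few headline-ready findings.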


Also included is an amusing video that shows what it is like to be a participant in a nutrition study. It details the FFQ, or Food Frequency Questionnaire. The video also touches on serving sizes and portions, and how difficult it may be for many of us to properly estimate (per the example) how many cups of spare ribs we consume per week.