Posts

Hyperbole and a Half's "Boyfriend doesn't have ebola. Probably."

I've been using this example in class for a few years but never got around to blogging about it until now. It seems that the first chapter of every statistics textbook provides a boring explanation of what a variable is, with examples of variables, operationalizing variables, and quantifying the abstract for the purposes of conducting statistical analyses. I try to make that boring topic funnier and applicable to real life via this post entitled "Boyfriend doesn't have ebola. Probably." from Allie Brosh, editor of Hyperbole and a Half. In this posting, she rips apart the good old FACES scale after a trip with her boyfriend to the ER.

If your students get the joke, they get statistics.

Gleaned from multiple sources (FB, Pinterest, Twitter, none of these belong to me, etc.). Remember, if your students can explain why a stats funny is funny, they are demonstrating statistical knowledge. I like to ask students to explain the humor in such examples for extra credit points (see below for an example from my FA14 final exam).

Using xkcd.com for bonus points/assessing if students understand that correlation =/= causation

What are the numerical thresholds for probability? How does this relate to alpha?

What type of error is being described, Type I or Type II?

What measure of central tendency is being described?

Dilbert: http://search.dilbert.com/comic/Kill%20Anyone

Sampling, CLT

http://foulmouthedbaker.com/2013/10/03/graphs-belong-on-cakes/ Because control vs. sample, standard deviations, normal curves. Also, "skewed" pun. If you go to the original website, the story behind this cake has to do w...

xkcd's Linear Regression

http://xkcd.com/1725/ This comic is another great example of allowing your students to demonstrate statistical comprehension by explaining why a comic is funny. What does the r^2 indicate? When would it be easy to guess the direction of the correlation? More on that via this previous blog post.

Kristoffer Magnusson's "Interpreting Confidence Intervals"

I have shared Kristoffer Magnusson's fantastic visualizations of statistical concepts here previously (correlation, Cohen's d). Here is another one that helps to explain confidence intervals, and how the likelihood of an interval containing true mu varies based on interval size as well as the size of the underlying sample.

The site is interactive in two ways. 1) The sliding bar at the top of the page allows you to adjust the size of the confidence interval, which you can read in the portion of the page labeled "CI coverage %" or directly above the CI ticker. See below. 2) You can also change the n-size for the samples the simulation is pulling. The site also reports back the number of samples that include mu and the number of samples that miss mu (wee little example for Type I/Type II error).

How to use it in class: Students will see how intervals increase and decrease in size as you reset the CI percentage. As the sample size increases, the range ...
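If you want students to reproduce the idea behind the visualization themselves, here is a minimal sketch in Python. It is not Magnusson's code. It assumes a normal population with a known standard deviation (so a z-based interval is used), and the function name `ci_coverage`, the seed, and the default values are all made up for illustration.

```python
import random
from statistics import NormalDist


def ci_coverage(level, n, reps=2000, mu=0.0, sigma=1.0, seed=1):
    """Simulate `reps` samples of size n and return
    (share of intervals that contain mu, width of each interval).

    Assumption: sigma is known, so every interval has the same width.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval sample_mean +/- half_width contains mu
        # exactly when the sample mean is within half_width of mu.
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / reps, 2 * half_width


for level in (0.80, 0.95, 0.99):
    for n in (10, 100):
        coverage, width = ci_coverage(level, n)
        print(f"level={level:.2f}  n={n:3d}  coverage={coverage:.3f}  width={width:.3f}")
```

Students should see two things in the printout: the share of intervals that capture mu tracks the chosen confidence level, and the intervals get narrower as n grows and wider as the confidence level goes up.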

Matt, Rali & Rhonda's Statistical Test Flowchart.

Take a look at this interactive statistical decision-making flow chart. I think that almost every statistics text includes a flow chart, but the interactive piece of this, and its ability to immediately provide the reader with information on the appropriate analysis AND software assistance, is something your students can't get from paper versions of the same. The flow chart is based on Andy Field's work. I discovered this tool via Reddit. I'm including that Reddit thread because the person who created the thread (commentor4) states that they also created the flow chart. So, you are led through a series of questions (read this from the bottom up). After you provide the necessary information, the page provides you with a quick definition of the test you should conduct as well as links to instructions for popular statistical packages.

Everything is fucked: The syllabus, by Sanjay Srivastava (with links to articles)

This syllabus for PSY 607: Everything is Fucked made the rounds last week. The syllabus is for a course that purports that science is fucked. The course readings are a list of articles and books that hit on the limitations of statistics and research psychology (p-values, shortcomings of meta-analysis, misuse of mediation, replication crisis, etc.). PSY 607 isn't an actual class (as author/psychologist/blogger Srivastava explains in this piece from The Chronicle) but it does provide a fine reading list for understanding some of the current debates and changes in statistics and psychology. Most of the articles are probably too advanced for undergraduates but perfectly appropriate for teaching graduate students about our field and staying up to date as instructors of statistics. Here is a link to the original blog post/syllabus.

Harris' "How Big A Risk Is Acetaminophen During Pregnancy?"

This study, which found a link between maternal Tylenol usage during pregnancy and ADHD, has been making the rounds, particularly in the Academic Mama circles I move in. Being pregnant is hard. For just about every malady, the only solution is to stay hydrated. With a compromised bladder. But at least pregnant women have Tylenol for sore hips and bad backs. For a long time, this has been the only safe OTC pain reliever available to pregnant women. But a recent research article has cast doubt on this advice. A quick read of this article makes it sound like you are cursing your child with a lifetime of ADHD if you take Tylenol. And this article has become click-bait fodder. But these findings have some pretty big caveats. Harris published this reaction piece at NPR. It is a good teaching example of media hype vs. incremental scientific progress and the third (or fourth or fifth) variable problem. It also touches on absolute vs. relative risk. NOTE: There are well-documente...

Anh Le's "Gotta plot ‘em all!"

This example is a little out of my wheelhouse, but I'm putting it up here for those of you who teach more advanced UG stats or grad stats. I have never taught Principal Component Analysis. But Anh Le, PhD candidate at Duke, provides a detailed description of PCA in R AND does so using data that your advanced undergraduate/graduate students will enjoy: Pokemon. So, Le downloaded data for each of the 151 Pokemon (individual stats for the strengths and weaknesses of each Pokemon) and provided a link so that you can download the data as well. He even included the code he used to create his PCA via R AND he does a nice job talking the reader through his process and what the findings mean. At 37, I didn't realize how much my traditionally-aged college students love Pokemon. Pokemon came up in my undergraduate I/O class three years ago, and I was shocked by how much nostalgia my then-20-year-old students felt for the franchise. I think that it is certainly experiencing a rev...

u/dat data's "Why medians > averages [OC]"

Unsettling. But I bet your students won't forget this example of why the mean isn't always the best measure of central tendency. While the reddit user labeled this as an example of the median's superiority, you could also use this as an example of when the mode is useful. As statisticians, we often fall back on the mode when we have categories and the median when we have outliers, but sometimes either the median or the mode can be useful when decimal points don't make a lot of sense. Here is the image and commentary from reddit: And this is an IG posting about the data from the same user, Mona Chalabi from FiveThirtyEight. I included the Instagram because Chalabi expands a bit more upon the original data she used. https://www.instagram.com/p/BIVKJrcgW51/
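For a quick in-class demonstration of the same point, here is a small sketch using Python's built-in statistics module. The counts below are invented for illustration (they are not the data from the reddit post or from Chalabi): most values are small whole numbers, and one large outlier drags the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical count data: mostly small whole numbers plus one big outlier.
counts = [1, 1, 1, 2, 2, 3, 4, 60]

avg = mean(counts)        # pulled upward by the single outlier
middle = median(counts)   # barely notices the outlier
typical = mode(counts)    # the most common value, always a real count

print(f"mean = {avg}, median = {middle}, mode = {typical}")
```

Here the mean is 9.25, a value larger than seven of the eight observations, while the median (2) and mode (1) both describe a typical case and stay on the whole-number scale of the data.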

Anscombe's Quartet

No, Wikipedia isn't a proper resource for our students to cite. But it is not without merit. For example, I think the information it provides on Anscombe's quartet is very useful. This example provides four data sets. For each, the means and variances for both the X and Y variables are nearly identical. The correlations between X and Y, and the regression lines, are also nearly identical. This is the descriptive/inferential data that applies to each of the four graphs.

I have seen variations upon this in textbooks over the years, but typically they just show how different distributions can have the same mean and standard deviation. I think this example goes the extra mile by including r and the regression line.

How to use in class:
- Graphs aren't for babies. They can be an essential part of understanding your data.
- Outliers are bad!
- The original data is also included at the Wikipedia entry if you would like your students to create these graphs in class.
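If you would rather have students check the numbers than take them on faith, here is a minimal sketch in Python. The values are typed in from the commonly published version of the quartet, so please verify them against the table at the Wikipedia entry before handing this out. The helper name `summarize` is my own.

```python
from statistics import mean, variance

# Data sets I-III share the same X values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
          f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
          f"r={s['r']:.3f} y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

The four printed rows should look almost the same, which is the whole point: the summary statistics cannot tell these data sets apart, but a scatterplot of each one can.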

Wilson's "America’s Mood Map: An Interactive Guide to the United States of Attitude"

Here is a great example of several different topics, featuring an engaging, interactive map created by Time magazine AND using data from a Journal of Personality and Social Psychology article. Essentially, the authors of the original article gave the Big Five personality scale to folks all over the US. They broke down the results by state. Then Time created an interactive map of the US in order to display the data. http://time.com/7612/americas-mood-map-an-interactive-guide-to-the-united-states-of-attitude/ How to use in class:

Data USA

Data USA draws upon various federal data sources in order to generate visualizations about cities and occupations in the US. It provides lots of good examples of simple descriptive statistics and data visualizations. This website is highly interactive and you can query information about any municipality in the US. This creates relevant, customized examples for your class. You can present examples of descriptive statistics using the town or city in which your college/university/high school is located, or you could encourage students to look up their own hometowns. Data provided includes job trends, crime, health care, commuting times, car ownership rates...in short, all sorts of data. Below I have included some screenshots for data about Erie, PA, home of Gannon University: The background photo here is from Presque Isle, a very popular state park in Erie, PA. And, look, medians!

Quealy & Sanger-Katz's "Is Sushi ‘Healthy’? What About Granola? Where Americans and Nutritionists Disagree"

UPDATE, 9/22/22: Here is a non-paywalled link to this information: https://www.nytimes.com/2017/10/09/learning/whats-going-on-in-this-graph-oct-10-2017.html

This article from the NYT is based on a survey. That survey asked a bunch of nutritionists if they considered certain foods healthy. Then they asked a bunch of everyday folks if they considered the same foods to be healthy. Then they generated the percentage of each group that considered the food healthy. And the NYT put the nutritionist responses on a Y-axis, and commoners on the X, and made a lovely scatterplot... Nutritionists and non-nutritionists agree that chocolate chip cookies are not healthy. However, nutritionists are far more critical of American cheese than are non-nutritionists. ...and provided us with the raw data as well.