Posts

Bad data viz: The White House and a rogue y-axis

My favorite examples of bad data visualizations are the ones that use accurate data, collected through seemingly ethical means, and then totally malign it. The numbers are correct, but the data viz is...not very truthy (I'm looking at you, Florida). They are even better when the data viz is messed up in a way that appears to be deliberate AND doesn't really strengthen the point. I'm also looking at you, The White House. Here is a story of a deliberate but pointless massaging of a y-axis. A story in Three Tweets.

1. The Biden Administration is doing a good job of encouraging economic growth, right? Take a gander at this bar graph. 2021 was a success...just look at the chart.

2. BUT WAIT. What's this? That y-axis is shady. I...just can't think of any software/glitch that could make this mistake by accident. ALSO: If you like Twitter, follow Graph Crimes.

3. The White House issues a correction featuring a pretty good data pun, I would say.

FIN

Why measures of variability matter: Average age of death in The Olden Days

Alright, this is a 30-second-long example for a) bimodal distributions and b) why measures of variability matter when we are trying to understand a mean. And that mean is...AGE OF DEATH.

My inspiration is this tweet:

"I’m just a girl, standing in front of the internet, asking it to understand that historical life expectancies doesn’t mean most people died at 45 but rather that infant mortality was super high and pulled down the average." (Angelle Haney Gullett, @CityofAngelle, January 12, 2022)

Gullett refers here to the commonly held belief that if the mean life span Back In The Day was 45, or thereabout, everyone was dying around 45. NOT SO. Why?

"The short answer is no. Broadly speaking, there were two choke point of human mortality. Younger than 5, and again around 50. If you made it through those, barring accidents, you likely had what was a normal lifespan of ~65-70 years. And this is why I’m no fun at parties 😂" (Angelle Haney Gullett, @CityofAngelle, January 12, 2022)

OK. An...
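If you want students to see the point with numbers, here is a minimal Python sketch. The ages are invented for illustration (they are NOT historical data): a cluster of early-childhood deaths, a small cluster around 50, and a larger cluster in the late 60s. The mean lands in the mid-40s even though hardly anyone in the made-up sample dies near that age, and the standard deviation is what gives the game away.

```python
from statistics import mean, pstdev

# Hypothetical ages at death for 100 people.
# These numbers are made up to mimic a bimodal shape; they are not real records.
ages = [1] * 30 + [50] * 10 + [68] * 60

mean_age = mean(ages)    # arithmetic mean of the sample
spread = pstdev(ages)    # population standard deviation

# Share of people who died within 5 years of the mean age.
near_mean_share = sum(1 for a in ages if abs(a - mean_age) <= 5) / len(ages)

print(f"Mean age at death: {mean_age:.1f}")
print(f"Standard deviation: {spread:.1f}")
print(f"Share dying within 5 years of the mean: {near_mean_share:.0%}")
```

Ask students to guess the shape of the distribution from the mean alone, then show them the standard deviation and a histogram of the same list.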

Correlation example: Taco Bell and mortality by state...don't run for the border!

Many thanks to my colleague, Andrew Caswell, for sharing this Reddit post with me: https://www.reddit.com/r/dataisbeautiful/comments/s75sm7/oc_us_life_expectancy_vs_of_taco_bell_locations/ So, this alone is an excellent example of correlation and the third variable problem. But...more delightfully, the Redditor who created this graph also shared where he found this data (https://www.nicerx.com/fast-food-capitals/, https://worldpopulationreview.com/state-rankings/life-expectancy-by-state). BETTER STILL: I downloaded and organized all of the fast-food data and mortality data and put it in one spreadsheet for you all. Do All The Correlations! Teach your students about Bonferroni corrections! Figure out the fast-food restaurant that correlates the most strongly with mortality! PS: Did you know that there is an option to download data from a website in Excel? The fast-food data was presented in an embedded, scrolly table, and that Excel option made it easy-peasy to do...
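For the Bonferroni part, here is a small Python sketch of the idea: when you run several correlations at once, divide your alpha by the number of tests and compare each p-value to that stricter cutoff. The chain labels and p-values below are placeholders I made up for illustration; they do not come from the spreadsheet, and the helper function name is hypothetical.

```python
def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant at the Bonferroni-corrected cutoff.

    p_values: dict mapping a test label to its p-value.
    Returns a dict mapping each label to True or False.
    """
    threshold = alpha / len(p_values)  # corrected per-test cutoff
    return {label: p <= threshold for label, p in p_values.items()}


# Made-up p-values for five hypothetical restaurant-vs-mortality correlations.
made_up_p_values = {
    "chain_a": 0.004,
    "chain_b": 0.020,
    "chain_c": 0.300,
    "chain_d": 0.049,
    "chain_e": 0.011,
}

results = significant_after_bonferroni(made_up_p_values)
for label, keeps in results.items():
    print(label, "significant" if keeps else "not significant")
```

With five tests, the cutoff drops from .05 to .01, so several results that looked significant on their own no longer make the cut. That contrast is the whole lesson.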

Adam Ruben's How to Read a Scientific Paper

One of the nine stages of reading a scientific paper. Ahahaha. This article by Adam Ruben, writing for Science, makes fun of how difficult it is to read scientific papers. I think your students will enjoy it, especially halfway through their senior thesis/project. It pairs nicely with more traditional guides on reading scientific papers, like the classic Jordan and Zanna piece or this more recent blog post from Dr. Jennifer Raff. It captures some real gems. Peer review, while imperfect, still scrutinizes research papers like no other form of publication. It also contains some niche humor your students may not appreciate, but I do.

JAMA visual abstracts: A great way to illustrate basic inferential tests

So, the Journal of the American Medical Association publishes visual abstracts for some of its research articles. I've written about them before (in particular, this example that illustrates an ANOVA). These abstracts succinctly summarize the research. They feel like an infographic but contain all of the main sections of a research paper. They are great. They quickly relate the most essential parts of a research study and have a home in Intro Stats. I love them in Psych Stats and use them for several reasons. 1. Using medical examples reminds Psych Stats students that Psych Stats is really Stats Stats, and stats are used everywhere. 2. These are simplified real-world examples. JAMA creates these to help highlight essential facts for journalists and the public, so Intro Stats students are more than ready to take these on. 3. I like to use these as a quick review of some of the inferential tests we teach in stats. This is no guarantee that basic stats were used in the project, b...

Use this caffeine study to teach repeated measure design, ANOVA, etc.

Twitter is my muse. This blog post was inspired by this Tweet:    In a study comparing blood concentrations of caffeine after coffee or energy drink consumption, blood caffeine levels peaked at about 60 minutes in all conditions. Plan accordingly. https://t.co/cWWakGGtHe pic.twitter.com/c5Nn3x3w1f — Kevin Bass (@kevinnbass) November 30, 2021 This study is straightforward to follow. I, personally, think it is psych-friendly because it is about how a drug affects the body. However, it doesn't require much psych theory knowledge to follow this example. Sometimes I'm worried that when we try too many theory-heavy examples in stats class, we're muddying the waters by expecting too much from baby statisticians who are also baby psychologists. Anyway. Here are some things you can draw out of this example: 1. Factors and levels in ANOVA The factor and levels are easy to identify for students. They can also relate to these examples. I wonder if they used Bang energy drinks? They a...

Marc Rummy/Flowingdata illustration of base rate fallacy as it applies to breakthrough infections

Flowingdata is great. They create lots of exciting data visualizations and share other people's visualizations. This visualization from Flowingdata is especially significant. I think it illustrates base rate fallacies beautifully. Moreover, it is applied to a very crucial issue: Immunizations. The base rate fallacy has been used repeatedly to attack the efficacy of vaccines, particularly in instances when vaccinated people catch diseases for which they have been vaccinated. Frequently, such arguments fail to consider base rate data regarding how many more people are vaccinated. This illustration from Marc Rummy is elegant and straightforward and explains a mathy/sampling/statsy concept without any actual math. I love it. Also, this illustration has been updated recently with a bit more text to explain everything: Apparently this picture I made that was part of a post 4 months ago recently went viral. Here's a new & improved version that includes the explana...
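If your students do want the math, here is a rough Python sketch of the base rate logic. Every number in it is hypothetical and chosen only to make the arithmetic easy: a town of 1,000 people, 90% vaccinated, with infection rates of 20 per 100 for unvaccinated people and 4 per 100 for vaccinated people. None of these figures come from the illustration or from real data.

```python
# All values below are invented for illustration, not real epidemiological data.
population = 1000
vaccinated = 900                      # 90% of the town is vaccinated
unvaccinated = population - vaccinated

# Hypothetical infection rates, expressed as cases per 100 people.
rate_unvaccinated_per_100 = 20
rate_vaccinated_per_100 = 4

# Expected number of cases in each group.
unvaccinated_cases = unvaccinated * rate_unvaccinated_per_100 // 100
vaccinated_cases = vaccinated * rate_vaccinated_per_100 // 100

# Share of all cases that occur in vaccinated people.
vaccinated_share_of_cases = vaccinated_cases / (vaccinated_cases + unvaccinated_cases)

print(f"Cases among vaccinated: {vaccinated_cases}")
print(f"Cases among unvaccinated: {unvaccinated_cases}")
print(f"Share of cases that are vaccinated: {vaccinated_share_of_cases:.0%}")
```

In this made-up town, most of the infected people are vaccinated, yet a vaccinated person is five times less likely to be infected than an unvaccinated one. The majority share only reflects how many more vaccinated people there are.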