Showing posts with the label mean

Why measures of variability matter: Average age of death in The Olden Days

Alright, this is a 30-second example of a) bimodal distributions and b) why measures of variability matter when we are trying to understand a mean. And that mean is...AGE OF DEATH. My inspiration is this tweet: "I’m just a girl, standing in front of the internet, asking it to understand that historical life expectancies doesn’t mean most people died at 45 but rather that infant mortality was super high and pulled down the average." — Angelle Haney Gullett (@CityofAngelle) January 12, 2022. Gullett refers here to the commonly held belief that if the mean life span Back In The Day was 45, or thereabouts, everyone was dying around 45. NOT SO. Why? "The short answer is no. Broadly speaking, there were two choke point[s] of human mortality. Younger than 5, and again around 50. If you made it through those, barring accidents, you likely had what was a normal lifespan of ~65-70 years. And this is why I’m no fun at parties 😂" — Angelle Haney Gullett (@CityofAngelle) January 12, 2022. OK. An...
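If you want to make this concrete in class, here is a minimal Python sketch with completely invented numbers (not historical data): a population where many die in infancy and most survivors die around 70 produces a mean age of death in the mid-40s, even though almost nobody actually dies in their 40s.

```python
from statistics import mean

# Invented numbers for illustration only, not historical data:
# 40 deaths in infancy, 60 deaths around age 70.
infant_deaths = [1] * 40
adult_deaths = [70] * 60
ages_at_death = infant_deaths + adult_deaths

print(mean(ages_at_death))  # 42.4 -- near "45", yet no one died in their 40s
```

A histogram of `ages_at_death` shows the two humps of the bimodal distribution; the mean lands in the valley between them, which is exactly why a measure of variability (or just a plot) matters.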

The Economist: Election predictions, confidence intervals, and measures of central tendency.

The Economist created interactive visualizations for various polling data related to the 2020 U.S. Presidential election.  While illustrating this data, they used different measures of central tendency and different confidence intervals. Like, it is one thing to say that Candidate A is polling at 47% with a margin of error of 3.2%. I think it is much more useful to illustrate what the CI is telling us about the likely true parameter, based on what we have collected from our imperfect sample. The overlap in confidence intervals is essential to understanding polling.  How to use in class: 1) Electoral college predictions, illustrated with the median and 60% and 95% confidence intervals. Also, I like how this illustrates the trade-off between precision and the size of a confidence interval. The 60% CI is narrower, but you are only 60% confident that it contains the true number of electoral college votes. Meanwhile, the 95% confidence interval is much wider but also more ...
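To show the precision/confidence trade-off numerically, here is a quick sketch using made-up poll numbers (47% support, n = 1,000); the z-values are the standard normal cutoffs for 60% and 95% confidence:

```python
import math

p, n = 0.47, 1000                # hypothetical poll result and sample size
se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion

z60, z95 = 0.8416, 1.9600        # standard normal cutoffs for 60% and 95%
moe60, moe95 = z60 * se, z95 * se

print(f"60% CI: {p - moe60:.3f} to {p + moe60:.3f}")  # narrow, less confident
print(f"95% CI: {p - moe95:.3f} to {p + moe95:.3f}")  # wide, more confident
```

Same data, same standard error; the only thing that changes is how much of the sampling distribution you insist on covering.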

Planet Money's The Modal American

While teaching measures of central tendency in Intro Stats, I have shrugged and said: "Yeah, mean and average are the same thing, I don't know why there are two words. Statisticians say mean, so we'll say mean in this class." I now have a better explanation than that non-explanation, as verbalized by this podcast: "Average" is thrown around colloquially and can refer to the mode, while the mean can always be defined with a formula. This is a fun podcast that describes mode vs. mean, but it also describes the research rabbit hole we sometimes go down when a seemingly straightforward question becomes downright intractable. Here, the question is: What is the modal American? The Planet Money team, with the help of FiveThirtyEight's Ben Casselman, eventually had to go non-parametric, divide people into broad categories, and figure out which category had the biggest N. Here is the description of how they divided people up: And, like, they had SO MANY CELLS in their des...
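The category-counting step is easy to sketch in Python. This is not Planet Money's actual data or category list, just invented rows to show the idea: reduce each person to a tuple of broad categories, then find the tuple with the biggest N.

```python
from collections import Counter

# Invented people, each reduced to a tuple of broad categories
people = [
    ("married", "suburban", "employed"),
    ("married", "suburban", "employed"),
    ("single",  "urban",    "employed"),
    ("married", "rural",    "retired"),
    ("married", "suburban", "employed"),
]

counts = Counter(people)
modal_category, n = counts.most_common(1)[0]
print(modal_category, n)  # ('married', 'suburban', 'employed') 3
```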

Do Americans spend $18K/year on non-essentials?

This is a fine example of using misleading statistics to try to make an argument. USA Today tweeted out this graphic, related to some data that was collected by some firm. There appear to be a number of methodological issues with this data, which suggests several ways to use it in your class: 1) False dichotomy: Survey response options should be mutually exclusive. I think there are two types of muddled dichotomies in this data: a) What is "essential"? When my kids were younger, I had an online subscription for diapers. Those were absolutely essential, and I received a discount on my order since it was a subscription. However, according to this survey's dichotomy, are they an indulgence, since they were a subscription that originated online? b) Many purchases fall into multiple categories. Did the survey creators "double-dip" so as to pad each mean and push the data towards its $18K conclusion? Were participants clear that "drinks out with frien...

A bunch of pediatricians swallowed Lego heads. You can use their research to teach the basics of research methods and stats.

As a research-parent-nerd joke before Christmas, six doctors swallowed Lego heads and recorded how long it took to pass them. Why? To inform parents about the lack of danger associated with a kid swallowing a tiny toy.  I encourage you to use it as a class example because it is short, it describes its research methodology very clearly using a within-subject design, and it has a couple of means, standard deviations, and even a correlation. TL;DR: https://dontforgetthebubbles.com/dont-forget-the-lego/ In greater detail: Note the use of a within-subject design. They also operationalized their DV via the SHAT (Stool Hardness and Transit) scale. *Yeah. So here is the Bristol Stool Chart mentioned in the above excerpt. Please don't click on the link if you are eating or have a sensitive stomach. Research outcomes, including means and standard deviations: An example of a non-significant correlation, with the SHAT score on the y-axi...

The Knot's Real Wedding Study 2017

The Knot, a wedding planning website, collected data on the amount of money that brides and grooms spend on items for their weddings. They shared this information, as well as the average cost of a wedding in 2017. See the infographic below: BUT WAIT! If you dig into this data and the methodology, you'll find out that they only collected price points from couples who ACTUALLY PAID FOR THOSE ITEMS. https://xogroupinc.com/press-releases/the-knot-2017-real-weddings-study-wedding-spend/ Problems with this data to discuss with your students: 1) No one who got stuff for free/traded for stuff would have their $0 counted towards the average. For example, one of my cousins is a tattoo artist and he traded tattoos for use of a drone for photos of their outdoor wedding. 2) AND...if you didn't USE a service, your $0 wasn't added to their ol' mean value. For example, we had our wedding and reception at the same location, so we spent $0 on a ceremony site. 3) As poi...
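The selection problem is easy to demonstrate with a few invented numbers: drop the $0 couples and the mean jumps.

```python
from statistics import mean

# Invented ceremony-site costs: four couples who paid, three who spent $0
# (traded services, free venue, or combined ceremony/reception site).
paid_only = [3000, 4500, 2500, 5000]
everyone = paid_only + [0, 0, 0]

print(mean(paid_only))  # 3750 -- The Knot's approach: payers only
print(mean(everyone))   # ~2142.9 -- the $0 couples pull the average down
```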

My favorite real world stats examples: The ones that mislead with real data.

This is a remix of a bunch of posts. I brought them together because they fit a common theme: examples that use actual data that researchers collected but still manage to mislead. So, lying with facts. These examples hit upon a number of themes in my stats classes: 1) Statistics in the wild. 2) Teaching our students to sniff out bad statistics. 3) Vivid examples are easier to remember than boring examples. Here we go: Making Graphs: Fox News using accurate data and inaccurate charts to make unemployment look worse than it is. Misleading with Central Tendency: The mean cost of a wedding in 2004 might have been $28K...if you assume that all couples used all possible services and paid for all of them. Also, maybe the median would have been the more appropriate measure to report. Don't like the MPG for the vehicles you are manufacturing? Try testing your cars under ideal, non-real-world conditions to fix that. Then get fined by the EPA. Mis...

Crash Course: Statistics

The Crash Course website produces brief, informative videos. They are a mix of animation and live action, and cover an array of topics, including statistics. This one is all about measures of central tendency: Here is the listing under their #statistics tag, which includes videos about correlation/causation, data visualization, and variability. And, you know what? This is just a super cool website, full stop. Here are all of their psychology videos.

Math With Bad Drawings' "Why Not to Trust Statistics"

Math With Bad Drawings has graced us with statistical funnies before (scroll down for the causality coefficient). Here is another one: a quick guide pointing out how easy it is to lie with descriptive statistics. Here are two of the examples; there are plenty more at Math With Bad Drawings. https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/

Annenberg Learner's "Against All Odds"

Holy smokes. How am I only now learning about this amazing resource (thanks, Amy Hogan, for the lead)? The folks over at Annenberg, famous for Zimbardo's Discovering Psychology series, also have an amazing video collection about statistics, called "Against All Odds". Each video couches a statistical lesson in a story. 1) In addition to the videos, there are student and faculty guides to go along with every video/chapter. Using these guides, I think an instructor could go textbook-free. 2) The topics listed approximate an Introduction to Statistics course. https://www.learner.org/courses/againstallodds/guides/faculty.html

Chokshi's "How Much Weed Is in a Joint? Pot Experts Have a New Estimate"

Alright, stick with me. This article is about marijuana dosage, and it provides good examples of how researchers go about quantifying their variables in order to properly study them. The article also highlights the importance of subject matter experts in the process and how one research question can have many stakeholders. As the title states, the main question raised by this article is "How much weed is in a joint?" Why is this so important? Researchers in medicine, addictions, developmental psychology, criminal justice, etc. are trying to determine how much pot a person is probably smoking, since most drug use surveys measure marijuana use by the joint. How to use in a statistics class:

If your students get the joke, they get statistics.

Gleaned from multiple sources (FB, Pinterest, Twitter, etc.; none of these belong to me). Remember, if your students can explain why a stats funny is funny, they are demonstrating statistical knowledge. I like to ask students to explain the humor in such examples for extra credit points (see below for an example from my FA14 final exam). Using xkcd.com for bonus points/assessing whether students understand that correlation =/= causation. What are the numerical thresholds for probability? How does this relate to alpha? What type of error is being described, Type I or Type II? What measure of central tendency is being described? Dilbert: http://search.dilbert.com/comic/Kill%20Anyone Sampling, CLT: http://foulmouthedbaker.com/2013/10/03/graphs-belong-on-cakes/ Because control vs. sample, standard deviations, normal curves. Also, a "skewed" pun. If you go to the original website, the story behind this cake has to do w...

u/dat data's "Why medians > averages [OC]"

Unsettling. But I bet your students won't forget this example of why the mean isn't always the best measure of central tendency. While the reddit user labeled this as an example of the median's superiority, you could also use it as an example of when the mode is useful. As statisticians, we often fall back on the mode when we have categories and the median when we have outliers, but sometimes either the median or the mode can be useful when decimal points don't make a lot of sense. Here is the image and commentary from reddit: And here is an IG post about the data from the same user, Mona Chalabi of FiveThirtyEight. I included the Instagram because Chalabi expands a bit more upon the original data she used. https://www.instagram.com/p/BIVKJrcgW51/

Anscombe's Quartet

No, Wikipedia isn't a proper resource for our students to cite. But it is not without merit. For example, I think the information it provides on Anscombe's quartet is very useful. This example provides four data distributions. For each, the means and variances for both the X and Y variables are identical. The correlations between X and Y, and the regression lines, are also identical. This is the descriptive/inferential data that applies to each of the four graphs. I have seen variations upon this in textbooks over the years, but typically they just show how different distributions can have the same mean and standard deviation. I think this example goes the extra mile by including r and the regression line. How to use in class: -Graphs aren't for babies. They can be an essential part of understanding your data. -Outliers can drive your results! -The original data is also included in the Wikipedia entry if you would like your students to create these graphs in class.
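If you want students to verify the "identical statistics" claim themselves, the quartet's data (from the Wikipedia entry) makes a nice short exercise. A sketch using datasets I and II:

```python
from statistics import mean, stdev

# Anscombe's quartet, datasets I and II (x is shared)
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson_r(xs, ys):
    """Pearson correlation from sample covariance and standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

print(mean(x), round(mean(y1), 2), round(mean(y2), 2))         # 9 7.5 7.5
print(round(pearson_r(x, y1), 3), round(pearson_r(x, y2), 3))  # 0.816 0.816
```

Same means, same r, and (if you fit them) the same regression line, yet plotting the two datasets shows a roughly linear scatter for I and a smooth curve for II.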

An example of when the median is more useful than the mean. Also, Bill Gates.

From Reddit's Instagram...the comments section demonstrates some heart-warming statistical literacy.
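For students who haven't seen the joke spelled out, a tiny sketch with invented incomes makes the point: one billionaire walks into the room and the mean explodes while the median doesn't budge.

```python
from statistics import mean, median

incomes = [40_000] * 9                    # nine people, invented $40K incomes
with_outlier = incomes + [1_000_000_000]  # ...and then Bill Gates walks in

print(mean(incomes), median(incomes))            # both 40,000
print(mean(with_outlier), median(with_outlier))  # mean: 100,036,000; median: 40,000
```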

Weber and Silverman's "Memo to Staff: Time to Lose a Few Pounds"

Weber and Silverman's article for the Wall Street Journal has lots of good psychy/stats information (here is a .pdf of the article if you hit a paywall). I think it would also be applicable to health and I/O psychology classes. The graph below summarizes the main point of the article: certain occupations have a greater likelihood of obesity than others (a good example of means, descriptive statistics, and graphs that demonstrate variation from the mean). As such, how can employers go about increasing employee wellness? How does this benefit an organization financially? Can data help an employer decide where to focus wellness efforts? The article goes on to highlight various programs implemented by employers in order to increase employee health (including efficacy studies to test the effectiveness of the programs). In addition to the efficacy research example, the article describes how some employers are using various apps in order to collect data about employee health and...

Improper data reporting leads to big EPA fines for Kia/Hyundai

On November 3, 2014, Hyundai and Kia were fined a record-setting $100 million for violating the Clean Air Act: they cooked their data and misreported their fuel economy, using the unethical cherry-picking techniques described below by representatives of the federal government: "One was the use of, not the average data from the tests, but the best data. Two was testing the cars at the temperature where their fuel economy is best. Three -- using the wrong tire sizes; and four, testing them with a tail wind but then not turning around in the other direction and testing them with a head wind. So I think that speaks to the kinds [of] problems that we saw with Hyundai and Kia that resulted in the mismeasurement." Video and quote from Sam Hirsch, acting assistant attorney general. Here is the EPA's press release about the fine. How to use it in class: -Hyundai and Kia cherry-picked data, picking out the most flattering data but not the...
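The "best data, not the average data" trick is easy to illustrate. With some invented test runs, honest reporting averages all of them; the cherry-picked version reports only the most flattering run:

```python
from statistics import mean

# Invented fuel-economy test runs (mpg) for a single model
runs = [27.1, 28.4, 26.5, 29.8, 27.9]

honest = mean(runs)  # ~27.94 mpg: average over all runs
cherry = max(runs)   # 29.8 mpg: "the best data" only

print(f"honest: {honest:.2f} mpg, cherry-picked: {cherry} mpg")
```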