Posts

Showing posts with the label median

An interactive that gets your students thinking about medians, percentiles, and their own sleeping habits.

My students struggle with sleep and are distracted by electronics. This interactive activity lets them think about their sleep relative to norms for their age and sex. It also dives deeply into how sleep changes over a person's lifespan, which makes it suitable for non-stats classes like Health or Developmental. https://www.washingtonpost.com/wellness/interactive/2024/sleep-data-survey-americans/ *You need a WaPo subscription or paywall buster to get to this interactive. Like this one! https://www.removepaywall.com/search?url=https://www.washingtonpost.com/wellness/interactive/2024/sleep-data-survey-americans/ Here is a quick interactive that lets your students a) see how well they sleep in comparison to their demographic and b) think about median data and percentile data. 1. Repurposed, gently used data really is everywhere. This interactive uses data from the Census Bureau, which is one way to measure sleep, but not the only way. 2. Median and percentil...

Explaining the median using a German game show.

This is a very brief example to spice up the measures of central tendency lecture. There is a game show in Germany, and one of the rounds of the game show is performing a perfect median split on food. OF COURSE, IT IS A BAVARIAN HOT PRETZEL. The "splitting championship" game is part of a larger video game. Here is the YouTube version and here is the Reddit version, with more deets on the game show. To be clear, we aren't talking about eye-balling here. The median split is an exact split by weight. Just as a statistical median split is an exact splitting of a data set. Here is a more exact screen grab:  ALSO: Because I love a good internet rabbit hole, the Reddit source I found actually goes into detail about the German game show. Have fun. 
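If you want to show the statistical side of the pretzel right after the clip, here is a minimal sketch of a median split in Python. The weights are made up for illustration (they are not from the game show), and `median_split` is a hypothetical helper name, not something from a library.

```python
from statistics import median

def median_split(values):
    """Return the median plus the values below it and the values above it."""
    m = median(values)
    lower = [v for v in values if v < m]
    upper = [v for v in values if v > m]
    return m, lower, upper

# Hypothetical weights in grams, invented for this example.
weights = [12, 15, 9, 20, 18, 11, 14]

m, lower, upper = median_split(weights)
print("median:", m)
print("below the median:", sorted(lower))
print("above the median:", sorted(upper))
```

With an odd number of values, the median itself sits in the middle and exactly as many values fall below it as above it, which is the "perfect split" the contestants are chasing.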

Organizations sharing data in a way that is very accessible

A few weeks ago, I posted about how you can share data in such a terrible way that one is not breaking the law, but the data is completely unusable. This makes me think of all the times I am irked when someone states a problem but doesn't offer a solution to the problem. Instead, they just talk about what is wrong and not how it could be. So, as a counter piece, let's cheer on organizations that ARE sharing data in a way that is readily accessible. You could use this in class as a palate cleanser if you teach your students about data obfuscation. You could also use it as a way of helping your students understand how data really is everywhere. Or even challenge them to brainstorm an app that uses readily accessible data in a new way to help folks. ProPublica This website lets you check how often salmonella is found at different chicken processing plants. All you need to do is enter the p-number, company, or location listed on your package of chicken: https://projects.propubli...

The Economist: Election predictions, confidence intervals, and measures of central tendency.

The Economist created interactive visualizations for various polling data related to the 2020 U.S. Presidential election. While illustrating this data, they used different measures of central tendency and different confidence intervals. Like, it is one thing to say that Candidate A is polling at 47% with a margin of error of 3.2%. I think it is much more useful to illustrate what the CI is telling us about the likely true parameter, based on what we have collected from our imperfect sample. The overlap in confidence intervals is essential to understanding polling. How to use in class: 1) Electoral college predictions, illustrated with median, 60%, and 95% confidence intervals. Also, I like how this illustrates the trade-off between precision and the size of a confidence interval. The 60% CI is narrower, but you are only 60% confident that it contains the true number of electoral college votes. Meanwhile, the 95% confidence interval is much wider but also more ...
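To make the precision versus confidence trade-off concrete, here is a rough sketch using the textbook normal-approximation interval for a single poll proportion. This is not how The Economist built its intervals (those come from their forecasting model). The sample size of 1,000 is an assumption I made up, and `proportion_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Candidate A polling at 47%; n = 1000 is an assumed sample size.
p_hat, n = 0.47, 1000

for level in (0.60, 0.95):
    low, high = proportion_ci(p_hat, n, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Same poll, same data: the only thing that changes between the two intervals is how confident you want to be, and the price of more confidence is a wider interval.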

Florida, COVID-19: If data and stats weren't important, Florida wouldn't lie about them.

People I love very much live in Florida. My very favorite academic conference is held in Florida. I want Florida to flatten the curve. But Florida isn't flattening the curve. Believe me when I say that I'm not trying to dunk on Florida, but Florida has provided me with prime material for statistics teaching. Timely material that illustrates weaponized data. Some examples are more straightforward, like median and poor data visualization. Others illustrate a theme that I cover in my own stats class, a theme that we should all be discussing in our stats class: Data must be very, very powerful if so many large organizations work so hard to discredit it, manipulate it, and fire people who won't. You should also point out to your students that the data these organizations work so hard to discredit are typically straightforward descriptive data, not graduate-level data analysis. 1. Measures of Central Tendency As of June 23, the median age of people newly diagnosed with COVID-19 in Florida dropped ...

My favorite real world stats examples: The ones that mislead with real data.

This is a remix of a bunch of posts. I brought them together because they fit a common theme: Examples that use actual data that researchers collected but still manage to lie or mislead with real data. So, lying with facts. These examples hit upon a number of themes in my stats classes: 1) Statistics in the wild 2) Teaching our students to sniff out bad statistics 3) Vivid examples are easier to remember than boring examples. Here we go: Making Graphs Fox News using accurate data and inaccurate charts to make unemployment look worse than it is. Misleading with Central Tendency The mean cost of a wedding in 2004 might have been $28K...if you assume that all couples used all possible services, and paid for all of the services. Also, maybe the median would have been the more appropriate measure to report. Don't like the MPG for the vehicles you are manufacturing? Try testing your cars under ideal, non-real world conditions to fix that. Then get fined by the EPA. Mis...

Crash Course: Statistics

The Crash Course website produces brief, informative videos. They are a mix of animation and live action, and cover an array of topics, including statistics. This one is all about measures of central tendency: Here is the listing under their #statistics tag, which includes videos about correlation/causation, data visualization, and variability. And, you know what? This is just a super cool website, full stop. Here are all of their psychology videos.

Math With Bad Drawings' "Why Not to Trust Statistics"

Math With Bad Drawings has graced us with statistical funnies before (scroll down for the causality coefficient). Here is another one, a quick guide pointing out how easy it is to lie with descriptive statistics. Here are two of the examples; there are plenty more at Math With Bad Drawings. https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/

Yau's "Divorce and Occupation"

Nathan Yau, writing for Flowing Data, provides a good example of correlation, median, and correlation not equaling causation in his story, "Divorce and Occupation". Yau looked at the relationship between occupation and divorce in a few ways. He used a variation on the violin plot to illustrate how each occupation's divorce rate falls around the median divorce rate. Who has the lowest rate? Actuaries. They really do know how to mitigate risk. You could also discuss why median divorce rate is provided instead of mean divorce rate. Again, the actuaries deserve attention as they probably would throw off the mean. https://flowingdata.com/2017/07/25/divorce-and-occupation/ He also looked at how salary was related to divorce, and this can be used as a good example of a linear relationship: The more money you make, the lower your chances for divorce. And an intuitive exception to that trend? Clergy members. https://flowingdata.com/2017/07/25/divorce...

Pew Research's "Growing Ideological Consistency"

This interactive tool from Pew Research illustrates left and right skew as well as median and longitudinal data. The x-axis indicates how politically consistent (as determined by a survey of political issues) self-identified Republicans and Democrats are across time. Press the button and you can animate the data, or cut up the data so you only see one party or only the most politically active Americans. http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/#interactive The data for both political parties go from being normally distributed in 1994 to skewed by 2014. And you can watch what happens to the median as the political winds change (and perhaps remind your students why the mean would be the less desirable measure of central tendency for this example). I think it is interesting to see the relative unity in political thought (as demonstrated by more Republicans and Democrats indicating mixed political opinions) in the wake of 9/11 but more politicall...

u/dat data's "Why medians > averages [OC] "

Unsettling. But I bet your students won't forget this example of why the mean isn't always the best measure of central tendency. While the reddit user labeled this as an example of the median's superiority, you could also use this as an example of when mode is useful. As statisticians, we often fall back on mode when we have categories and median when we have outliers, but sometimes either median or mode can be useful when decimal points don't make a lot of sense. Here is the image and commentary from reddit: And this is an IG posting about the data from the same user, Mona Chalabi from FiveThirtyEight. I included the Instagram because Chalabi expands a bit more upon the original data she used. https://www.instagram.com/p/BIVKJrcgW51/

Data USA

Data USA draws upon various federal data sources in order to generate visualizations about cities and occupations in the US. And it provides lots of good examples of simple, descriptive statistics and data visualizations. This website is highly interactive and you can query information about any municipality in the US. This creates relevant, customized examples for your class. You can present examples of descriptive statistics using the town or city in which your college/university/high school is located or you could encourage students to look up their own hometowns. Data provided includes job trends, crime, health care, commuting times, car ownership rates...in short, all sorts of data. Below I have included some screenshots for data about Erie, PA, home of Gannon University: The background photo here is from Presque Isle, a very popular state park in Erie, PA. And, look, medians!

An example of when the median is more useful than the mean. Also, Bill Gates.

From Reddit's Instagram...the comments section demonstrates some heart-warming statistical literacy.
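If you want numbers to go with the picture, here is a quick sketch of the classic version of this example, in which one billionaire joins a small group and drags the mean while the median barely moves. I am assuming that is the gist of the post; every income below is invented for illustration.

```python
from statistics import mean, median

# Hypothetical annual incomes for five people (made-up numbers).
incomes = [35_000, 42_000, 48_000, 51_000, 60_000]

# Now one very wealthy person joins the group (also a made-up figure).
with_billionaire = incomes + [1_000_000_000]

print("before: mean =", mean(incomes), "median =", median(incomes))
print("after:  mean =", mean(with_billionaire), "median =", median(with_billionaire))
```

The single outlier moves the mean from tens of thousands to well over a hundred million, while the median only shifts from 48,000 to 49,500. Nobody in the original group got any richer, and only the median reflects that.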

National Geographic's "Are you typical?"

This animated short from National Geographic touches on averages, median, mode, sampling, and the need for cross-cultural research. When defining the typical (modal) human, the video provides good examples of when to use mode (when determining which country has the largest population) and when to use median (median age in the world). It also illustrates the need to collect cross-cultural data before making any broad statements about typicality (when describing how "typical" is relative to a population).

Amanda Aronczyk's "Cancer Patients And Doctors Struggle To Predict Survival"

Warning: This isn't an easy story to listen to, as it is about life expectancy and terminal cancer (and how doctors can best convey such information to their patients). Most of this news story is dedicated to training doctors on the best way to deliver this awful news. But Aronczyk, reporting for NPR, does tell a story that provides a good example of high-stakes applied statistics. Specifically, when explaining life expectancy to patients with terminal cancer, which measure of central tendency should be used? See the quote from the story below to understand how confusion and misunderstanding can arise from measures of central tendency. "The data are typically given as a median, which is different from an average. A median is the middle of a range. So if a patient is told she has a year median survival, it means that half of similar patients will be alive at the end of a year and half will have died. It's possible that the person's cancer will advance quic...

Pew Research's "Global views on morality"

Pew Research went around the globe and asked folks in 40 different countries if a variety of different behaviors qualified as "Unacceptable", "Acceptable", or "Not a moral issue". See below for a broad summary of the findings. Summary of international morality data from Pew The data on this website is highly interactive...you can break down the data by specific behavior, by country, and also look at different regions of the world. This data is a good demonstration of why graphs are useful and engaging when presenting data to an audience. Here is a summary of the data from Pew.  It nicely describes global trends (extramarital affairs are largely viewed as unacceptable, and contraception is widely viewed as acceptable). How you could use this in class. 1) Comparison of different countries and beliefs about what is right, and what is wrong. Good for discussions about multiculturalism, social norms, normative behaviors, the influence of religion ...

Nate Silver and Allison McCann's "How to Tell Someone’s Age When All You Know Is Her Name"

Nate Silver and Allison McCann (reporting for FiveThirtyEight) created graphs displaying baby name popularity over time. The data and graphs can be used to illustrate bimodality, variability, medians, interquartile range, and percentiles. For example, the pattern of popularity for the name Violet illustrates bimodality and shows why measures of central tendency are incomplete descriptors of data sets: "Other names have unusual distributions. What if you know a woman — or a girl — named Violet? The median living Violet is 47 years old. However, you’d be mistaken in assuming that a given Violet is middle-aged. Instead, a quarter of Violets are older than 78, while another quarter are younger than 4. Only about 4 percent of Violets are within five years of 47." Relatedly, bimodality (resulting from the current trend of giving classic, old-lady names to baby girls) can result in massive variability for some names... ...versus trendy baby names th...
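Here is a small sketch of why a median can be a poor description of a bimodal data set. The ages below are invented to mimic the shape described in the Violet quote (lots of little kids, lots of older women, almost nobody in between). They are not the FiveThirtyEight data.

```python
from statistics import median, quantiles

# Hypothetical ages with two clusters: young children and older adults.
ages = [1, 2, 3, 3, 4, 5, 6, 8, 47, 76, 79, 80, 82, 84, 85, 88, 90]

med = median(ages)
q1, q2, q3 = quantiles(ages, n=4)  # quartile cut points (default "exclusive" method)

# Share of people whose age is within five years of the median.
share_near_median = sum(abs(a - med) <= 5 for a in ages) / len(ages)

print("median age:", med)
print("first quartile:", q1, "third quartile:", q3)
print("interquartile range:", q3 - q1)
print(f"share within 5 years of the median: {share_near_median:.0%}")
```

The median lands in the empty valley between the two clusters, so it describes almost nobody in the data set. Reporting the quartiles alongside it is what reveals the two humps.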