
Posts

Showing posts from 2024

Truncated Y-axis, but with female celebrities.

Why did I find this after my textbook was published? Damn it. I have a whole section about how Y-axis manipulation can make small differences look huge and then...I find this. Damn it. Source:   https://www.reddit.com/r/dataisugly/comments/1hjr01o/height_of_female_popstars/

Modal religions by county in the U.S.

I love my more elaborate examples, but this is a short, sweet, and interesting way to refresh your measures of central tendency lecture when you explain mode. I present you with the modal religions in each U.S. county: Found on Reddit:   https://www.reddit.com/r/dataisbeautiful/comments/1hejglm/most_common_religion_in_every_us_county_oc/ Source of Original Data: https://www.thearda.com/?gad_source=1&gclid=CjwKCAiA9vS6BhA9EiwAJpnXw7IpjxFvuiS3UvLycZrZ2ggtEzS2JDR-ow0mksK-9rD06G8Lgq6mlhoC1nwQAvD_BwE

An interactive that gets your students thinking about medians, percentiles, and their own sleeping habits.

My students struggle with sleeping and are distracted by electronics. This interactive activity allows them to think about their sleep relative to norms regarding age and sex. It also dives deeply into how sleep changes over a person's lifespan, which is a topic suitable for non-stats classes like Health or Developmental. https://www.washingtonpost.com/wellness/interactive/2024/sleep-data-survey-americans/ *You need a WaPo subscription or paywall buster to get to this interactive. Like this one! https://www.removepaywall.com/search?url=https://www.washingtonpost.com/wellness/interactive/2024/sleep-data-survey-americans/ Here is a quick interactive that lets your students a) see how well they sleep in comparison to their demographic and b) think about median data and percentile data. 1. Repurposed, gently used data is really everywhere. This interactive uses data from the Census Bureau. Which is a way to measure sleep, but not the only way. 2. Median and percentil...

Uncrustables consumption rates by NFL teams 1) do not vary by league, 2) do not correlate with 2023 wins

Many thanks to Dr. Sara Appleby for sharing this data with me!! I really enjoy silly data, like this one from Jayson Jenks, writing for The Athletic, which shows how many Uncrustables each team eats per week. Well, data from the teams that elected to participate and/or didn't make their own PB and Js. The whole article is fun, so give it a read. It makes sense that hungry athletes would go for a quick, calorie-dense, nostalgic snack containing protein. Here is the data visualization: Damn, Denver. I entered this data into a spreadsheet for all of us. Spoiler alert: The number of Uncrustables eaten per week does not vary by league (independent t-test example), and the number of wins in 2023 does not correlate with the number of Uncrustables eaten per week in 2023 (correlation/regression example). Also, for my own curiosity, I re-ran the data after deleting Denver, and it wasn't enough of a difference to achieve significance.
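If you want students to see what is under the hood of those two analyses, here is a minimal sketch in plain Python. The numbers are placeholders I invented so the code runs; they are NOT the Uncrustables data from The Athletic. Note that it uses Welch's version of the independent t-test (which doesn't assume equal variances), so the statistic may differ slightly from what your stats package reports by default.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (no equal-variance assumption)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# PLACEHOLDER values, made up for illustration only (not the real team data).
group_one = [80, 120, 95, 150]     # sandwiches per week, hypothetical teams
group_two = [100, 110, 90, 140]    # sandwiches per week, hypothetical teams
wins = [4, 9, 11, 7, 12, 5, 8, 10]  # hypothetical season wins, same team order

print("t =", round(welch_t(group_one, group_two), 3))
print("r =", round(pearson_r(group_one + group_two, wins), 3))
```

Swap in the real columns from the spreadsheet and the same two functions give you the t statistic and r to compare against your software's output.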

Subways! Murder! Absolute vs. relative risk!

When I teach the basics of probability in Intro Stats, I always emphasize absolute vs. relative risk. I am delighted to have a brand new example. Thanks to Sy Islam for sending it my way. Here is the headline from The New York Post: So, one murder is too many murders. A 60% increase feels very scary. Because relative risk is always the scary risk. Since this reporting is about something that is very serious, the reporting itself should be serious, right? Well, what were the absolute values for subway murders? I mean, The New York Post would never, ever want to instill fear in people, right? Well, despite this headline, The New York Post, much to its credit, did include the absolute data in the actual article: Eight murders, versus five in the previous years. Which is still terrible, but not nearly as frightening as a (checks notes) SOARING 60% increase. Anyway. Ta-da! Use this in your class.
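For students who like to see the arithmetic, here is a small sketch of how the same two counts (five murders, then eight) turn into both numbers. The function names are mine, just for illustration.

```python
def absolute_change(before, after):
    """Raw difference in counts."""
    return after - before


def relative_change(before, after):
    """Difference expressed as a proportion of the starting count."""
    return (after - before) / before


before, after = 5, 8  # counts reported in the article

print(f"Absolute change: {absolute_change(before, after)} more murders")
print(f"Relative change: {relative_change(before, after):.0%} increase")
```

Same data, two framings: "three more" versus "60% more." Asking students which one they would put in a headline, and why, makes a nice discussion prompt.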

Turn your data into a GIF with Google

Google will let you make a GIF of your data. I made this GIF using YouGov data. So far, it lets you make four different kinds of GIFs. This is a small tool, but it is an excellent alternative/supplement when you are teaching students how to present their data in a PowerPoint. It isn't very showy, which is the point, I think. You wouldn't want to be distracted from your data, but it adds some motion. ALSO: I personally love GIFs.

Paris Olympics 2024: I'm here for the dank memes

 

Whataburger Index: Operationalizing power outages in hurricane-ravaged Texas.

As a stats nerd, I love it when clever people make lives easier by finding clever, easy, indirect ways to estimate the thing they want to measure. As a statistics instructor, I find such examples engaging, as they encourage students to think critically and nurture their statistical literacy. Like the Waffle House Index. TL;DR: During weather emergencies, the federal government tracks whether or not Waffle Houses are open as a proxy for the severity of damage in a community. Waffle Houses are tough as hell, and if they close, a community needs help. Below is a map of Waffle Houses. https://www.scrapehero.com/store/wp-content/uploads/maps/Waffle_House_USA.png Due to Hurricane Beryl, the people of Houston, Texas discovered an even more accurate measure of the severity of electricity outages: The Whataburger Index: https://www.facebook.com/photo/?fbid=8242206945824619&set=gm.2698315720337038&idorvanity=1416658058502817 Certainly, Waffle House exists in Texas. 126...

Predictions are only as good as the regularity of the event

Weather prediction is data. This makes weather data-related stories and examples highly relatable. The Washington Post published an interactive article that shows how accurate weather predictions are for a given city in the United States. This means that we, stats instructors, can use this page to provide a geographically personalized lesson on weather prediction, the limitations of data, and why predictions about the future are only as good as the consistency of the past. I also like this example because it isn't terribly mathy and encourages statistical literacy. Kommenda and Stevens, writing for the Washington Post, recently shared a story on the accuracy of weather predictions based on time away from the target day. Here, the DV is prediction accuracy, operationalized using the difference between predicted and actual high temperature. You could always ask your students how they would operationalize weather...or maybe some weather matters more than others? Folks in Erie...
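If you want to make that operationalization concrete, here is one way to sketch it: average the absolute gap between forecast and actual highs, separately for each lead time. This is my own rough illustration, not the Post's method, and the temperatures are placeholders I made up.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# PLACEHOLDER highs in degrees Fahrenheit, invented for illustration only.
actual_highs = [70, 75, 68, 80]
forecasts_by_lead_days = {
    1: [71, 74, 69, 79],  # hypothetical forecasts made one day ahead
    7: [66, 80, 73, 74],  # hypothetical forecasts made a week ahead
}

for lead_days, forecasts in forecasts_by_lead_days.items():
    error = mean_absolute_error(forecasts, actual_highs)
    print(f"{lead_days} day(s) out: off by {error:.1f} degrees on average")
```

Students could argue for other choices here, such as squared error or counting only misses larger than a few degrees, which is exactly the operationalization conversation.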

Statistical thinking: What data would you need to collect to disprove the predictive power of astrological signs?

Okay. I haven't used this in class yet because it is July, and I just found it. However, I will open the Fall 2024 semester with this example. It is fun and accessible and shows how research can be used to study whether or not personality varies based on astrological signs. I will start by showing them a bunch of funny astrology memes (see above). Then, I'll ask them to think of ways to design a study to prove that astrology is/is not bunk. What sort of data would they need to collect to do this? Then, I'm going to show them this study (Joshanloo, 2024): https://onlinelibrary.wiley.com/doi/epdf/10.1111/kykl.12395?domain=author&token=BKSRDREAX9F3BKAWGVBD Statsy things to share with your students: 1. Archival data: The author used repurposed, vintage, federal data. The General Social Survey, to be specific. Data scientists are trained to see the potential of random data sets. The horoscope sign was simple to determine since the GSS collects birthday data. The author was...

Not a particularly statsy example, but still delightful.

I mean. This is the most entertaining research methodology I have ever seen. What did this look like? This is what it looked like. So, this is barely a statsy example, but it does include data outcomes: n = 175, with some snakes striking the boot (n = 6) and some coiling (n = 3). While PIs might try, no IRB would let you get away with asking your graduate student to step on snakes. Mostly, this is funny. I found his research, too. While I think the fake leg is highly amusing, I think it is great that Morris is a passionate advocate for snake education and teaching people to be tolerant of snakes they find in the wild. Finally, I heard about this research on an NPR story about snake handling classes (taught by Morris) in Arizona. A WHOLE CLASS.

Caffeine, calories, correlation

We need more nonsignificant but readily understood examples in our classes. This correlation/regression example from Information is Beautiful demonstrates that the caffeine in delicious caffeinated drinks does not correlate with the calories in the drink. Caffeine has zero calories. The things that make our drinks creamy and sweet may have calories. Easy peasy, readily understood, and this example gives your students a chance to think about and interpret non-significant, itty-bitty effect size findings. Click here for the data. Aside: Watch your language when using this example. We need calories to stay alive and none of these drinks, in and of themselves, are good or bad. Our students are exposed to way too much of that sort of language and thinking about food and their bodies. What they choose to drink or eat is none of our business. When I share this visual, I omit the information on the far right (exercise) and far left (calorically equivalent foods). It distracts from the...

Law of large numbers, via M&Ms and a GIF.

A quick, accessible example of the Law of Large Numbers. Using candy. Reddit user Jeffrowl counted the proportions of M&Ms across multiple bags, and you can see the proportions of colors reflect the true underlying population as the number of bags increases. Here is the link, and a screenshot of the GIF can be seen here: I don't use the M&M probability example in class, but many of you do. This is a nice addition to that example, but it also serves as a brief, standalone example. ALSO, to my nerdy delight, the author's responses include a Methods section: ...as well as information on baseline data:
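If you don't have candy on hand, a quick simulation shows the same idea. This is a sketch under loud assumptions: the "true" color proportions and the bag size below are placeholders I made up, not Jeffrowl's counts or the manufacturer's figures.

```python
import random

# HYPOTHETICAL population proportions, chosen only so the example runs.
TRUE_PROPORTIONS = {
    "blue": 0.25, "orange": 0.25, "green": 0.125,
    "yellow": 0.125, "red": 0.125, "brown": 0.125,
}


def simulate_bags(n_bags, bag_size=50, seed=0):
    """Open n_bags simulated bags and return the pooled color proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    colors = list(TRUE_PROPORTIONS)
    weights = list(TRUE_PROPORTIONS.values())
    counts = dict.fromkeys(colors, 0)
    for _ in range(n_bags):
        for candy in rng.choices(colors, weights=weights, k=bag_size):
            counts[candy] += 1
    total = n_bags * bag_size
    return {color: counts[color] / total for color in colors}


def max_error(observed):
    """Largest gap between an observed proportion and its true value."""
    return max(abs(observed[c] - TRUE_PROPORTIONS[c]) for c in TRUE_PROPORTIONS)


for n_bags in (1, 10, 100, 1000):
    print(n_bags, "bags -> worst gap:", round(max_error(simulate_bags(n_bags)), 4))
```

With more bags pooled, the worst gap between observed and true proportions should generally shrink, which is the Law of Large Numbers doing its thing.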

How the USAF collects hurricane data with big, big airplanes.

I am an Air Force Brat. Growing up, my dad used to talk about all of the services the USAF provides to our country and the world. It employs many musicians, advances airplane safety for civilians, and conducts and sponsors plenty of research. This post will focus on the USAF's unique position to advance weather and climate science via data collection in big, honkin' airplanes that can fly through hurricanes. Weather forecasting requires data. As reported by Debbie Elliot for NPR, the Air Force collects data that, specifically, will help us better predict severe weather and save lives. Aside: This whole mission started on a bet: HOW TO USE IN CLASS: -I tell my students repeatedly that I'm not trying to turn them into the world's best statisticians. I'm trying to help them learn how to be themselves, with their interests and abilities, but fluent in statistical literacy. This lesson goes better when I can have examples of data jobs that aren't traditi...

My other favorite stats newsletter: The Washington Post's How to Read This Chart

Like the Chartr newsletter, I love this as it feeds my fascination with data and provides interesting examples for the class. As I sit here writing (5/11/24), I am enjoying my other favorite stats newsletter, How to Read This Chart. The current newsletter discusses data visualizations used on the front page of the Post. Such as: Philip Bump lovingly curates this newsletter. One time, he found historic, unlabeled charts and asked readers for help interpreting them. I also found this one, which compared the margin of error and sample sizes used by major national polling firms, fascinating.

If you like this blog, you will love my new podcast...

My friends. I started a podcast. I've created the Not Awful Data podcast with the help of Garth Neufeld and Eric Landrum at the Psych Sessions podcast empire. Why? I try to keep my blog posts brief and to the point, but I also love to discuss exactly how I use my favorite data sets in the classroom. This podcast will let me discuss and highlight some of the data sets I've shared on my blog and provide more information on exactly how you could use them in class. Anyway. Every episode is five minutes long. I plan on posting a new episode once a week. My hope is that this will re-introduce you to some of my older resources and provide you with some out-of-the-box resources you can use in your own teaching. Here is a link to my first episode, which recaps the horror movie/heartbeat data I shared on the blog recently. The podcast is also available on Spotify.

One of my favorite stats mailing lists: Chartr

Chartr | Data Storytelling. Just subscribe. It is entertaining. I mean, look at this: Like, there is a part of my brain that can just doom scroll stats content. Stats scroll? That sounds like an R function. Anyway, that part of my brain loves Chartr.

Citizen Scientists, Unite! The Merlin App, Machine Learning, and Bird Calls

Every Spring and Summer, I become obsessed with the Merlin App. This app allows you to record bird songs using your phone and then uses machine learning to identify the bird call. The app can also do visual IDs if your phone has a much better camera than mine. It is like PokemonGo. I have to catch them all. But no augmented reality, just reality reality. Here is my "life list" of all the birds I've identified in about a year of using the App: This app brings joy. It is also a quick example of how citizens can become scientists, how Apps can generate data from citizen scientists, and how machine learning makes it work. So, this isn't a lengthy example for class, but it is an accessible example that shows how apps and phones can be harnessed for the greater good. And science is super fun. How this App gathers data from users: But how? Via machine learning: Here is even more info on how their machine learning works: AND THEN, the data can be used for scientific research...

Deer-related insurance claims from State Farm

We should teach with data sets representing ALL of our students. Why? You never know what example will stick in a student's head. One way to get information to stick is by employing the self-reference effect. For example, students who grew up in the country might relate to examples that evoke rural life. Like getting the first day of buck season off from school and learning how to watch out for deer on the tree line when you are going 55 MPH on a rural highway. Enter State Farm's data on the likelihood, per state, of a car accident claim due to collision with an animal (not specifically deer, but implicitly deer). Indeed, my home state of Pennsylvania is the #3 most likely place to hit a deer with your car. State Farm shares its data per state: https://www.statefarm.com/simple-insights/auto-and-vehicles/how-likely-are-you-to-have-an-animal-collision I am also happy to share my version of the data, in which I turned all probability fractions (1 out of 522) into probabili...
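The conversion itself is one line of arithmetic, sketched here with the "1 out of 522" figure mentioned above. The function name is mine, purely for illustration.

```python
def one_in_n_to_probability(n):
    """Convert a '1 out of n' statement into a probability between 0 and 1."""
    return 1 / n


p = one_in_n_to_probability(522)
print(f"1 out of 522 = {p:.4f} = {p:.2%}")
```

Having students do this by hand for their own state is a quick way to practice moving between fractions, decimals, and percentages.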

Using pulse rates to determine the scariest of scary movies

The Science of Scare project, conducted by MoneySuperMarket.com, recorded heart rates in participants watching fifty horror movies to determine the scariest of scary movies. Below is a screenshot of the original variables and data for 12 of the 50 movies provided by MoneySuperMarket.com: https://www.moneysupermarket.com/broadband/features/science-of-scare/ Here is my version of the data in Excel format. It includes the original data plus four additional columns (so you can run more analyses on the data): -Year of Release -Rotten Tomatoes rating -Does this movie have a sequel (yes or no)? -Is this movie a sequel (yes or no)? Here are some ways you could use this in class: 1. Correlation: Rotten Tomatoes rating does not correlate with the overall scare score (r = 0.13, p = 0.36). 2. Within-subject research design: Baseline, average, and maximum heart rates are reported for each film. 3. ...
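To help students see just how small r = 0.13 is, here is a quick sketch that converts it to shared variance and to the t statistic used to test it. One assumption to flag: I am treating n as 50 (one row per movie), which the excerpt above does not state outright.

```python
from math import sqrt


def r_squared(r):
    """Proportion of variance the two variables share."""
    return r ** 2


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.13  # correlation reported above
n = 50    # ASSUMED sample size: all fifty movies

print(f"Shared variance: {r_squared(r):.1%}")
print(f"t({n - 2}) = {t_from_r(r, n):.2f}")
```

Under that assumption, the two ratings share under 2% of their variance, which is a concrete way to talk about a tiny, non-significant effect.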