
Posts

r/DataIsUgly

I have found plenty of class inspiration on Reddit. Various subs have provided a new way to explain mode and median, as well as great, intuitive data for teaching correlation. However, much as a reverse-coded item on a scale gets at the opposite of what you are asking about, r/DataIsUgly is rife with examples of how NOT to present data, which makes it useful for teaching how to create good data visualizations. Very recently, I shared this example from r/DataIsUgly to illustrate why NOT to truncate the Y axis. And...this sub is filled with people like us: people who love to proofread and notice data crimes. How to use it in class? Can your students figure out why these data visualizations are...less than optimal? Can they fix them? They could be a fun prompt for extra credit points or a discussion board.

Annual snow fall moderates the relationship between daily snow fall and the likelihood of canceling school

Moderation isn't one of those things that we typically teach in Intro Stats. But it is a statistical tool your advanced undergraduates will likely encounter in an upper-level course. I'm not going to teach you how to teach your students how to do one. I am, however, going to share an example of what moderation is doing, inspired by living in the city in the US that has received the most snow this season (Erie, PA, with 93.9 inches for the season as of 1.30.25). About a year ago, CNN shared data on how much snow it takes to cancel school in various parts of the country. I assure you, Erie and the rest of Northwest PA (see red outline) get hella snow but no snow days. https://www.cnn.com/2024/02/12/us/how-much-snow-kids-school-snow-day-across-us-dg/index.html However, our lack of snow days isn't due to lack of snow. The annual amount of snow moderates the likelihood to cancel school, such that if you are used to a lot of snow (and have the infrastructure to handle it) you d...

Absolute vs. relative risk reporting: Lake effect snow edition

I maintain that relative versus absolute risk is a concept that we absolutely must teach in intro stats. I have given some examples of this before (murder! COVID!) but here is another one that hits home for my fellow Great Lakers. In particular, this one is for my friends in Detroit, Toledo, Cleveland, and Buffalo. Up here in Erie, a common point of discussion is how frozen Lake Erie is. Because once it freezes, the lake's moisture no longer feeds Dread Lake Effect Snow. I like this example because you can easily perform the math in front of your class, demonstrating that 26.14% is a 103.45% increase over 12.85%. At the same time, you have the visual to demonstrate that the vast majority of the lake was still unfrozen even with a 103.45% increase.
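If you want to run the math live in class, here is a minimal Python sketch. It assumes the two percentages quoted in this post (12.85% and 26.14%) are the before and after shares of the lake that were frozen; the function and variable names are my own, for illustration only.

```python
def absolute_change(before_pct, after_pct):
    """Change in percentage points (absolute difference)."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change expressed as a percentage of the starting value."""
    return (after_pct - before_pct) / before_pct * 100


# Percentages quoted in the post; treated here as ice-cover shares (assumption).
before, after = 12.85, 26.14

abs_pp = absolute_change(before, after)
rel_pct = relative_change(before, after)
unfrozen = 100 - after

print(f"Absolute change: {abs_pp:.2f} percentage points")
print(f"Relative change: {rel_pct:.1f}%")
print(f"Still unfrozen:  {unfrozen:.2f}%")
```

With these rounded inputs, the relative change comes out to roughly 103.4%, a hair off the 103.45% quoted, most likely because the original figure was computed from unrounded values. The absolute change is only about 13 percentage points, which is the whole point of the lesson.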

Full Discussion Board Idea #1: Repurposing gently-used, second-hand data during times of crisis

I can't be the only one teaching online statistics this Spring. Last fall, I refreshed ALL of my discussion boards for my online version of Psychological Statistics. I hadn't done so since 2020, and my students responded well to my new discussion topics, all of which are centered in statistical literacy and improving problem-solving with data. My first one is based on this old blog post about how residents of Houston used a Whataburger location map to figure out which parts of Houston were without electricity following Hurricane Beryl. Here is how I presented it to my students: You never know where valuable data visualizations will come from! For instance, following Hurricane Beryl, Texans used the Whataburger app to track power outages across Houston. Whataburger is a popular restaurant chain in the South. Its app has a feature where users can quickly see open or closed locations. Normally, this is used by hungry people to find the closest open location. HOWEVER: There are S...

Data can be equity: Merging of Major League Baseball and Negro League Baseball data.

I know it is January 2025, but I want to write about something that happened during the Spring of 2024. I think it is a story about how it is never too late to do the right thing, making it a great thing to think about here at the New Year. Data can't undo the past, but the way we manage it moving forward can provide the opportunity for some measure of equity. Back in May, professional baseball decided to include Negro League (NL) baseball stats, covering 1920 to 1948, as part of Major League Baseball (MLB) stats. This was done to allow for proper recognition of talented NL players. This changed some storied records for the league: https://www.mlb.com/news/stats-leaderboard-changes-negro-leagues-mlb This was a lot more than merging a couple of spreadsheets. As such, this story also serves as a lesson in data management and making disparate datasets the same. One that is a lot more moving than your typical story of data-cleaning. The following screenshots are from:  ...

Truncated Y-axis, but with female celebrities.

Why did I find this after my textbook was published? Damn it. I have a whole section about how Y-axis manipulation can make small differences look huge and then...I find this. Damn it. Source: https://www.reddit.com/r/dataisugly/comments/1hjr01o/height_of_female_popstars/

Modal religions by county in the U.S.

I love my more elaborate examples, but this is a short, sweet, and interesting way to refresh your measures of central tendency lecture when you explain mode. I present you with the modal religions in each U.S. county: Found on Reddit: https://www.reddit.com/r/dataisbeautiful/comments/1hejglm/most_common_religion_in_every_us_county_oc/ Source of Original Data: https://www.thearda.com/?gad_source=1&gclid=CjwKCAiA9vS6BhA9EiwAJpnXw7IpjxFvuiS3UvLycZrZ2ggtEzS2JDR-ow0mksK-9rD06G8Lgq6mlhoC1nwQAvD_BwE
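If you want to show students what "modal category" means computationally, here is a small Python sketch. The data are entirely made up for illustration (generic labels, NOT real ARDA numbers), and the variable names are my own. It is a handy reminder that the mode is the one measure of central tendency that works for nominal data like religious affiliation.

```python
from collections import Counter

# Hypothetical, made-up affiliations for residents of one imaginary county.
# These are placeholder labels, not real data from any source.
county_sample = [
    "Denomination A", "Denomination B", "Denomination A",
    "Denomination C", "Denomination A", "Denomination B",
    "Unaffiliated",
]

# Count how often each category appears, then take the most frequent one.
counts = Counter(county_sample)
modal_category, modal_count = counts.most_common(1)[0]

print(f"Mode: {modal_category} ({modal_count} of {len(county_sample)})")
```

Repeat that for every county and color each one by its modal category, and you have the logic behind the map.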