Posts

Showing posts from 2022

Our World in Data's deep dive into human height. Examples abound.

Stats nerds: I'm warning you right now. This website is a rabbit hole for us, what with the interactive, customizable data visualizations. Please don't click on the links below if you need to grade or be with your kids or drive. At a recent conference presentation, I was asked where non-Americans can find examples like the ones I share on my blog. I had a few ideas (data analytics firms located in other countries, data collected by the government), but wanted more from my answer. BUT...I recently discovered this interactive from Our World in Data. It visualizes international data on human height, y'all, with so many different examples throughout. I know height data isn't the sexiest data, but your students can follow these examples, they can be used in a variety of different lessons, and you can download all of the data from the beautiful interactive charts. 1. Regressions can't predict forever. Trends plateau. I'm using this graph as an example of how a r...

Caffeine and Calories: An example of a non-linear relationship

Not all of our class examples should reject the null. Sometimes, you just need some non-significant, small effect size data that doesn't detect a linear relationship. Such is the relationship between the number of calories and mg of caffeine in these 29 different treats provided by InformationIsBeautiful. InformationIsBeautiful provides that data, as do I.

The Pudding's Words Against Strangers: A way to break up your z-score lecture.

Ok. Only some examples have to be profound. Sometimes, an example can break up a dry lesson like z-scores. This is my favorite z-score example. Ever. This current post may become my second favorite. The Pudding's Words Against Strangers is a game with four one-minute rounds. Each round asks for a type of word. Adjectives containing the letter "m." Verbs that contain an "r" and are precisely five letters long. That sort of prompt. Then you have one minute to type in as many of these words as possible. I recommend playing this on a computer, not a phone, if you are over 40. You are competing against one person on the internet. After you play, your record is displayed as either: a) your over/under against the opponent or b) your percentile score relative to everyone on the internet. Here is how I will use it in class. My students get into other games I've worked into my classes (Guess the Correlation). I plan on asking my students to play this game, view their...
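If you want to tie the game's percentile score back to z-scores, here is a minimal Python sketch. It assumes scores are roughly normally distributed, which the game does not promise, and the function names and the 84th-percentile example are hypothetical, not anything from The Pudding.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist()

def percentile_to_z(percentile: float) -> float:
    """Convert a percentile (strictly between 0 and 100) to a z-score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return STANDARD_NORMAL.inv_cdf(percentile / 100)

def z_to_percentile(z: float) -> float:
    """Convert a z-score back to a percentile."""
    return STANDARD_NORMAL.cdf(z) * 100

# Hypothetical student result: the game reports the 84th percentile.
student_z = percentile_to_z(84)
print(f"z = {student_z:.2f}")
```

Students can run their own percentile through this, then check the answer against a z table.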

YEET!, or why you should always check your scatter plot

I sneak attack my students with this correlation example. I ask them to analyze this data as a correlation and create a report describing their data. This is what the data looks like: I'll be honest, I mostly do this for my own amusement. HOWEVER: It does demonstrate why you should check a scatter plot before running a correlation analysis: the data may contain a non-linear relationship (see: Datasaurus). If you want to make your own silly scatter plot for data analysis, I recommend Robert Grant's DrawMyData website.

chartr's "Speed or Accuracy? It's hard to do both in fast food drive-thrus"

Sometimes, you just need a new, simple example for a homework question or a class warm-up. I eyeballed and entered the data here (r = -.55). Enjoy. I use this little example to explain how to use the regression formula to make a prediction. Here are my slides.
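For anyone who wants the prediction step in code, here is a rough sketch of fitting a least-squares line and plugging in a new x. The wait times and accuracy values are made up for illustration. They are NOT the chartr numbers, and the function names are my own.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-products over sum of squares for x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The line always passes through (x-bar, y-bar).
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x_new):
    """Plug a new x value into the regression formula."""
    return a + b * x_new

# Made-up values: drive-thru time in seconds, order accuracy in percent.
seconds = [250, 300, 350, 400, 450]
accuracy = [90, 88, 89, 85, 83]

a, b = fit_line(seconds, accuracy)
print(f"y-hat = {a:.2f} + ({b:.4f}) * x")
print(f"Predicted accuracy at 375 seconds: {predict(a, b, 375):.1f}%")
```

The Our World in Data height post makes the same point from the other direction: predictions are only trustworthy inside the range of x values you actually observed.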

Between and within group variance, explained with religion, politics, and climate change.

Ages ago, I shared how I teach ANOVA at a conceptual level. I describe within- and between-group variance using beliefs about the human role in climate between and within different religious groups. This data is now old. And it described global warming, not climate change, which is a crucial language distinction. So you can imagine my delight when Pew recently released updated and improved data investigating this issue. In my attempt to keep the mood light when discussing an example featuring 1) religion, 2) climate change, and 3) politics, I ask students to think about how many different opinions are probably represented around their family's Thanksgiving table. Despite having much in common as a family, like, perhaps, geography, shared stories, and religion, there are still a lot of within-group differences of opinion. This leads to a discussion about people of different religions having between- and within-group differences of opinion regarding beliefs about global cl...
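To put numbers on the between versus within idea, here is a small sketch that splits the variability into the two sums of squares a one-way ANOVA uses. The group labels and scores are invented for illustration and have nothing to do with the Pew data.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SS_between, SS_within) for a dict of group name -> scores."""
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    # Between: how far each group mean sits from the grand mean,
    # weighted by group size.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Within: how far each score sits from its own group mean.
    ss_within = sum(
        (s - mean(scores)) ** 2
        for scores in groups.values()
        for s in scores
    )
    return ss_between, ss_within

def f_ratio(groups):
    """Return the one-way ANOVA F ratio: MS_between / MS_within."""
    ss_between, ss_within = sums_of_squares(groups)
    k = len(groups)                            # number of groups
    n = sum(len(g) for g in groups.values())   # total sample size
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented agreement ratings (1-7) for three hypothetical groups.
ratings = {
    "Group A": [2, 3, 3, 4],
    "Group B": [4, 5, 5, 6],
    "Group C": [5, 6, 6, 7],
}
print(sums_of_squares(ratings))
print(f_ratio(ratings))
```

The Thanksgiving table is the within-group part: everyone shares a group, and the scores still spread out around the group mean.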

YouGov America's Thanksgiving-themed chi-square examples

YouGov gifts us seasonal chi-square examples, with data on Thanksgiving food controversies. For example: How do people feel about marshmallows on sweet potato dishes? This doesn't look randomly distributed to me. Which is more beloved: Light or dark turkey meat? If you want examples for the chi-square test of independence, dig into the PDF containing ALL of this survey's data. The distribution of people who like cranberry sauce by age group does not appear random.
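A quick sketch of the goodness-of-fit arithmetic behind "this doesn't look randomly distributed," assuming the null hypothesis is that every response option is equally popular. The counts below are made up. They are not YouGov's numbers.

```python
def chi_square_gof(observed):
    """Chi-square goodness of fit against equal expected frequencies.

    Returns (chi_square_statistic, degrees_of_freedom).
    """
    n = sum(observed)
    k = len(observed)
    expected = n / k  # equal share for every category under the null
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1

# Made-up counts for three hypothetical answers:
# "love it", "hate it", "no opinion".
counts = [120, 60, 20]
stat, df = chi_square_gof(counts)
print(f"chi-square({df}) = {stat:.2f}")
```

Students can then compare the statistic to the critical value for the matching degrees of freedom in their table.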

Organizations sharing data in a way that is very accessible

A few weeks ago, I posted about how you can share data in such a terrible way that one is not breaking the law, but the data is completely unusable. This makes me think of all the times I am irked when someone states a problem but doesn't offer a solution to the problem. Instead, they just talk about what is wrong and not how it could be. So, as a counter piece, let's cheer on organizations that ARE sharing data in a way that is readily accessible. You could use this in class as a palate cleanser if you teach your students about data obfuscation. You could also use it as a way of helping your students understand how data really is everywhere. Or even challenge them to brainstorm an app that uses readily accessible data in a new way to help folks. ProPublica This website lets you check how often salmonella is found at different chicken processing plants. All you need to do is enter the p-number, company, or location listed on your package of chicken: https://projects.propubli...

History of Data Science's Regression Game

There are already some pretty cool games for guessing linear relationships/regression lines. Dr. Hill's Eyeball Regression Game. The old, reliable Guess the Correlation game. However, I found a new one that has a particularly gorgeous interface, and a few extra features to help your learners. History of Data Science created the Regression game. It provides the player with a scatter plot, then the player needs to guess the y-intercept and slope. See that regression line? It is generated and changes as the entered a and b values change, which is a good learning tool. If played at the "easy-peasy" level, the player can even change those numbers multiple times over the course of 30 seconds, and watch as the corresponding line changes. I think this game is a nice way to break up the ol' regression lecture and allows students to see the relationship between the scatter plot and the regression line.
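I don't know how the game itself scores a guess. If you want students to compare guesses afterward, one standard yardstick is the sum of squared errors between each point and the guessed line. Here is a sketch with invented points and invented guesses.

```python
def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances from each point to y-hat = a + b * x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Invented scatter plot points.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two invented guesses for (y-intercept a, slope b).
guesses = {"first guess": (1.0, 1.5), "second guess": (0.0, 2.0)}

for label, (a, b) in guesses.items():
    print(label, round(sum_squared_errors(a, b, xs, ys), 2))
```

The smaller the sum, the closer the guessed line hugs the points, which is the same criterion least-squares regression minimizes.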

Stats nerd gift list

This isn't a post full of teaching resources. Instead, it is a post of gifts and treats for stats nerds. Since stats nerds might also teach stats, this still falls under the purview of this blog. Bonus points because many of these suggestions put money into creators' pockets. Statsy Etsy Shops NausicaaDistribution Etsy shop NausicaaDistribution is a great shop on Etsy. I own multiple products, including the ABC's of Statistics Poster shown below. It is beautiful and framed in my office. The Chemist Tree Etsy shop Another Etsy maker I like is TheChemistTree. I have a set of the coasters, and they've held up well. https://www.etsy.com/shop/TheChemistTree?ref=simple-shop-header-name&listing_id=501955501&search_query=statistics Chelsea Parlett Design Etsy Shop Stats expert Chelsea shares her stats knowledge on Twitter and on Etsy, via her stickers. https://www.etsy.com/shop/ChelseaParlettDesign DataSwagCo is a newer shop with some funny, punny stats goods. https:/...

Dirty Data: Share the data in a way that is functionally inaccessible

In my intro stats class, we discuss shady data practices that aren't lying because they report actual numbers. But they are still shady because good data is presented in such a way as to be misleading or confusing. These topics include: Truncating the y-axis; Collecting measures of central tendency under ideal circumstances; Manipulating online ratings (I didn't write the blog post about this yet, but it is coming); Relative vs. Absolute Risk. AND HERE IS ANOTHER ONE: Insurance companies were asked to provide price data RE: the Transparency in Coverage Rule in the Consolidated Appropriations Act of 2021. Google that if you want to know more about that, I'm not going into that. Not my lane. That said, it is an appealing idea. Let's have some transparency in our jacked-up healthcare system. And the insurance companies provided the data, but in a way inaccessible to most people. Like, all people, maybe? Because they just splurted out 100 TB of data. So, they totally com...

Assessing an intervention: A quick exercise for your classes, specialized to your own university.

Here is a quick RM review I created for my Psych Stats students. We were preparing for the first exam, which covered the very basics of research methodology, including IVs and DVs. We also talk about data visualizations and how they can be used to quickly convey information. California is dealing with an energy crisis and a heatwave. California tried a relatively inexpensive intervention to reduce the likelihood of overwhelming the energy grid: Sending out text messages during extremely high energy usage. See: https://www.bloomberg.com/news/articles/2022-09-07/a-text-alert-may-have-saved-california-from-power-blackouts And what happened? People reduced their electric usage. Source: https://www.bloomberg.com/news/articles/2022-09-07/a-text-alert-may-have-saved-california-from-power-blackouts For the class review, I asked my students to think of the emergency alerts they receive from their university via our campus safety app. I challenged them to think of a c...

Missing data leads to conspiracy theory

This is a funny, small example for anyone who discusses managing missing data in a database. This example also touches on what can go wrong when using someone else's data or merging datasets. So, this piece of information made the rounds in August: This isn't a lie. The voter rolls in Racine had over 20,000 voters with the same phone number. Which led to measured responses from voting rights experts on Twitter. Redhibiscus was so close to the truth! I assure you, if you have ever dealt with complicated databases, especially those that have been merged and go back decades, it isn't unusual to fill in missing data with a specific number repeatedly. Here is a fact check from the A.P.: https://apnews.com/article/fact-checking-612360682016?utm_source=Twitter&utm_campaign=SocialFlow&utm_medium=APFactCheck This isn't a big lesson for a statistics class, but it is a funny and horrifying example of how database management practices fueled a conspiracy theory. It is al...
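If you want students to see how a placeholder value shows up in a dataset, here is a small sketch that flags any value shared by a suspiciously large share of records. The phone numbers, the 5% cutoff, and the function name are all invented for illustration. This is not how anyone actually audited the Racine voter rolls.

```python
from collections import Counter

def flag_placeholder_values(values, min_share=0.05):
    """Return {value: count} for values that appear more than once and
    make up at least `min_share` of the non-missing records."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        value: n
        for value, n in counts.items()
        if n > 1 and n / total >= min_share
    }

# Invented records: most phone numbers are unique, but missing ones
# were filled in with the same dummy number during a merge.
phones = [f"555-01{i:02d}" for i in range(40)] + ["000-0000"] * 10 + [None] * 3

print(flag_placeholder_values(phones))
```

A value that repeats this often is more likely a stand-in for "we don't know" than thousands of people sharing one phone.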