
Using GenAI to generate teachable data sets (here, an independent t test)

Two things I love to use when teaching stats are:

1) Journal of the American Medical Association (JAMA) visual abstracts. I've blogged about them before.

2) Useful tools that generate pretend data sets mimicking real data, which I then use to teach. See: Richard Landers' and Andrew Luttrell's websites.

So, I was delighted when I saw this recently posted visual abstract about Ewing-Cobbs et al.'s (2026) research on using a specific CBT program to reduce stress in children following a traumatic physical injury:


https://jamanetwork.com/journals/jamapediatrics/fullarticle/2848163


I have a new example of an independent t test for class. Yay! And I teach tons of future nurses/PAs, so it is doubly applicable.

However, the authors stated that the data wasn't immediately available. Also, once it is available, they (very reasonably) want to track their data sharing, meaning that even if I could get their data, I shouldn't be sharing it on this blog.

I decided to create a dataset that mimics the findings. While I have other tools to do so (see above), I decided to try using GenAI this time. I figured that if I had more descriptive data, I could make a better fake dataset. I found the CIs (in yellow) for the data points in the tables:


Armed with means, sample sizes, and CIs, I wrote this prompt:

"Hi! I would like you to create a data set for me. It should contain two variables, "ReSeT" and "Usual Care". The mean for ReSet should be 10.5, with a 95% CI of 8.1-12.9, and n = 44. The mean for Usual Care should be 14.7 with a 95% CI between 12.2 and 17.2, and n = 42. For the data you generate, every data point should be a whole number with no decimals."

After some finagling (Copilot couldn't perfectly generate the data within the parameters I stated, but we got really close), I generated the data sets. (Here is the .txt version, and here is the same data in JASP format for my fellow JASP users/instructors.)
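If you would rather script this than prompt for it, here is a minimal Python sketch of the same idea. It is not what Copilot did under the hood, just one way to get there. It assumes the CIs are ordinary CIs for a mean, backs out an approximate SD using the normal approximation (z of about 1.96 rather than the exact t value), and assumes scores can't go below zero. The function names and the seed are my own inventions for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def sd_from_ci(lower, upper, n, level=0.95):
    """Back out an approximate SD from a CI for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    half_width = (upper - lower) / 2
    return half_width * math.sqrt(n) / z

def simulate_group(target_mean, lower, upper, n, rng):
    """Draw n whole-number scores, then nudge them so the mean lands on target."""
    sd = sd_from_ci(lower, upper, n)
    # Assumption: scores are floored at 0 (the real scale's range isn't stated here).
    scores = [max(0, round(rng.gauss(target_mean, sd))) for _ in range(n)]
    # Add or subtract 1 from scores, one at a time, until the total matches.
    diff = round(target_mean * n) - sum(scores)
    step = 1 if diff > 0 else -1
    i = 0
    while diff != 0:
        idx = i % n
        if scores[idx] + step >= 0:
            scores[idx] += step
            diff -= step
        i += 1
    return scores

rng = random.Random(2026)  # arbitrary seed so the fake data is reproducible
reset = simulate_group(10.5, 8.1, 12.9, 44, rng)
usual_care = simulate_group(14.7, 12.2, 17.2, 42, rng)
print("ReSeT mean:", round(mean(reset), 1))
print("Usual Care mean:", round(mean(usual_care), 1))
```

The means will match the targets to one decimal place, but the flooring and nudging shift the spread a little, so the CIs of the simulated data will be close to the published ones rather than identical. That is the same "really close" I got from Copilot.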

How I will use it in class: Often, stats instructors structure their class so that students solve a mystery by analyzing data. That's great. But it is also nice to work backward: Show your students the visual abstract. Have them identify the IV, the DV, and the findings. THEN have them analyze data that, while not the real data, does mimic the main findings. It is sort of like giving them a road map. They know exactly where they are going, but they still need to analyze the data themselves. Needless to say, when I use data sets that I create, I ALWAYS tell my students that they aren't working with the actual data, but with data that mimics the real findings. 
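For instructors who want to preview roughly what students should find, here is a hedged sketch of an independent t test computed from the summary statistics alone. It assumes equal variances (the classic pooled Student's t) and again backs the SDs out of the CIs with a normal approximation, so treat the result as a ballpark, not as the authors' reported analysis. The helper names are made up for this example.

```python
import math
from statistics import NormalDist

def sd_from_ci(lower, upper, n, level=0.95):
    """Approximate SD from a CI for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / 2 * math.sqrt(n) / z

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's independent t statistic and df from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Means, CIs, and sample sizes as given in the prompt above.
s_reset = sd_from_ci(8.1, 12.9, 44)
s_usual = sd_from_ci(12.2, 17.2, 42)
t_stat, df = pooled_t(10.5, s_reset, 44, 14.7, s_usual, 42)
print(f"t({df}) = {t_stat:.2f}")
```

A negative t here just reflects the order of subtraction: the ReSeT mean is lower than the Usual Care mean.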




