Showing posts with the label data set

Leo DiCaprio Romantic Age Gap Data: UPDATE

Does anyone else teach correlation and regression together at the end of the semester? Here is a treat for you: Updated data on Leonardo DiCaprio, his age, and his romantic partner's age when they started dating. A few years ago, there was a dust-up when a clever Redditor r/TrustLittleBrother realized that DiCaprio had never dated anyone over 25. I blogged about this when it happened. But the old data was from 2022. Inspired by this sleuthing, I created a wee data set, including up-to-date information on his current relationship with Vittoria Ceretti, so your students can suss out the patterns in this data.

Caffeine, calories, correlation

We need more nonsignificant but readily understood examples in our classes. This correlation/regression example from Information is Beautiful demonstrates that the caffeine in delicious caffeinated drinks does not correlate with the calories in the drink. Caffeine has zero calories. The things that make our drinks creamy and sweet may have calories. Easy peasy, readily understood, and this example gives your students a chance to think about and interpret non-significant, itty-bitty effect size findings. Click here for the data. Aside: Watch your language when using this example. We need calories to stay alive, and none of these drinks, in and of themselves, are good or bad. Our students are exposed to way too much of that sort of language and thinking about food and their bodies. What they choose to drink or eat is none of our business. When I share this visual, I omit the information on the far right (exercise) and far left (calorically equivalent foods). It distracts from the...

If you like this blog, you will love my new podcast...

My friends. I started a podcast. I've created the Not Awful Data podcast with the help of Garth Neufeld and Eric Landrum at the Psych Sessions podcast empire. Why? I try to keep my blog posts brief and to the point, but I also love to discuss exactly how I use my favorite data sets in the classroom. This podcast will let me discuss and highlight some of the data sets I've shared on my blog and provide more information on exactly how you could use them in class. Anyway. Every episode is five minutes long. I plan on posting a new episode once a week. My hope is that this will re-introduce you to some of my older resources and provide you with some out-of-the-box resources you can use in your own teaching. Here is a link to my first episode, which recaps the horror movie/heartbeat data I shared on the blog recently. The podcast is also available on Spotify.

Deer-related insurance claims from State Farm

We should teach with data sets representing ALL of our students. Why? You never know what example will stick in a student's head. One way to get information to stick is by employing the self-reference effect. For example, students who grew up in the country might relate to examples that evoke rural life. Like getting the first day of buck season off from school and learning how to watch out for deer on the tree line when you are going 55 MPH on a rural highway. Enter State Farm's data on the likelihood, per state, of a car accident claim due to collision with an animal (not specifically deer, but implicitly deer). Indeed, my home state of Pennsylvania is the #3 most likely place to hit a deer with your car. State Farm shares its data per state: https://www.statefarm.com/simple-insights/auto-and-vehicles/how-likely-are-you-to-have-an-animal-collision I am also happy to share my version of the data, in which I turned all probability fractions (1 out of 522) into probabili...

Using pulse rates to determine the scariest of scary movies

The Science of Scare project, conducted by MoneySuperMarket.com, recorded heart rates in participants watching fifty horror movies to determine the scariest of scary movies. Below is a screenshot of the original variables and data for 12 of the 50 movies provided by MoneySuperMarket.com: https://www.moneysupermarket.com/broadband/features/science-of-scare/ Here is my version of the data in Excel format. It includes the original data plus four additional columns (so you can run more analyses on the data): - Year of release - Rotten Tomatoes rating - Does this movie have a sequel (yes or no)? - Is this movie a sequel (yes or no)? Here are some ways you could use this in class: 1. Correlation: Rotten Tomatoes rating does not correlate with the overall scare score (r = 0.13, p = 0.36). 2. Within-subject research design: Baseline, average, and maximum heart rates are reported for each film. 3. ...
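If your students wonder why r = 0.13 is not significant, a quick sketch like the one below can help. It converts a Pearson r into a t statistic with n - 2 degrees of freedom. It assumes all fifty movies went into the correlation (n = 50 is an assumption, not something the excerpt states), and the function name `t_from_r` is made up for illustration.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson r into a t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.13  # correlation reported in the post
n = 50    # ASSUMPTION: one data point per movie, all fifty movies used

t = t_from_r(r, n)
df = n - 2
print(f"t({df}) = {t:.2f}")
```

With these inputs the t statistic lands a little under 1, well short of the roughly 2.01 needed for a two-tailed test at alpha = .05 with 48 degrees of freedom, which fits the non-significant p-value in the post.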

Generate highly personalized music data using Exportify

Spotify generates gobs of data about music. Most people have seen the end-of-the-year data Spotify generates for each user about their listening patterns. Most people don't know that Spotify also generates a lot of data about individual songs. Some of it is straightforward: tempo, genre, length. However, Spotify also has its own niche way of quantifying songs: Danceability. Acousticness. Here is a whole list of their variables and descriptions from researchers at CMU: https://www.stat.cmu.edu/capstoneresearch/315files_s23/team23.html What does this mean for a stats teacher? You have access to highly personalizable data sets, rooted in music, with gobs and gobs of variables for each song...or artist...or album...or year of release...or genre (like, so many ways to divide up your data). For instance, I created a data set with Spotify data for 1989 and 1989 (Taylor's Version) to teach paired t-tests. How do Taylor's re-recordings compare to the originals?...

Paired T-tests (Taylor's Version)

Ok, more Taylor Swift data for you. DID YOU KNOW that Spotify collects buckets and buckets of data about each and every song it provides (see: https://www.spotify-song-stats.com/about)? So, I downloaded this information for 1989 and 1989 (Taylor's Version). So I could test for any differences between the recordings. Like, with data, not with my feelings and emotions. Specifically with a paired t-test. I get it. The sample sizes are very small. However, the data is still interesting. It makes sense that the tempo hasn't changed. Like, she didn't slow down or speed up anything. And that is super NS with an itty-bitty effect size. It is also interesting that acousticness has decreased. These are more heavily produced versions of the same songs (IMO), and while this change didn't achieve significance, it is a moderate effect size. ANYWAY, you aren't really here for this information. You are here for data to share with your classes, yes? I'm here to help you teach your s...
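For anyone who wants to show the arithmetic behind a paired t-test before handing students the real spreadsheet, here is a minimal sketch. The tempo values are made up for illustration (they are NOT the Spotify numbers), and the helper name `paired_t` is hypothetical. It returns the t statistic, the degrees of freedom, and Cohen's d computed on the difference scores.

```python
import math
import statistics

# MADE-UP tempo values (beats per minute) for five songs, not real Spotify data.
original = [120.0, 96.0, 160.0, 117.0, 140.0]
rerecorded = [120.0, 95.0, 160.0, 119.0, 140.0]

def paired_t(before, after):
    """Paired-samples t-test on difference scores (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)       # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff           # effect size for paired data
    return t, n - 1, cohens_d

t, df, d = paired_t(original, rerecorded)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because each song is compared to its own re-recording, everything is computed from the per-song differences. That is the point worth making to students: a tiny mean difference relative to the spread of the differences gives both a small t and a small d.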

The Taylor Swift Effect: Does Tay-tay's presence influence Travis Kelce's performance?

In what is a common occurrence for this blog, it all started with a Tweet. A very punny Tweet: https://twitter.com/ESPNFantasy/status/1716216331752624509 It raises the question: How are various indicators of Kelce's performance influenced by the presence or absence of one Taylor Swift? While she is steadily attending games this fall, we'll have to wait and see if her international tour, starting 11/7, changes that. Regardless, I'll update THIS SPREADSHEET over the season so you can run all of the independent t-tests you want with your students. AND SOMEDAY I WILL UPDATE THIS SPREADSHEET TO INCLUDE WHETHER OR NOT THEIR CHILDREN ATTEND I SWEAR IT IS COMING.
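If you want to walk through an independent-samples t-test by hand before opening the spreadsheet, a sketch along these lines could work. The receiving-yard numbers are invented for illustration (they are NOT Kelce's actual stats), the function name `independent_t` is hypothetical, and the sketch uses the pooled-variance (Student's) version of the test, which assumes roughly equal variances in the two groups.

```python
import math
import statistics

# INVENTED receiving yards per game, not real data.
swift_present = [100.0, 120.0, 90.0, 110.0]
swift_absent = [70.0, 80.0, 60.0, 90.0, 75.0]

def independent_t(group1, group2):
    """Pooled-variance (Student's) independent-samples t-test."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)   # sample variances
    var2 = statistics.variance(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, df

t, df = independent_t(swift_present, swift_absent)
print(f"t({df}) = {t:.2f}")
```

Unlike the paired test, the two groups here are different games, so the group sizes do not have to match and the variances are pooled across groups.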

SMARVUS database of stats students and many of their feelings and cognitions about stats

You all. Many people, but mostly Jenny Terry and Andy Field, but also a number of my Twitter mutuals, collected a crap ton of data from statistics students worldwide. See: Here is the article describing the project. The data is embargoed until October 2024, but you can contact the corresponding authors if you would like early access. Also, they have tons and tons of documentation available at OSF. So you can come up with your own hypotheses and test them. Which is very, very generous.

America's worst drivers, according to Consumer Affairs.

Consumer Affairs released a list of America's best and worst drivers. It is a short article but contains many good stats nuggets. 1. Ratio and ordinal versions of the same data. 2. Where did the ratio data come from? Take a look at the Methodology. 3. Here is the data for the twenty most terrible drivers. It includes the nominal/ratio data I shared above and the top four bullet points from the image above. 4. Where did they find their data? Lucky for us, they cite their data. Which is good form, right? But also, it is an example of how much hecking data is out there.

CDC Mental Health Data

It shouldn't come as a shock that the CDC shares data on rates of public health issues in the US. However, you may be unaware of the available data and interactive visualizations provided by the CDC and the different ways you can use them in class. 1. Teach your students a lesson about good sources for mental health data. 2. Show your students how data visualizations can help present and simplify complex data. https://www.cdc.gov/nchs/covid19/pulse/mental-health.htm 3. Get into the research methods. Everyone has heard of the census, but fewer have heard of the Household Pulse Survey (https://www.census.gov/data/experimental-data-products/household-pulse-survey.html). The US Census Bureau collects a lot of information between the 10-year censuses, including mental health data. 4. Talk about how the government assesses depression and anxiety. For example, you can show how the basic methodology uses a valid, relia...

MCU regression, revisited

I think it is important to emphasize how regression can be used to make future predictions using trends in existing data. Most psychology books use psychology examples to illustrate this, which makes sense. Still, I think explaining how regression is widely used in business to make financial decisions and predictions is important. But that can be boring. But I found one example that uses the Marvel Cinematic Universe to do this. I already blogged about this, but I'm sharing exactly how I used this in class presently. ASIDE: This data is being regularly updated! Here is a Google Drive folder with 1) my version of the data (CSV, and I turned all the percentages into decimals for JASP) and 2) my PPT, which includes photos of the scientists of the MCU. ALSO: While your students are doing their exercise, totes play the soundtrack from Guardians of the Galaxy. Do it.
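To make the "regression predicts the future from existing trends" idea concrete, here is a small sketch of a least-squares line and a prediction from it. The budget and box office figures are invented round numbers (NOT the MCU data in the Google Drive folder), and the names `fit_line` and `predict` are hypothetical.

```python
import statistics

# INVENTED values in millions of dollars, not real MCU data.
budget = [100.0, 150.0, 200.0, 250.0]
box_office = [300.0, 420.0, 580.0, 700.0]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, slope, intercept):
    """Plug a new x value into the fitted regression line."""
    return intercept + slope * x_new

slope, intercept = fit_line(budget, box_office)
forecast = predict(300.0, slope, intercept)
print(f"predicted box office = {intercept:.1f} + {slope:.2f} * budget")
print(f"forecast for a 300 budget: {forecast:.1f}")
```

Note that a budget of 300 is outside the range of the invented data, so this is an extrapolation. That is a nice prompt for asking students how far past the observed data a prediction can reasonably be trusted.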

1,200 years' worth of cherry blossom bloom data from Kyoto, Japan.

It is April 18 in Erie, PA. It sleeted yesterday at my kid's soccer game. However, I know in my heart that Spring is coming. Every year, I get excited about the first crocuses and daffodils here in NW PA. Due to these hard winters followed by beautiful (if snowy) springs, I feel a certain kinship with the Japanese spring lovers who have been tracking the date of the cherry blossom blooms in Kyoto, Japan, for the last 1,200 years. Well, it hasn't always been tracked by humans; sometimes, modern humans have extrapolated this data. I'll get to that in a second. I learned about this data from Twitter user Robin Rohwer. She created this visualization for the data: https://twitter.com/RobinRohwer/status/1639097356657512449 She also shared where she found this data via NOAA, via Yasuyuki Aono's website: http://atmenv.envi.osakafu-u.ac.jp/aono/kyophenotemp4/ . Go to the NOAA website and poke around. You can see notations referring to how the data was extrapolated over time an...

Can we use Instagram to estimate happiness at universities?

OK. Lotte van Rijswijk, writing for Resume.io, used Instagram photos to determine the happiest colleges in the United States, United Kingdom, and Australia. Here is the Top 20 list for the US. If you go to the website, you can see similar summaries for the UK and Australian data and an interactive table containing all of the data. Here are some ideas for using it in class: 1. This methodology is pretty interesting. She used smile recognition software and pictures from Instagram to measure happiness. I think this study would pair well with this study about using software to evaluate smiles: https://journals.sagepub.com/doi/10.1177/0956797617734315 https://resume.io/blog/the-happiest-schools-in-the-us-uk-and-australia 2. Ask your students to consider the sampling error that may result from using Instagram data for any research. For example, are photos on Insta representative of human experiences? Is it reasonable to gather a sample of college-aged students using Insta? 3. The ...

Multiverse = multiple correlation and regression examples!

I love InformationIsBeautiful. They created my favorite data visualization of all time. They also created an interactive scatterplot with all sorts of information about Marvel Cinematic Universe films. How to use in class: 1. Experiment with the outcome variables you can add to the X and Y axes: Critical response, budget, box office receipts, year of release, etc. There are more than that; you can add them to either the X or Y axes. So, it is one website, but there are many ways to assess the various films. 2. Because of interactive axes, there are various correlation and regression examples. And these visualizations aren't just available as a quick visual example of linear relationships...see item 3... 3. You can ask your students to conduct the actual data analyses you can visualize because the hecking data is available. 4. The website offers exciting analyses, encouraging your students to think critically about what the data tells them. 5. You could also squeeze Simp...

Our World in Data's deep dive into human height. Examples abound.

Stats nerds: I'm warning you right now. This website is a rabbit hole for us, what with the interactive, customizable data visualizations. Please don't click on the links below if you need to grade or be with your kids or drive. At a recent conference presentation, I was asked where non-Americans can find examples like the ones I share on my blog. I had a few ideas (data analytic firms located in other countries, data collected by the government), but wanted more from my answer. BUT...I recently discovered this interactive from Our World in Data. It visualizes international data on human height, y'all, with so many different examples throughout. I know height data isn't the sexiest data, but your students can follow these examples, they can be used in a variety of different lessons, and you can download all of the data from the beautiful interactive charts. 1. Regressions can't predict forever. Trends plateau. I'm using this graph as an example of how a r...

YEET!, or why you should always check your scatter plot

I sneak attack my students with this correlation example. I ask them to analyze this data as a correlation and create a report describing their data. This is what the data looks like. I'll be honest, I mostly do this for my own amusement. HOWEVER: It does demonstrate that scatter plots are helpful for checking whether the data behind a correlation analysis contains a non-linear relationship (see: Datasaurus). If you want to make your own silly scatter plot for data analysis, I recommend Robert Grant's DrawMyData website for doing so.
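Here is a tiny sketch of why the scatter plot matters. The data below are a made-up, perfectly U-shaped relationship (NOT the data from this post), and the helper name `pearson_r` is hypothetical. The point is that y is completely determined by x, yet the Pearson correlation comes out as zero because the relationship is not linear.

```python
import math
import statistics

# MADE-UP data: y is exactly x squared, a perfect but non-linear relationship.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

def pearson_r(a, b):
    """Pearson correlation coefficient computed from deviation scores."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A student who only reports r would conclude there is no relationship here. A student who plots the data would see the curve immediately.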

Organizations sharing data in a way that is very accessible

A few weeks ago, I posted about how you can share data in such a terrible way that you are not breaking the law, but the data is completely unusable. This makes me think of all the times I am irked when someone states a problem but doesn't offer a solution. Instead, they just talk about what is wrong and not how it could be better. So, as a counter piece, let's cheer on organizations that ARE sharing data in a way that is readily accessible. You could use this in class as a palate cleanser if you teach your students about data obfuscation. You could also use it as a way of helping your students understand how data really is everywhere. Or even challenge them to brainstorm an app that uses readily accessible data in a new way to help folks. ProPublica: This website lets you check how often salmonella is found at different chicken processing plants. All you need to do is enter the p-number, company, or location listed on your package of chicken: https://projects.propubli...

Google Dataset search engine

HEY. Here is a whole bunch of data, searchable via Google. https://datasetsearch.research.google.com/ h/t: Samy!

Women's pockets are crap: An empirical investigation

The Pudding took a data-driven approach to test a popular hypothesis: Women's pockets are smaller than men's pockets. Authors Diehm and Thomas sent research assistants to measure the pockets on men's and women's jeans. They even shared supplemental materials, like the exact form the RAs completed. https://pudding.cool/2018/08/pockets/assets/images/MeasurementGuide.pdf And they used fancy coding to figure out the exact dimensions of the jeans. Indeed, even when women are allowed pockets (I'm looking at you, dressmakers!), the pockets are still smaller than they are in men's jeans. They came to the following conclusion: Amen. Anyway, there are a few ways you can use this in the classroom: 1) Look at how they had a hypothesis, and they tested that hypothesis. Reasonably, they used multiple versions of the same kind of pants. If you check out their data, you can see all of the data points they collected about each type of jeans. They even provide supplemental mat...