
Modern musician vocabularies: See how I extracted this data using GenAI, and how you can use it in class.

I intended for this to be a post about the singer vocabulary.

It is still that, but it is also a post about using GenAI to grab data from an image. I mean, you can use Excel to do the same thing, but GenAI is a lot easier.

Here we go. It starts with the Word Tips website, which helps you solve your crossword puzzles and Wordle. This website also has a blog dedicated to words. One such blog post explored which singers have the largest vocabularies, as measured by the number of unique words in their lyrics.

Their blog post compared music legends to newer talent. There are a ton of fun data visualizations on the website; go check it out.

Since I teach college students, I decided to concentrate on the musicians my students listen to:



In and of itself, this image serves as an example of bar graphs, good data visualization, and proper use of "buckets".

However, I figured we could find a way to use the raw data in class. Create your own data visualization, create your own buckets... you could even add your own data (using ChatGPT, see below) for variables like number of albums, years in the industry, gender, genre, etc.
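If you want students to try "create your own buckets" in code rather than in a spreadsheet, here is a minimal sketch. The artist names and word counts are made-up placeholders, not values from the Word Tips chart, and the bucket width of 1,000 is just an assumption you can change.

```python
from collections import Counter

# Hypothetical placeholder rows (NOT values from the Word Tips chart).
# Swap in the real artist/unique-word pairs from the CSV.
rows = [
    ("Artist A", 1850),
    ("Artist B", 2400),
    ("Artist C", 3125),
    ("Artist D", 2999),
    ("Artist E", 4010),
]

def bucket_label(unique_words, width=1000):
    """Return a label such as '2000-2999' for the bucket holding this count."""
    low = (unique_words // width) * width
    return f"{low}-{low + width - 1}"

# Count how many artists land in each bucket.
bucket_counts = Counter(bucket_label(count) for _, count in rows)

# Print a quick text bar graph, smallest bucket first.
for label in sorted(bucket_counts, key=lambda s: int(s.split("-")[0])):
    print(label, "#" * bucket_counts[label])
```

Changing `width` (say, to 500) is a quick way to show students how the choice of buckets changes the shape of the graph.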

But I certainly wasn't going to do that manually. Instead, I used ChatGPT. Click here to see the prompts I used and get a copy of the data in CSV format. There are two spreadsheets available: One has each artist's entire name in one column. Which...is just bad data practice, right? So I also created a second spreadsheet that has two columns for their first and last names. Artists with one name are populated in the First Name column. This leads to a little funkiness for some artists who have two-word names where the last word isn't a last name (Charli XCX, Jessie J, etc.), but I didn't feel like creating prompts to deal with these situations (feel free to do so and share with me!).
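If you would rather handle the name split in code than in a prompt, here is a minimal Python sketch. It is not how the spreadsheets above were made (those came from ChatGPT). The `STAGE_NAMES` set and the `split_name` helper are hypothetical, and keeping a stage name whole in the First Name column is just one possible choice.

```python
# Hypothetical list of two-word stage names whose last word is not a surname.
STAGE_NAMES = {"Charli XCX", "Jessie J"}

def split_name(full_name):
    """Return (first_name, last_name) for one artist.

    Assumptions: one-word names and listed stage names go entirely in the
    first-name slot; otherwise the final word is treated as the last name.
    """
    name = " ".join(full_name.split())  # trim and collapse extra spaces
    parts = name.split(" ")
    if len(parts) == 1 or name in STAGE_NAMES:
        return name, ""
    return " ".join(parts[:-1]), parts[-1]

# Illustrative names only; check them against the actual CSV.
for artist in ["Adele", "Ed Sheeran", "Jessie J"]:
    print(artist, "->", split_name(artist))
```

Any artist not in `STAGE_NAMES` still gets split on the last space, so the set needs to be extended by hand as students spot more exceptions.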
