Monday, May 29, 2017

Daniel's "Where Slang Comes From"

I think that language is fascinating. Back when I taught developmental psychology, I always liked teaching how babies learn to talk in much the same way all across the world. I like regional differences in American English (for example, swearing and regional colloquialisms). So I really like this research, which investigates the rise and fall of slang in America. And I think it could be used in a statistics class.

How to use in class?

1. Funny list of descriptive statistics.

2. Research methodology for using Google searches to answer a question. A good opening for discussion of archival data, data mining, and creating inclusion criteria for research methodology.

3. Using graphs to illustrate trends across time. This feature is interactive.

4. Further interactive features showing how heat maps can be used to display state-by-state popularity over time. Here, "dank memes" peaked in April 2016 in Montana.

5. The author eyeballed the data and came up with common origins of slang: hip-hop music, politics, and "the internets" (technology). This reminds me, conceptually, of cluster analysis. Note: NO CLUSTER ANALYSIS was conducted to come up with the three slang origin categories.

Monday, May 22, 2017

Trendacosta's Mathematician Boldly Claims That Redshirts Don't Actually Die the Most on Star Trek

io9 recaps a talk given by mathematician James Grime. He addressed the long-running Star Trek joke that the first people to die are the red shirts. Using resources that detail the ins and outs of Star Trek, he determined that:

This makes for a good example of absolute vs. relative risk. Sure, more red shirts may die, absolutely, but proportionally? They only make up 10% of the deaths. Also, I think this is a funny example of using archival data in order to understand an actual on-going Star Trek joke.
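To make the absolute-versus-relative distinction concrete for students, here is a minimal sketch in Python. The crew sizes and death counts are made up for illustration and are not the figures from the talk. It only shows how the group with the most deaths need not be the group with the highest death rate.

```python
# Hypothetical crew sizes and death counts by shirt color.
# These numbers are invented for illustration; they are NOT from the talk.
crew = {"red": 240, "gold": 60, "blue": 130}
deaths = {"red": 24, "gold": 9, "blue": 7}


def death_rates(deaths, crew):
    """Return the proportion of each group that died (risk within the group)."""
    return {color: deaths[color] / crew[color] for color in crew}


rates = death_rates(deaths, crew)

# Absolute comparison: which group has the largest raw death count?
most_deaths = max(deaths, key=deaths.get)

# Relative comparison: which group has the largest proportion of deaths?
highest_rate = max(rates, key=rates.get)

for color in crew:
    print(f"{color}: {deaths[color]} deaths, rate = {rates[color]:.3f}")
print("Most deaths:", most_deaths)
print("Highest death rate:", highest_rate)
```

With these invented numbers, the largest group has the most deaths but not the highest rate, which is the pattern worth discussing in class.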

For more math/Star Trek links, go to's treatment of the speech.

Monday, May 15, 2017

Pew Research Center's Methods 101 Video Series

Pew Research Center is an excellent source for data to use in statistics and research methods classes. I have blogged about them before (look under the Label pew-pew!) and I'm excited to share that Pew is starting up a series of videos dedicated to research methods. The new series will be called Methods 101.

The first video describes sampling techniques in which weighting is used to adjust imperfect samples so that they better mimic the underlying population. I like that this is a short video that focuses on one specific aspect of polling. I hope that they continue this trend of creating very specific videos covering specific topics.
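If you want students to see the arithmetic behind weighting, here is a rough sketch in Python of simple post-stratification, which is one basic form of weighting and not necessarily the procedure Pew uses. The age groups, population shares, and survey responses are all hypothetical.

```python
from collections import Counter

# Hypothetical population shares by age group (invented for illustration).
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical lopsided sample: (age group, answered "yes" = 1 / "no" = 0).
# Older respondents are overrepresented relative to the population shares.
sample = [
    ("18-34", 1), ("18-34", 1),
    ("35-64", 1), ("35-64", 0), ("35-64", 0),
    ("65+", 0), ("65+", 0), ("65+", 0), ("65+", 0), ("65+", 1),
]


def poststratified_mean(sample, population_share):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(sample)
    counts = Counter(group for group, _ in sample)
    weights = {g: population_share[g] / (counts[g] / n) for g in counts}
    weighted_sum = sum(weights[g] * y for g, y in sample)
    total_weight = sum(weights[g] for g, _ in sample)
    return weighted_sum / total_weight


unweighted = sum(y for _, y in sample) / len(sample)
weighted = poststratified_mean(sample, population_share)

print(f"Unweighted 'yes' share: {unweighted:.3f}")
print(f"Weighted 'yes' share:   {weighted:.3f}")
```

The gap between the two printed values is a handy conversation starter about why an unadjusted sample can mislead.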

Looking for more videos? Check out Pew's YouTube Channel. Also, I have a video tag for this blog.


They have since posted their second video, this one on proper wording for research questions so as to avoid jargon and bias.

Monday, May 8, 2017

Daniels' "Most timeless songs of all time"

This article, written by Matt Daniels for The Pudding, allows you to play around with a whole bunch of Spotify user data in order to generate visualizations of song popularity over time. You can generate custom visualizations using the very interactive sections on this website. For instance, there is a special visualization that allows you to finally quantify the Biggie/Tupac Rivalry.

So, data and pop culture are my two favorite things. I could play with these different interactive pieces all day long. But there are also some specific ways you could use this in class.

1) Generate unique descriptive data for different musicians and then ask your students to create visualizations using the software of your choosing. Below, I've queried Dixie Chicks play data. Students could enter their own favorite artist. Note: The data only runs through 2005.

2) Sampling errors: Here is a description of the methodology used for this data:

Is this representative of all data? What does he mean by "normalize the data" as a way to correct the data? Where could we collect data so as to have a more representative sample? Would Sirius skew older? What about iTunes?

3) Using data mining/archival data to generate insights into research questions.

Here, the question explored in this article is, "What is the difference between a flash in the pan song versus a song for the ages?"

Here, data from 2013 hits has been tracked. The article finds that the post-hit plateau is a good indicator of music that will have longer staying power. Even though Daft Punk's Get Lucky peaked much higher than OneRepublic's Counting Stars, Counting Stars has a higher plateau. Also, note that with this interactive piece, students could select any number of songs to compare.
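For a quick in-class exercise on peak versus plateau, here is a small Python sketch. The weekly play counts are invented and are not Spotify data, and defining the "plateau" as the mean of the last four weeks is my own simplifying assumption, not the article's method.

```python
def peak_and_plateau(weekly_plays, tail_weeks=4):
    """Return (peak, plateau), where plateau is the mean of the last `tail_weeks` values.

    The tail-mean definition of "plateau" is an assumption made for this sketch.
    """
    peak = max(weekly_plays)
    tail = weekly_plays[-tail_weeks:]
    plateau = sum(tail) / len(tail)
    return peak, plateau


# Hypothetical weekly play counts (arbitrary units) for two made-up songs.
song_a = [20, 80, 100, 60, 30, 12, 10, 10, 9, 11]   # big spike, low tail
song_b = [10, 40, 70, 65, 50, 40, 36, 34, 35, 35]   # smaller spike, higher tail

peak_a, plateau_a = peak_and_plateau(song_a)
peak_b, plateau_b = peak_and_plateau(song_b)

print(f"Song A: peak = {peak_a}, plateau = {plateau_a:.1f}")
print(f"Song B: peak = {peak_b}, plateau = {plateau_b:.1f}")
```

Students could swap in numbers they read off the interactive charts and decide for themselves how many weeks should count as the plateau.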

Monday, May 1, 2017

"Student life summarized using graphs" video

I found this video at the Student Problems Page on Facebook. I don't know who to attribute it to, but it was probably a smart, sarcastic Intro Stats student.