
Sonnad and Collin's "10,000 words ranked according to their Trumpiness"

I finally have an example of Spearman's rank correlation to share.

This is a political example, looking at how Twitter language usage differs in US counties based upon the proportion of votes that Trump received.

This example was created by Jack Grieve, a linguist who uses archival Twitter data to study how we speak. Previously, I blogged about his work analyzing which obscenities are used in different zip codes in the US. He created maps of his findings, color coded by the z-score for the frequency of each word. So, a z-score example.

Southerners really like to say "damn". On Twitter, at least.
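If you want to show students what a z-score for word frequency looks like in practice, here is a minimal sketch in Python. The rates are made-up numbers, not Grieve's data, and I am assuming the population standard deviation; I don't know which version he used for his maps.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population SD (an assumption, see the note above)
    return [(v - m) / s for v in values]

# Hypothetical rates of one word (uses per 10,000 tweets) in eight regions
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

z = z_scores(rates)
print(z)  # the mean is 5 and the SD is 2, so 9.0 becomes a z-score of 2.0
```

A region with a z-score of 2.0 uses the word two standard deviations more often than the average region, which is the kind of value that would get the darkest color on a map like this.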

But on to the Spearman's example. More recently, he conducted a similar analysis, this time looking for trends in word usage based on the proportion of votes Trump received in each county in the US.

NOTE: The screenshots below don't do justice to the interactive graph. You can cursor over any dot to view the word as well as the correlation coefficient.

Grieve performed a Spearman's correlation. He ran the correlation by rank ordering 1) usage of each of the 10,000 most commonly tweeted words and 2) the "level of Trump support in US counties", measured as the percentage of the vote for Trump (thanks for replying to my email, Jack!). Positive correlations indicate a positive relationship between Trump support and word usage. See below:


Trump supporting counties are going for the soft swears.


And Clinton leaning counties don't give a f*ck, which may be because they've had one too many beers.

So, there is a lovely, interactive piece that lists words and the correlation coefficient for the relationship between each word and support for Trump. Grieve speculates that these data point to an urban/rural divide in Trump support.
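For class, it can help to show what a Spearman's correlation does under the hood: rank each variable, then compute a Pearson correlation on the ranks. Below is a minimal sketch in Python for a single word. The county numbers are invented for illustration and the function names are my own; this is not Grieve's code or data.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data for six counties and one word
trump_vote_share = [0.25, 0.40, 0.55, 0.62, 0.71, 0.80]
word_rate = [3.1, 2.0, 4.5, 5.0, 7.2, 6.8]  # uses per 10,000 tweets

rho = spearman(trump_vote_share, word_rate)
print(round(rho, 3))  # about 0.886
```

Repeating this for every word would give one coefficient per word, which is what each dot in the interactive graph reports. In practice you would likely reach for a library function such as `scipy.stats.spearmanr`, which also returns a p-value.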

Also of note, the data was collected two years before the election, so no "Bad Hombres", "Snowflakes", "She Persisted", "Winners", etc. showed up in this data. It might be a snapshot of the differences that led up to the current, rather divided electorate.
