Monday, August 28, 2017

The Hedonometer measures the overall happiness of Tweets on Twitter.

It provides a simple, engaging example for Intro Stats since the data is graphed over time, color coded for day of the week, and interactive. I think it could also be a much deeper example for a Research Methods class, as the "About" section of the website reads like a journal article's methods section, insofar as the Hedonometer creators describe their entire process for rating Tweets.

This is what the basic table looks like. You can drill into the data by picking a year or a day of the week to highlight. You can also use the sliding scale along the bottom to specify a time period.

The website is kept very, very up to date, so it is also a very topical resource.

Data for white supremacy attack in VA

In the page's "About" section, they address many methodological questions your students might raise about this tool. It is a good example of the process researchers go through when making judgement calls regarding the operationalization of their variables:

In order to determine the happiness of any given word, they had to score the words. Here are their scores, which they provide:

Photo of word happiness ratings
They describe how they rated the words, which gives your students an example of how to use mTurk in research:
Description of how they rated the individual words:
They also describe a shortcoming of the lexical ratings, namely good events that are associated with very, very bad events:
Why bin Laden's death received low happiness ratings:
They also describe their exact sample:
Hedonometer sampling

Monday, August 21, 2017

Sonnad and Collins' "10,000 words ranked according to their Trumpiness"

I finally have an example of Spearman's rank correlation to share.

This is a political example, looking at how Twitter language usage differs in US counties based upon the proportion of votes that Trump received.

This example was created by Jack Grieve, a linguist who uses archival Twitter data to study how we speak. Previously, I blogged about his work analyzing what kinds of obscenities are used in different zip codes in the US. He created maps of his findings, color coded by the z-score for the frequency of each word. So, z-score example.

Southerners really like to say "damn". On Twitter, at least.
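
If you want students to see where the map colors come from, here is a minimal sketch of the z-score calculation. The county labels and rates are made up for illustration, and using the population standard deviation is my assumption; I don't know which version Grieve used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    sd = pstdev(values)  # population SD; an assumption for this demo
    return [(v - m) / sd for v in values]

# Hypothetical data: uses of one word per 10,000 tweets in eight made-up regions
regions = ["A", "B", "C", "D", "E", "F", "G", "H"]
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

z = z_scores(rates)
for region, score in zip(regions, z):
    # A positive z means the word is used more often than average there
    print(region, score)
```

A region with a z-score of 2.0 uses the word two standard deviations more often than the average region, which is the kind of value that would show up in the darkest color on a map.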

But on to the Spearman's example. More recently, he conducted a similar analysis, this time looking for trends in word usage based on the proportion of votes Trump received in each county in the US. NOTE: The screen shots below don't do justice to the interactive graph. You can hover over any dot to view the word as well as the correlation coefficient. Grieve performed a Spearman's correlation. He ran the correlation on rank-ordered data for 1) the 10,000 most commonly tweeted words and 2) the "level of Trump support in US counties," which was measured as the percentage of the vote for Trump (thanks for replying to my email, Jack!). Positive correlations indicate a positive relationship between Trump support and word usage. See below:

Trump supporting counties are going for the soft swears.

And Clinton leaning counties don't give a f*ck, which may be because they've had one too many beers.

So, there is a lovely, interactive piece that lists words and the correlation coefficient for the relationship between that word and support for Trump. Grieve speculates that this data points to an urban/rural divide in Trump support.
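
For class, it can help to show that Spearman's correlation is just Pearson's correlation computed on ranks. Below is a minimal sketch for a single word. The six counties and all of the numbers are invented for illustration; this is not Grieve's data or his code.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data for six made-up counties
trump_vote_share = [0.32, 0.45, 0.51, 0.58, 0.66, 0.74]  # proportion of vote
word_rate = [1.1, 1.9, 1.7, 2.8, 3.5, 9.0]  # uses of one word per 10,000 tweets

rho = spearman(trump_vote_share, word_rate)
print(round(rho, 3))  # 0.943
```

Note that the last county's extreme word rate (9.0) doesn't dominate the result, because only its rank matters. That is a handy talking point for why you might pick Spearman's over Pearson's with skewed data like word frequencies.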

Also of note, the data was collected two years before the election, so no "Bad Hombres", "Snowflakes", "She Persisted", "Winners", etc. showed up in this data. It might be a snapshot of the differences that led up to the current, rather divided electorate.

Monday, August 14, 2017

The Economist's "Ride-hailing apps may help to curb drunk driving"

I think this is a good first day of class example.

It shows how data can make a powerful argument, that argument can be persuasively illustrated via data visualization, AND, maybe, it is a soft sell of a way to keep your students from drunk driving. It also touches on issues of public health, criminal justice, and health psychology.

This article from The Economist succinctly illustrates the decrease in drunk driving incidents over time using graphs.

This article is based on a working paper by PhD student Jessica Lynn (name twin!) Peck.

Graphs of drunk driving accidents x time
Also, maybe your students could brainstorm third variables that could explain the change. Also, New Yorkers: What's the deal with Staten Island? Did they outlaw Uber? Love drunk driving? 

Monday, August 7, 2017

Kim Kardashian-West, Buzzfeed, and Validity

So, I recently shared a post detailing how to use the Cha-Cha Slide in your Intro Stats class.

Today? Today, I will provide you with an example of how to use Kim Kardashian to explain test validity.

So. Kim Kardashian-West stumbled upon a Buzzfeed quiz that will determine if you are more of a Kim Kardashian-West or more of a Chrissy Teigen. She tweeted about it; see below.

And she went and took the test, BUT SHE DIDN'T SCORE AS A KIM!! SHE SCORED AS A CHRISSY! See below.

So, this test purports to assess one's Kim Kardashian-West-ness or one's Chrissy Teigen-ness. And it failed to measure what it claimed to measure, as Kim didn't score as a Kim. So, not a valid measure. No word on how Chrissy scored.

And if you teach people in their 30s, you could always use this example of the time Garbage's Shirley Manson did not score as Shirley Manson on an online quiz.