Monday, September 18, 2017

Yau's "Divorce and Occupation"

Nathan Yau, writing for Flowing Data, provides a good example of correlation, median, and correlation not equaling causation in his story, "Divorce and Occupation".

Yau looked at the relationship between occupation and divorce in a few ways.

He used a variation on the violin plot to illustrate how each occupation's divorce rate falls around the median divorce rate. Who has the lowest rate? Actuaries. They really do know how to mitigate risk. You could also discuss why the median divorce rate is provided instead of the mean divorce rate. Again, the actuaries deserve attention, as they would probably throw off the mean.
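If you want to show students why an extreme group pulls the mean more than the median, a quick sketch like the one below can help. The divorce rates here are made-up numbers for illustration only, not Yau's actual data; the first value stands in for an unusually low-rate occupation such as actuaries.

```python
import statistics

# Hypothetical divorce rates (percent) for seven occupations.
# These values are invented for illustration; they are NOT from Yau's article.
rates = [17, 35, 36, 37, 38, 39, 40]   # 17 plays the role of the low outlier
rates_without_outlier = rates[1:]      # same list with the outlier dropped

mean_with = statistics.mean(rates)
median_with = statistics.median(rates)
mean_without = statistics.mean(rates_without_outlier)
median_without = statistics.median(rates_without_outlier)

# How far does each summary move when the outlier is removed?
mean_shift = mean_without - mean_with
median_shift = median_without - median_with

print(f"Mean with outlier:   {mean_with:.2f}")
print(f"Median with outlier: {median_with:.2f}")
print(f"Mean shift when outlier is removed:   {mean_shift:.2f}")
print(f"Median shift when outlier is removed: {median_shift:.2f}")
```

The mean moves much more than the median when the low value is dropped, which is the point to draw out in class.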

He also looked at how salary was related to divorce, and this can be used as a good example of a linear relationship: The more money you make, the lower your chances of divorce. And an intuitive exception to that trend? Clergy members.

Both scatter plots, when viewed at the website, are interactive. By hovering over any dot, you can see the actual x- and y-axis data for that point.

Also, if you are teaching more advanced students, Yau shares some information on how he created these scatter plots at the end of the article.

Finally, talk to your students about the Third Variable Problem and how correlation does not equal causation. What is causing the relationship between income and divorce? Is it just money? Is it the sort of hours that people work? How does IQ figure into divorce? Maybe it has something to do with the fact that people who seek advanced degrees tend to get married later in life.

Monday, September 11, 2017

Teach t-tests via "Waiting to pick your baby's name raises the risk for medical mistakes"

So, I am very pro-science, but I have a soft spot in my heart for medical research that improves medical outcomes without actually requiring medicine, expensive interventions, etc. And after spending a week in the NICU with my youngest, I'm doubly fond of ways of helping the littlest and most vulnerable among us. One example of such an intervention was published in the journal Pediatrics and written up by NPR. In this case, they found that fewer mistakes are made when not-yet-named NICU babies are given more distinct rather than less distinct temporary names. Unnamed babies are a real issue in the NICU, as babies can be born very early or under challenging circumstances, and the babies' parents aren't ready to name their kids yet. Traditionally, hospitals would use the naming convention "BabyBoy Hartnett" but several started using "JessicasBoy Hartnett" as part of this intervention. So, distinct first and last names instead of just last names. They measured patient mistakes by counting the number of Retract-and-Reorders, or how often a treatment was listed in a patient’s record, then deleted and assigned to a different patient (due to a mistake being corrected). They found that the number of retract-and-reorders decreased following the naming convention change.

This researcher DID NOT use paired t-tests in their analyses. However, this research presents a good conceptual example of within-subject t-tests. As I often do around this blog, I created fake t-test data that mimicked the findings, with fewer R-and-R's for doubly named babies. The data was created via Richard Lander’s data generator website:

           Before intervention   After intervention
NICU 1     47                    36
NICU 2     45                    26
NICU 3     52                    38
NICU 4     50                    32
NICU 5     46                    42
NICU 6     38                    20
NICU 7     63                    41
NICU 8     40                    27
NICU 9     37                    26
NICU 10    40                    29
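If you would like students to see the arithmetic behind a paired (within-subject) t-test on the fake data above, here is a minimal sketch using only the Python standard library. The function name `paired_t` is my own for illustration. The test statistic is the mean of the before-minus-after differences divided by its standard error, with n - 1 degrees of freedom.

```python
import math
import statistics

# Fake retract-and-reorder counts for the 10 NICUs listed above.
before = [47, 45, 52, 50, 46, 38, 63, 40, 37, 40]
after = [36, 26, 38, 32, 42, 20, 41, 27, 26, 29]


def paired_t(before, after):
    """Return (mean difference, SD of differences, t statistic, df)."""
    # Each NICU serves as its own control, so work with difference scores.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD (n - 1 in the denominator)
    se_diff = sd_diff / math.sqrt(n)       # standard error of the mean difference
    t_stat = mean_diff / se_diff
    return mean_diff, sd_diff, t_stat, n - 1


mean_diff, sd_diff, t_stat, df = paired_t(before, after)
print(f"Mean difference (before - after): {mean_diff:.1f}")
print(f"SD of differences: {sd_diff:.2f}")
print(f"t({df}) = {t_stat:.2f}")
```

Students can compare the resulting t to a critical value from a t table with 9 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(before, after)` should return the same t statistic along with a p-value.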

Monday, August 28, 2017

The Hedonometer measures the overall happiness of Tweets on Twitter.

It provides a simple, engaging example for Intro Stats since the data is graphed over time, color coded for day of the week, and interactive. I think it could also be a much deeper example for a Research Methods class, as the "About" section of the website reads like a journal article methods section, insofar as the Hedonometer creators describe their entire process for rating Tweets.

This is what the basic table looks like. You can drill into the data by picking a year or a day of the week to highlight. You can also use the sliding scale along the bottom to specify a time period.

The website is also kept very, very up to date, so it is also a very topical resource.

Data for white supremacy attack in VA

In the page's "About" section, they address many methodological questions your students might raise about this tool. It is a good example of the process researchers go through when making judgement calls regarding the operationalization of their variables:

In order to determine the happiness of any given word, they had to score the words. Here are their scores, which they provide:

Photo of word happiness ratings

They describe how they rated the words, which gives your students an example of how to use mTurk in research:

Description of how they rated the individual words

They also describe a shortcoming of the lexical ratings: good events that are associated with very, very bad events.

Why bin Laden's death received low happiness ratings

They also describe their exact sample:

Hedonometer sampling