Monday, November 20, 2017

Math With Bad Drawings' "Why Not to Trust Statistics"

Math with Bad Drawings has graced us with statistical funnies before (scroll down for the causality coefficient). Here is another one: a quick guide pointing out how easy it is to lie with descriptive statistics. Two of the examples are below; there are plenty more at Math with Bad Drawings.

Variance example
https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/



Monday, November 13, 2017

Using The Onion to teach t-tests

In the past, I've used fake data based on real research to create stats class examples. Baby names, NICUs, and paired t-test. Pain, surgical recovery, and ANOVA.

Today, I've decided to use fake data and fake research to create a real example for teaching the one-sample t-test. It uses this research report from The Onion:

https://www.theonion.com/toddler-scientists-finally-determine-number-of-peas-tha-1820347088


In this press release, the baby scientists claim that the belief that a baby could smash only four peas into their ear canal was false. Based upon new research, that number has been revised to six. Which sure sounds like a one-sample t-test to me: Four is the mu assumed true based upon previous baby ear research, the sample data had a mean of six, and the difference was statistically significant.

Here is some dummy data that I created that replicates these findings when mu (the test value) is set to 4:

5.00
6.00
7.00
6.00
5.00
6.00
6.00
5.00
7.00
6.00
7.00
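
And here is a quick sketch in Python with scipy (any stats package will do) confirming that the dummy data behaves as advertised, with a mean of 6 and a significant result against mu = 4:

```python
# One-sample t-test on the dummy pea data, testing against mu = 4.
from scipy import stats

peas = [5, 6, 7, 6, 5, 6, 6, 5, 7, 6, 7]

t_stat, p_value = stats.ttest_1samp(peas, popmean=4)
print(f"mean = {sum(peas) / len(peas):.2f}")   # 6.00, as in the press release
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # significant at any usual alpha
```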

Why, yes, I do think I'm pretty clever.

Monday, November 6, 2017

Yau's "Real Chart Rules to Follow"

The sum of the parts is greater than the whole? Nathan Yau's article on creating readable, useful graphs is a perfectly reasonable list of rules for making a proper graph. The content is sound. So, that's good.

However, the accompanying images and captions are hilarious. They will show your students how to make not-awful charts.







Monday, October 30, 2017

Wilson's "Why Are There So Many Conflicting Numbers on Mass Shootings?"

This example gets students thinking about how we operationalize variables. Psychologists operationalize a lot of abstract stuff. Intelligence. Grit. But what about something that seems more firmly grounded and countable, like whether or not a crime meets the criteria for a mass shooting?

How do we define mass shooting?

As shared in this article by Chris Wilson for Time Magazine, the current federal definition of a mass shooting is 1) three or more people 2) killed in a public setting.

But that definition isn't universally accepted by media outlets. The article shares the different metrics used to identify a mass shooting depending on the source: whether or not a dead shooter counts toward the total number killed, and whether or not the victims were randomly selected.

I think the most glaring example from the article has to do with the difference that this definition makes on mass shooting counts:


You could also discuss with students how they would define it, what parameters would be important, and what research questions might require a firm definition.

Thursday, October 26, 2017

Logical Fallacy Ref Meme

So, I love me some good statsy memes. They make a brief, important point that sticks in the heads of students.

I've recently learned of the Logical Fallacy Ref meme. Here are a couple that apply to stats class:






Friday, October 20, 2017

Climate Central's "The First Frost is Coming Later"

So, this checks off a couple of my favorite requisites for a good teaching example: You can personalize it, it is contemporary and applicable, and it illustrates a few different sorts of statistics.


The overall article is about how the first frost is arriving later across the US as the Earth warms. They provide data about the first frost in a number of US cities. It even lists my childhood hometown of Altoona, PA, so I think there is a pretty large selection of cities to choose from. Below, I've included the screen grab for my current home and the home of Gannon University: Erie, PA.

First frost dates illustrated with a line chart; the chart also includes the regression line.

Data for frosty, chilly Erie, PA

The article also presents a chart that shows how frost is related to the length of the growing season in the US. This graph is a good example as the y-axis shows the number of days above or below the average growing-season length over time, from 1895 through 2015.



Both graphs also illustrate long-term data collection and archival data.
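
If your students ask how that regression line gets there, here is a minimal least-squares sketch; the first-frost dates below are invented for illustration, not Climate Central's data:

```python
# Fitting the kind of trend line shown in these charts: ordinary
# least-squares regression of first-frost date on year (made-up data).
import numpy as np

years = np.arange(1980, 1990)
first_frost_doy = np.array([288, 290, 287, 293, 291, 296, 294, 299, 297, 301])  # day of year

slope, intercept = np.polyfit(years, first_frost_doy, 1)
print(f"first frost arrives about {slope:.2f} days later each year")
```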



Wednesday, October 11, 2017

Using the Global Terrorism Database's code book to teach levels of measurement, variable types

A database code book is the documentation of all of the data entry rules and coding schemes used in a given database. And code books usually contain examples of every kind of variable and level of measurement you need to teach your students during the first two weeks of Intro Stats. You can use any code book from any database relevant to your own scholarship as an example in class. Or perhaps you can find a code book particularly relevant to the students or majors you are teaching.

Here, I will describe how to use the Global Terrorism Database's code book for this purpose. The Global Terrorism Database is housed at the University of Maryland, has been tracking national and international terrorism since 1970, and has collected information on over 170,000 attacks. So, the database in and of itself could be useful in class. But I will focus on just the code book for now, as I think this example cuts across disciplines and interests: all of our students are aware of terrorism, and this particular code book doesn't contain much technical jargon.

Here are just a few of the examples contained within it: Dichotomous, mutually exclusive response options (Yes = 1, No = 0) can be found on page 14, in response to the item "The violent act must be aimed at attaining a political, economic, religious, or social goal."



Another dichotomous response is used to indicate whether or not an attack was intended as a suicide attack (p. 26).

You can demonstrate nominal coding with the response options used to identify the country where the attack occurred (Brazil = 30, Cambodia = 36, etc., on p. 17) or by looking at the coding scheme for different kinds of terrorist attacks (3 = Bombing/Explosion, p. 22).



The ratio scale of measurement is used to enter the number of perpetrators for a given terrorist act (p. 44). You can also discuss what is lost or gained by reporting (and analyzing) the cost of damages in either categorical and ordinal format (a number represents one of four ranges of dollars) or quantitative and ratio format (enter the amount of loss in USD) (pages 49 and 50).
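
To make the coding schemes concrete, here is a toy record sketched in Python. The numeric codes come from the pages cited above, but the field names are my own shorthand, not necessarily the GTD's:

```python
# A toy record coded in the style of the GTD code book.
country     = {30: "Brazil", 36: "Cambodia"}   # nominal coding (p. 17)
attack_type = {3: "Bombing/Explosion"}         # nominal coding (p. 22)

record = {"country": 30, "attacktype": 3, "suicide": 0, "nperps": 4}

# Decoding: nominal codes map back to labels; the ratio value stands alone.
print(country[record["country"]],
      attack_type[record["attacktype"]],
      "suicide attack" if record["suicide"] == 1 else "not a suicide attack",
      f"{record['nperps']} perpetrators")
```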




This code book and the actual database also draw attention to attempts to better understand big, scary life problems via systematic data collection and analysis. We need first responders and law enforcement officers and the bravery of regular citizens thrust into terrible situations in order to deal with terrorist events. We also need sharp statistical minds to look for patterns in these attacks in an attempt to prevent future ones.

For a more advanced statistics class, you can also point out the naming conventions used in this database, and how naming conventions are good practice that needs to be sorted out prior to data collection.

Monday, October 2, 2017

Compound Interest's "A Rought Guide to Spotting Bad Science"

I love good graphic design and lists. This guide to spotting bad science embraces both. And many of the signs of bad science are statistical in nature, or involve sketchy methods. Honestly, this could easily be turned into a homework assignment for research evaluation.

This comes from Compound Interest (@compoundchem), which has all sorts of beautiful visualizations of chemistry topics, if that is your jam.



Monday, September 25, 2017

Izadi's "Black Lives Matter and America’s long history of resisting civil rights protesters"

Elahe Izadi, writing for The Washington Post, recently dug up some old polling data from the 1960s. The data focused on public opinion about different aspects of the civil rights movement (March on Washington, freedom riders, etc.). The old data was used to draw parallels between the mixed support for the Civil Rights Movement of the 1960s and the mixed support for current civil rights protests, specifically, Black Lives Matter.

Here is the Washington Post story on the polling data, the civil rights movement, and Black Lives Matter. The story is the source of all the visualizations contained below. Here is the original polling data.

https://img.washingtonpost.com/wp-apps/imrs.php?src=https://img.washingtonpost.com/blogs/the-fix/files/2016/04/2300-galluppoll1961-1024x983.jpg&w=1484
https://img.washingtonpost.com/wp-apps/imrs.php?src=https://img.washingtonpost.com/blogs/the-fix/files/2016/04/2300-galluppoll1963-1024x528.jpg&w=1484


I think this is timely data. And it can be used to point out a lot of things:

1) Polling data (and jobs) have been around for a long time: We may be in the era of Big Data, but polling data has been around for a long, long time. And Gallup, the organization that gathered some of this data, has been around for a long, long time and may hire your baby statisticians some day. And statisticians and researchers do more than run analyses all day: They work with clients (here, Gallup worked for Newsweek) to organize massive data collections in order to better understand social issues.
2) Data can inform just about anything: This data informs our understanding of history. Typically, the two fields seem very separated, but this time, they aren't. This makes it another larger example of how data might be useful to our students outside of the classroom: It can be used to inform debate and enrich our perspective of current events and history.
3) A clean, easy-to-read example of a stacked bar graph (a minimal code sketch follows below).
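
Here is a minimal matplotlib sketch of a stacked bar graph in the spirit of those poll graphics. The percentages are invented placeholders, not Gallup's actual results:

```python
# A stacked bar graph: each bar splits into favorable/unfavorable segments.
import matplotlib.pyplot as plt

polls       = ["1961 poll", "1963 poll"]
favorable   = [40, 55]   # made-up percentages
unfavorable = [60, 45]

plt.bar(polls, favorable, label="Favorable")
plt.bar(polls, unfavorable, bottom=favorable, label="Unfavorable")
plt.ylabel("Percent of respondents")
plt.legend()
plt.show()
```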







Monday, September 18, 2017

Yau's "Divorce and Occupation"

Nathan Yau, writing for Flowing Data, provides a good example of correlation, median, and correlation not equaling causation in his story, "Divorce and Occupation".

Yau looked at the relationship between occupation and divorce in a few ways.

He used a variation on the violin plot to illustrate how each occupation's divorce rate falls around the median divorce rate. Who has the lowest rate? Actuaries. They really do know how to mitigate risk. You could also discuss why the median divorce rate is provided instead of the mean. Again, the actuaries deserve attention, as they would probably throw off the mean.

https://flowingdata.com/2017/07/25/divorce-and-occupation/

He also looked at how salary was related to divorce, and this can be used as a good example of a linear relationship: The more money you make, the lower your chances of divorce. And an intuitive exception to that trend? Clergy members.

https://flowingdata.com/2017/07/25/divorce-and-occupation/


Both scatter plots, when viewed at the website, are interactive: hover over any dot to see the actual x- and y-axis data for that point.

Also, if you are teaching more advanced students, Yau shares some information on how he created these scatter plots at the end of the article.

Finally, talk to your students about the Third Variable Problem and how correlation does not equal causation. What is causing the relationship between income and divorce? Is it just money? Is it the sort of hours that people work? How does IQ figure into divorce? Maybe it has something to do with the fact that people who seek advanced degrees tend to get married later in life.
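
If you want to show, rather than tell, here is a small simulation sketch in which a third variable manufactures an income:divorce correlation out of thin air. All numbers are invented for illustration:

```python
# The Third Variable Problem, simulated: a confounder (say, age at
# marriage) drives both income and divorce risk, so the two correlate
# even though neither causes the other here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
age_at_marriage = rng.normal(28, 4, size=500)             # the third variable

income  =  2.0 * age_at_marriage + rng.normal(0, 6, 500)  # driven by age
divorce = -1.5 * age_at_marriage + rng.normal(0, 6, 500)  # driven by age

r, p = stats.pearsonr(income, divorce)
print(f"r = {r:.2f}, p = {p:.3g}")  # a 'real' correlation with no direct cause
```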

Monday, September 11, 2017

Teach t-tests via "Waiting to pick your baby's name raises the risk for medical mistakes"

So, I am very pro-science, but I have a soft spot in my heart for medical research that improves medical outcomes without actually requiring medicine, expensive interventions, etc. And after spending a week in the NICU with my youngest, I'm doubly fond of ways of helping the littlest and most vulnerable among us. One example of such an intervention was published in the journal Pediatrics and written up by NPR. In this case, they found that fewer mistakes are made when not-yet-named NICU babies are given more distinct rather than less distinct temporary names.

Unnamed babies are an issue in the NICU: babies can be born very early or under challenging circumstances, and the babies' parents aren't ready to name their kids yet. Traditionally, hospitals would use the naming convention "BabyBoy Hartnett," but several started using "JessicasBoy Hartnett" as part of this intervention. So, distinct first and last names instead of just last names.

They measured patient mistakes by counting the number of Retract-and-Reorders: how often a treatment was listed in a patient's record, then deleted and assigned to a different patient (because a mistake was being corrected). They found that the number of retract-and-reorders decreased following the naming convention change.

These researchers did NOT use paired t-tests in their analyses. However, this research presents a good conceptual example of a within-subject t-test. As I often do around this blog, I created fake t-test data that mimics the findings, with fewer R-and-Rs for the doubly named babies. The data was created via Richard Landers' data generator website:

NICU       Before intervention   After intervention
NICU 1     47                    36
NICU 2     45                    26
NICU 3     52                    38
NICU 4     50                    32
NICU 5     46                    42
NICU 6     38                    20
NICU 7     63                    41
NICU 8     40                    27
NICU 9     37                    26
NICU 10    40                    29
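
And if you want to run the example before class, here is a quick scipy sketch of the paired t-test on that dummy data:

```python
# Paired t-test on the dummy NICU data: retract-and-reorder counts
# before vs. after the naming convention change.
from scipy import stats

before = [47, 45, 52, 50, 46, 38, 63, 40, 37, 40]
after  = [36, 26, 38, 32, 42, 20, 41, 27, 26, 29]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # fewer R-and-Rs after the change
```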


Monday, August 28, 2017

Hedonometer.org

The Hedonometer measures the overall happiness of Tweets on Twitter.

It provides a simple, engaging example for Intro Stats since the data is graphed over time, color coded for day of the week, and interactive. I think it could also be a much deeper example for a Research Methods class, as the "About" section of the website reads like a journal article methods section, in that the Hedonometer creators describe their entire process for rating Tweets.

This is what the basic graph looks like. You can drill into the data by picking a year or a day of the week to highlight. You can also use the sliding scale along the bottom to specify a time period.

The website is kept very, very up to date, so it is also a very topical resource.

Data for white supremacy attack in VA




In the pages "About" section, they address many methodological questions your students might raise about this tool. It is a good example for the process researchers go through when making judgement calls regarding the operationalization of their variables.:

In order to determine the happiness of any given word, they had to score the words. Here are their scores, which they provide:

Word happiness ratings: http://hedonometer.org/words.html
They describe how they rated the words, which gives your students an example of how to use mTurk in research:
Description of how they rated the individual words: http://hedonometer.org/about.html
They also describe a shortcoming of the lexical ratings: good events that are associated with very, very bad events:
Why bin Laden's death received low happiness ratings: http://hedonometer.org/about.html
They also describe their exact sample:
Hedonometer sampling: http://hedonometer.org/about.html
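
Under the hood, the scoring boils down to averaging word ratings. Here is a minimal sketch of the idea; I'm assuming a frequency-weighted average of per-word ratings, with a hypothetical mini-lexicon standing in for their real 1-9 word ratings at http://hedonometer.org/words.html:

```python
# Happiness of a batch of text as the frequency-weighted average of the
# rated words it contains (toy lexicon; real ratings are on a 1-9 scale).
from collections import Counter

lexicon = {"happy": 8.3, "laughter": 8.5, "sad": 2.4, "rain": 5.0}

def happiness(text):
    counts = Counter(w for w in text.lower().split() if w in lexicon)
    total = sum(counts.values())
    return sum(lexicon[w] * n for w, n in counts.items()) / total

print(round(happiness("so happy happy laughter even in the rain"), 2))
```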


Monday, August 21, 2017

Sonnad and Collin's "10,000 words ranked according to their Trumpiness"

I finally have an example of Spearman's rank correlation to share.

This is a political example, looking at how Twitter language usage differs in US counties based upon the proportion of votes that Trump received.

This example was created by Jack Grieve, a linguist who uses archival Twitter data to study how we speak. Previously, I blogged about his work analyzing what kinds of obscenities are used in different zip codes in the US. He created maps of his findings, color coded by the z-score for the frequency of each word. So: a z-score example.

Southerners really like to say "damn". On Twitter, at least.

But on to the Spearman's example. More recently, Grieve conducted a similar analysis, this time looking for trends in word usage based on the proportion of votes Trump received in each county in the US. NOTE: The screen shots below don't do justice to the interactive graph, where you can cursor over any dot to view the word as well as its correlation coefficient. Grieve performed a Spearman's correlation, rank ordering 1) usage of the 10,000 most commonly tweeted words and 2) the level of Trump support in US counties, measured as the percentage of the vote for Trump (thanks for replying to my email, Jack!), with positive correlations indicating a positive relationship between Trump support and word usage. See below:


Trump supporting counties are going for the soft swears.


And Clinton-leaning counties don't give a f*ck, which may be because they've had one too many beers.

So, there is a lovely, interactive piece that lists words and the correlation coefficient for the relationship between each word and support for Trump. Grieve speculates that this data points to an urban/rural divide in Trump support.
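
To make the mechanics concrete, here is a minimal scipy sketch of a rank correlation for a single word. The eight "counties" below are invented for illustration, not Grieve's data:

```python
# Spearman's rho between one word's tweet rate and Trump vote share
# across counties (all numbers are made up).
from scipy import stats

word_rate_per_county = [0.8, 1.1, 1.4, 2.0, 2.2, 2.9, 3.1, 3.8]
trump_vote_pct       = [31, 44, 38, 52, 55, 66, 61, 72]

rho, p_value = stats.spearmanr(word_rate_per_county, trump_vote_pct)
print(f"rho = {rho:.2f}, p = {p_value:.4f}")  # positive rho: a 'Trumpier' word
```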

Also of note, the data was collected two years before the election, so no "Bad Hombres", "Snowflakes", "She Persisted", "Winners", etc. showed up in this data; it might be a snapshot of the differences that led up to the current, rather divided electorate.

Monday, August 14, 2017

The Economists' "Ride-hailing apps may help to curb drunk driving"

I think this is a good first day of class example.

It shows how data can make a powerful argument, that argument can be persuasively illustrated via data visualization, AND, maybe, it is a soft sell of a way to keep your students from drunk driving. It also touches on issues of public health, criminal justice, and health psychology.

This article from The Economist succinctly illustrates the decrease in drunk driving incidents over time using graphs.

This article is based on a working paper by PhD student Jessica Lynn (name twin!) Peck.

Graphs of drunk driving accidents x time
https://cdn.static-economist.com/sites/default/files/imagecache/640-width/20170408_WOC328_2.png
Also, maybe your students could brainstorm third variables that could explain the change. And, New Yorkers: What's the deal with Staten Island? Did they outlaw Uber? Love drunk driving?

Monday, August 7, 2017

Kim Kardashian-West, Buzzfeed, and Validity

So, I recently shared a post detailing how to use the Cha-Cha Slide in your Intro Stats class.

Today? Today, I will provide you with an example of how to use Kim Kardashian to explain test validity.




So. Kim Kardashian-West stumbled upon a Buzzfeed quiz that will determine if you are more of a Kim Kardashian-West or more of a Chrissy Teigen. She Tweeted about it, see below.
https://twitter.com/KimKardashian/status/887881898805952514

And she went and took the test, BUT SHE DIDN'T SCORE AS A KIM!! SHE SCORED AS A CHRISSY! See below.


https://twitter.com/KimKardashian/status/887882791488061441

So, this test purports to assess one's Kim Kardashian-West-ness or one's Chrissy Teigen-ness. And it failed to measure what it claimed to measure, as Kim didn't score as a Kim. So: not a valid measure. No word on how Chrissy scored.

And if you teach people in their 30s, you could always use this example of the time Garbage's Shirley Manson did not score as Shirley Manson on an online quiz.

Monday, July 31, 2017

Hickey's "The Ultimate Playlist Of Banned Wedding Songs"

I think this blog just peaked. Why? I'm giving you a way to use the Cha-Cha Slide ("Everybody clap your hands!") as a tool to teach basic descriptive statistics.



Most Intro Stats teachers could use this within the first week of class to describe rank order data, interval data, qualitative data, quantitative data, and the author's choice of percentage frequencies instead of straight frequencies (see the quick sketch below).
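
Here is a quick pandas sketch of that last distinction, using a made-up mini-dataset rather than Hickey's survey data:

```python
# Percentage frequencies vs. straight frequencies (invented responses).
import pandas as pd

banned = pd.Series(["Cha-Cha Slide", "Chicken Dance", "Cha-Cha Slide",
                    "YMCA", "Chicken Dance", "Cha-Cha Slide"])

print(banned.value_counts())                       # straight frequencies
print(banned.value_counts(normalize=True) * 100)   # percentage frequencies
```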

Additionally, Hickey, writing for FiveThirtyEight, surveyed two dozen wedding DJs about banned songs at 200 weddings. So, you can chat about research methodology as well.

Finally, as a Pennsylvanian, it makes me so sad that people ban the Chicken Dance! How can you possibly dislike the Chicken Dance enough to ban it? Is this a class thing? 

Monday, July 24, 2017

de Vrieze's "'Replication grants' will allow researchers to repeat nine influential studies that still raise questions"

In my stats classes, we talk about the replication crisis. When introducing the topic, I use this reading from NOBA. I think it is also important for my students to think about how science could create an environment where replication is more valued. And the Dutch Organization for Scientific Research has come up with a solution: It is providing grants to nine groups to either 1) replicate famous findings or 2) reanalyze famous findings. This piece from Science details their efforts.

The Dutch Organization for Scientific Research provides more details on the grant recipients, which include several researchers replicating psychology findings:



How to use in class: Again, talk about the replication crisis. Ask your students to generate ways to make replication more valued. Then, give them a bit of faith in psychology/science by sharing this information on how science is on it. From a broader view, this could introduce the idea of grants to your undergraduates or get your graduate students thinking about new avenues for getting their replications funded.

Monday, July 17, 2017

Harris's "Scientists Are Not So Hot At Predicting Which Cancer Studies Will Succeed"

This NPR story is about reproducibility in science that ISN'T psychology and the limitations of expert intuition, and it summarizes a recent research article from PLOS Biology (so, open science that isn't psychology, too!).

Thrust of the story: Cancer researchers may be having a replication problem similar to psychologists'. I've blogged about this issue before: in particular, concerns with replication in cancer research, possibly due to the variability in how lab rats are housed and fed.

So, this story is about a study in which 200 cancer researchers, post-docs, and graduate students took a look at six pre-registered cancer study replications and guessed which studies would successfully replicate. The participants systematically overestimated the likelihood of replication. However, researchers with high h-indices were more accurate than the general sample. I wonder if high h-indices uncover super-experts, or super-researchers who have been around the block and are a bit more cynical about the ability of any research finding to replicate.

How to use in a stats class: False positives. The original research didn't replicate (this time, maybe), AND the experts judging replicability were overly optimistic. Also, one might wonder if there are potential cancer treatments we don't know about because of false negatives.

How to use in a research class: The lack of reproduction may signal publication bias. Replication is necessary for good science. Experts aren't perfect.


Monday, July 10, 2017

Domonoske's "50 Years Ago, Sugar Industry Quietly Paid Scientists To Point Blame At Fat"

This NPR story discusses research detective work published in JAMA. The JAMA article looked at a very influential NEJM review article that investigated the link between diet and coronary heart disease (CHD), specifically, whether sugar or fat contributes more to CHD. The review, written by Harvard researchers decades ago, pinned CHD on fatty diets. But the researchers took money from Big Sugar (which sounds like... a drag queen or CB handle) and communicated with Big Sugar while writing the review article.

This piece discusses how conflict of interest, together with institutional and journal prestige, shaped food research and our beliefs about the causes of CHD for decades. It also touches on how industry, namely sugar interests, discounted research that found a sugar:CHD link while promoting and funding research that found a fat:CHD link.

How to use in a Research Methods class:
-Conflict of interest: The funding the researchers received from the sugar lobby was never fully disclosed, and the sugar lobby communicated with the authors while they were writing the review article.
-The article of ill repute was a literature review. This opens up the conversation on how influential review papers are, especially when the authors are from well-reputed institutions and the reviews are printed in well-reputed journals.
-A good example of cherry-picking data: Articles critical of sugar were held to a different standard.
-I am a psychologist, and I discuss the replication crisis in psychology, but other fields (here, nutrition and heart disease research) are susceptible to zeitgeist as well.

Monday, July 3, 2017

Chris Wilson's "The Ultimate Harry Potter Quiz: Find Out Which House You Truly Belong In"

Full disclosure: I have no chill when it comes to Harry Potter.

Despite my great bias, I still think this psychometrically created (with help from psychologists and Time Magazine's Chris Wilson!) Hogwarts House Sorter is a great example of scale building, validity, descriptive statistics, electronic consent, etc., for stats and research methods.

How to use in a Research Methods class:

1) The article details how the test drew upon the Big Five inventory. And it talks smack about the Myers-Briggs.


2) The article also uses simple language to give a rough sketch of how they used statistics to pair you with your house: The "standard statistical model" is a regression line, the "affinity for each House is measured independently", etc. (a hypothetical sketch of that idea follows this list).



While you are taking the quiz itself, there are some RM/statsy lessons:

3) At the end of the quiz, you are asked to contribute some more information. It is a great example of leading response options as well as implied, electronic consent.


4) The quiz provides descriptive statistics of how well you fit into each House:


5) There is a debriefing:
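
As flagged in point 2, here is a hypothetical sketch of what "affinity for each House is measured independently" could look like: one weighted, regression-style score per House. The items and weights are invented, not Time's actual model.

```python
# Four independent affinity scores: each House gets its own weighted sum
# of the item responses (all items and weights are invented).
import numpy as np

responses = np.array([4, 2, 5, 3, 1])  # one respondent, five items (1-5)

weights = {
    "Gryffindor": np.array([ 0.6, -0.1, 0.2,  0.3, -0.2]),
    "Hufflepuff": np.array([ 0.1,  0.5, 0.1,  0.4,  0.0]),
    "Ravenclaw":  np.array([-0.2,  0.2, 0.7, -0.1,  0.3]),
    "Slytherin":  np.array([ 0.3, -0.3, 0.1,  0.2,  0.6]),
}

affinity = {house: float(w @ responses) for house, w in weights.items()}
print(max(affinity, key=affinity.get), affinity)
```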


This isn't the first time I've posted about Chris Wilson's statsy interactive pieces for Time magazine.

Teach least squared error, trends over time, and archival data sets via this feature, which finds the British equivalent of your first name based on the popularity of your name when you were born versus the same-ranked name in England. Bonus: Your students can find out their British names. Mine is Shannon.

Teach percentiles, medians, and I/O psychology's Holland Inventory with this data investigating the relationship between job salary and Holland personality match for the job. Spoiler alert: This data also provides an example of a non-significant correlation. Bonus: Your students can find out their own Holland Inventory type.

Monday, June 26, 2017

APA's "How to Be A Wise Consumer of Psychological Research"

This is a nice, concise handout from APA that touches on the main points for evaluating research, in particular research that has been distilled by science reporters.

It may be a bit light for a traditional research methods class, but I think it would be good for the research methods section of most psychology electives, especially if your students are working through source materials.

The article mostly focuses on evaluating proper sampling techniques. It also has a good list of questions to ask yourself when evaluating research:



This also has an implicit lesson of introducing the APA website to psychology undergraduates and the type of information shared at APA.org (including, but not limited to, this glossary of psychology terms).

Monday, June 19, 2017

Winograd's "Personality May Change When You Drink, But Less Than You Think"

How much do our personalities change when we're drunk? Not as much as we think. We know this thanks to the self-sacrificing research participants who went to a lab, filled out some scales, and got drunk with their friends. For science!

Here is the research, as summarized by the first author. Here is the original study.

This example admittedly panders to undergraduates. But I also think it is an example that will stick in their heads. It provides good examples of:

1) Self-report vs. other-report personality data in research.
-Two weeks prior to the drinking portion, participants completed a Big Five personality scale as if they were drunk. So, there is the self-report of Drunk!Participant. And during the drinking session, participants had their Big Five judged by research assistants coding their interactions with friends, allowing a more objective judgment of the Drunk!Participant.

The findings:

https://www.psychologicalscience.org/news/releases/personality-may-change-when-you-drink-but-less-than-you-think.html#.WUP0o-vyvIV


-Why do we need self- and other-reports? What sorts of traits are people most likely to lie about? This could also open up a conversation about Lie scales, especially their use in situations where there is pressure to present well, like during job interviews.

-What other sorts of other-reports have your students seen used in research? I've seen research that asks teachers to evaluate students, parents to evaluate children, etc. When might an acquaintance be a better source of data than a stranger?

2) Conceptual examples of the repeated-measures/within-subject t-test and the matched-pairs/between-subjects t-test (see the sketch after this list).

-At Time 1, Ps reported their personalities under normal circumstances and what they think their personalities are like when drunk. Within-subject t-test. Results: Ps believe that their personalities change substantially when drunk.

-At Time 2, while the participants were drunk, they were observed by research assistants, who made their best guesses at the Ps' Big Five. Between-subjects, matched t-test. Results: P extroversion seems to increase, but raters didn't find any other increases.

3) Example of using the Big Five in research.
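
As promised in point 2, here is a rough scipy sketch contrasting the two designs with made-up extroversion ratings. This is a conceptual illustration only, not the study's actual analysis:

```python
# Two designs, made-up 1-5 extroversion ratings (not the study's data).
from scipy import stats

# Time 1: each P rates their sober self and their imagined drunk self.
sober_self = [3.1, 2.8, 3.5, 3.0, 2.6, 3.3]
drunk_self = [4.2, 3.9, 4.6, 4.1, 3.5, 4.4]
print(stats.ttest_rel(sober_self, drunk_self))        # paired / within-subject

# Time 2: Ps' drunk self-reports vs. observers' ratings, treated here as
# two groups for illustration.
observer_ratings = [3.4, 3.1, 3.8, 3.3, 2.9, 3.6]
print(stats.ttest_ind(drunk_self, observer_ratings))  # between-subjects
```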


Monday, June 5, 2017

Brenner's "These Hilariously Bad Graphs Are More Confusing Than Helpful"

Brenner, writing for Distractify, has compiled a very healthy list of terrible, terrible graphs and charts. How to use in class:
1) Once you know how NOT to do something, you know how to do it.
2) Bonus points for pointing out the flaws in these charts...double bonus points for creating new charts that correct the incorrect charts.

A few of my favorites: