
Showing posts with the label validity

NYT American dialect quiz as an example of validity and reliability.

TL;DR: Ameri-centric teaching example ahead: Have your students take this quiz, and the internet will tell them which regions of the US talk the way they do. Use it to teach validity. Longer version: The NYT created a gorgeous version ( https://www.nytimes.com/interactive/2014/upshot/dialect-quiz-map.html ) of a previously available quiz ( http://www.tekstlab.uio.no/cambridge_survey/ ) that tells the user which version of American English they speak. The prediction is based upon loads and loads of survey data on how we talk. It takes you through 25 questions that ask how you pronounce certain words and which regional words you use to describe certain things. Here are my results: Indeed, I spent elementary school in Northern Virginia, my adolescence in rural Central PA, college at PSU, and I now live in the far NW corner of PA. As this test indeed picked up on where I've lived and talked, I would say that this is a valid test based just on my u...

The Tonight Show: Nick Jonas scores as Joe Jonas on Buzzfeed quiz.

Explain validity to your students AND earn some "I'm still hip!" street cred using this Tonight Show clip, which features a Buzzfeed quiz AND exactly one Jonas brother. Nick Jonas took a "Which Jonas Brother are you?" Buzzfeed quiz. He scored as Joe Jonas. Ergo, the Buzzfeed assessment measure is not valid: it does not properly assess what it purports to assess. Watch the video for yourself. If you want to take this example a step further, you could have your students take the original quiz and discuss the questions and their ability to discern which Jonas Brother is which. You could describe Nick Jonas as a Nick Jonas Subject Matter Expert, or suggest that maybe Social Desirability got in the way of Nick answering the questions honestly, etc. Another thing I've noticed as my blog and I have aged together: there are now generations of Buzzfeed quiz assessments that provide great examples for different age groups: Gen X: Shirley Manson did not score as Shirley ...

Kim Kardashian-West, Buzzfeed, and Validity

So, I recently shared a post detailing how to use the Cha-Cha Slide in your Intro Stats class. Today? Today, I will provide you with an example of how to use Kim Kardashian to explain test validity. So. Kim Kardashian-West stumbled upon a Buzzfeed quiz that will determine whether you are more of a Kim Kardashian-West or more of a Chrissy Teigen. She Tweeted about it, see below. https://twitter.com/KimKardashian/status/887881898805952514 And she went and took the test, BUT SHE DIDN'T SCORE AS A KIM!! SHE SCORED AS A CHRISSY! See below. https://twitter.com/KimKardashian/status/887882791488061441 So, this test purports to assess one's Kim Kardashian-West-ness or one's Chrissy Teigen-ness. And it failed to measure what it claimed to measure, as Kim didn't score as a Kim. So, not a valid measure. No word on how Chrissy scored. And if you teach people in their 30s, you could always use this example of the time Garbage's Shirley Manson...

Stromberg and Caswell's "Why the Myers-Briggs test is totally meaningless"

Oh, Myers-Briggs Type Indicator, you unkillable scamp. This video, from Vox, gives a concise historical perspective on the scale, describes how popular it still is, and summarizes several of the arguments against the scale. This video explains why the ol' MBTI is not particularly useful. Good for debunking psychology myths and good for explaining reliability (in particular, test-retest reliability) and validity. I like this link in particular because it presents its argument via both a video and a smartly formatted website. The text on the website includes links to actual peer-reviewed research articles that refute the MBTI.
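If you want to show students what test-retest reliability looks like as a number, a minimal sketch is the Pearson correlation between two administrations of the same scale. The scores below are made-up numbers for illustration only, not real MBTI data:

```python
# Illustrative only: test-retest reliability computed as the Pearson
# correlation between two administrations of the same scale.
# All scores below are invented for the example.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

time_1 = [12, 15, 9, 20, 14, 11, 18]   # hypothetical first administration
time_2 = [13, 14, 10, 19, 15, 10, 17]  # same people, a few weeks later
print(round(pearson_r(time_1, time_2), 3))  # high r = consistent retest scores
```

A reliable measure should give roughly the same ranking of people both times (r close to 1); critics of the MBTI point out that many test-takers land in a different type category on retest.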

Five Lab's Big Five Personality Predictor

Five.com created an app to predict your score on the Big Five by analyzing your FB status updates. five.com's prediction via status update It might be fun to have students use this app to measure their Big Five and then compare those findings to the youarewhatyoulike.com app ( which I previously discussed on this blog ), which predicts your scores on the Big Five based on what you "Like" on FB. youarewhatyoulike.com's prediction via "Likes" As you can see, my "Likes" indicate that I am calm and relaxed, but I am a neurotic status updater (crap...I'm that guy!). By contrasting the two, you could discuss reliability, validity, how such results are affected by social desirability, etc. Furthermore, you could also have your students take the original scale and see how it stacks up to the two FB measures. Note: If you ask your students to do this, they will have to give these apps access to a bunch of their personal informat...

A.V. Club's "Shirley Manson takes BuzzFeed's "Which Alt-Rock Grrrl Are You?" quiz, discovers she's not herself"

Lately, there have been a lot of quizzes popping up on my Facebook feed ("What breed of dog are you?", "What character from Harry Potter are you?"). As a psychologist who tinkers in statistics, I have pondered the psychometric properties of such quizzes and concluded that these quizzes were probably not properly vetted in peer-reviewed journals. Now I have a tiny bit of evidence to support that conclusion. What better way to ensure that a scale is valid than by using the standard of concurrent validity (popular in I/O psychology)? This actually happened when renowned Shirley Manson Subject Matter Expert, Shirley Manson, lead singer of the band Garbage, took the "Which Alt-rock Grrrl are you?" quiz and didn't score as herself (as she posted on Facebook and as reported by A.V. Club ). From Facebook, via A.V. Club An excellent example of an invalid test (or concurrent validity for you I/O types).

Time's "Can Time predict your politics?" by Jonathan Haidt and Chris Wilson

This scale, created by Haidt and Wilson, predicts your political leanings based upon seemingly unrelated questions. Screen grab from time.com You can use this in a classroom to 1) demonstrate interactive, Likert-type scales and 2) illustrate face validity (or the lack thereof). I also think this would be 3) useful for a psychometrics class to discuss scale building. Finally, the update at the end of the article mentions 4) both the n-size and the correlation coefficient for their reliability study, allowing you to discuss those concepts with students. For more about this research, try yourmorals.org
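For the scale-building discussion, it can help to show how a Likert-type scale actually gets scored. The sketch below is a generic example with invented item positions and scale points; it is not how the Time/yourmorals.org instrument is scored:

```python
# Illustrative sketch: scoring a Likert-type scale with reverse-keyed items.
# The number of scale points and the reverse-keyed positions are invented
# for this example, not taken from the Haidt and Wilson scale.

SCALE_POINTS = 7   # 1 = strongly disagree ... 7 = strongly agree
REVERSED = {1, 3}  # zero-based indices of reverse-keyed items

def score(responses):
    """Sum the responses, flipping reverse-keyed items (x -> 8 - x on a 1-7 scale)."""
    total = 0
    for i, r in enumerate(responses):
        total += (SCALE_POINTS + 1 - r) if i in REVERSED else r
    return total

print(score([6, 2, 7, 1, 5]))  # 6 + (8-2) + 7 + (8-1) + 5 = 31
```

Reverse-keying is a nice concrete hook for psychometrics students: it shows that a total score is a design decision, which leads naturally into questions about what the total actually measures (validity) and how stable it is (reliability).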

Washington Post's "GAO says there is no evidence that a TSA program to spot terrorists is effective" (Update: 3/25/15)

The Transportation Security Administration implemented SPOT training in order to teach airport security employees how to spot problematic and potentially dangerous individuals via behavioral cues. This intervention has cost the U.S. government more than $1 billion. It doesn't seem to work. By discussing this with your class, you can cover the importance of program evaluations as well as validity and reliability. The actual government-issued report goes into great detail about how the program evaluation data were collected to demonstrate that SPOT isn't working. The findings (especially the table and figure below) do a nice job of demonstrating the lack of reliability and the lack of validity. This whole story also implicitly demonstrates that the federal government is hiring statisticians with strong research methods backgrounds to conduct program evaluations (= jobs for students). Here is a summary of the report from the Washington Post. Here is a short summary and video about the report from ...