Posts

Showing posts with the label dirty data

Want to avoid federal regulation and increase profits? Just don't share your data.

TL;DR: One way to avoid government regulation is by simply refusing to share data that may lead to government regulation (and safer trains). I'm looking at you, railroads.

Not every example I post syncs directly to the typical Psychological Statistics curriculum. I also post about statistical literacy. Like why data matters and counts. And how very, very simple data could help illuminate and solve real-world problems, but only if we can access that data. I get good and mad at organizations that avoid responsibility by manipulating and/or withholding data. See: Organizations that share data but in a functionally inaccessible way. Also, I created a spreadsheet (of course I did) containing several examples of times when large organizations goofed around with data so they wouldn't get sued. It looks like I should add railroads to this list. Aside: I grew up not 10 miles from the world-famous Horsesho...

Dirty Data: Share the data in a way that is functionally inaccessible

In my intro stats class, we discuss shady data practices that aren't lying because they report actual numbers. But they are still shady because good data is presented in such a way as to be misleading or confusing. These topics include: truncating the y-axis; collecting measures of central tendency under ideal circumstances; manipulating online ratings (I didn't write the blog post about this yet, but it is coming); and relative vs. absolute risk. AND HERE IS ANOTHER ONE: Insurance companies were asked to provide price data RE: the Transparency in Coverage Rule in the Consolidated Appropriations Act of 2021. Google that if you want to know more about that, I'm not going into that. Not my lane. That said, it is an appealing idea. Let's have some transparency in our jacked-up healthcare system. And the insurance companies provided the data, but in a way inaccessible to most people. Like, all people, maybe? Because they just splurted out 100 TB of data. So, they totally com...

How to investigate click-bait survey claims

Michael Hobbes shared a Tweet from Nick Gillespie. That Tweet was about an essay from The Bulwark. That Tweet plays fast and loose with Likert-type scale interpretation. The way Hobbes and his Twitter followers break down the issues with this headline provides a lesson on how to examine suspicious research clickbait that doesn't pass the sniff test. First off, who says "close to one in four"? And why are they evoking the attempt on Salman Rushdie's life, which did not happen on a college campus and is unrelated to high-profile campus protests of controversial speakers? Hobbes dug into the survey cited in the Bulwark piece. The author of the Bulwark piece interpreted the data by collapsing across response options on a Likert-type response scale. Which can be done responsibly, I think. "Very satisfied" and "satisfied" are both happy customers, right? But this is suspicious. Other Twitter users questioned the question and how it may leave room for i...
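To see why collapsing across Likert-type options deserves a second look, here is a minimal Python sketch. The response labels and counts are hypothetical, made up for illustration, and are not the survey cited in the Bulwark piece. The point is only that the headline percentage depends on which options get lumped together.

```python
# Hypothetical Likert-type counts, invented for illustration only.
# These are NOT the numbers from the survey discussed in the post.
counts = {"Never": 760, "Rarely": 140, "Sometimes": 70, "Always": 30}
total = sum(counts.values())


def share(labels):
    """Proportion of respondents who picked any of the given options."""
    return sum(counts[label] for label in labels) / total


# Strict collapse: only the options that clearly endorse the statement.
strict = share(["Sometimes", "Always"])

# Loose collapse: also counts the lukewarm "Rarely" option as endorsement.
loose = share(["Rarely", "Sometimes", "Always"])

print(f"Strict collapse: {strict:.0%}")
print(f"Loose collapse:  {loose:.0%}")
```

With these made-up counts, the same data supports either a "one in ten" or a "close to one in four" headline, depending on where the collapse line is drawn.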

We should teach intro stats students about relative vs. absolute risk

Do you know what bugs me? How much time different intro stats textbooks spend talking about probability, lots of A not B stuff*, lots of probability associated with the normal distribution, etc. But they don't take advantage of the discussion to warn their students about the evils of relative vs. absolute risk. #statsliteracy Relative risk is the most clickbaity abuse of statistics that there is. Well, maybe the causal claims based on correlational data are more common. But I think the relative risk is used to straight-up scare people, possibly changing their behaviors and choices. I thought of it most recently when The Daily Mail (bless) explained the difference in COVID-19 risk between dog owners and non-dog owners. Here is the data described in the headline, straight from the original paper: Really, Daily Mail? How dare you. I think the most clever, trickiest, sneakiest ways to mislead with data are by not lying with data at all. Most truncated y-axes display actual ...
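For anyone who wants the arithmetic behind relative vs. absolute risk, here is a minimal Python sketch. The counts are hypothetical and are not taken from the dog-owner paper, and the function name `risk_summary` is just an illustrative helper. It shows how a scary-sounding relative risk can sit on top of a tiny absolute difference.

```python
def risk_summary(events_exposed, n_exposed, events_control, n_control):
    """Return both groups' risks, the relative risk, and the absolute difference."""
    risk_exposed = events_exposed / n_exposed
    risk_control = events_control / n_control
    return {
        "risk_exposed": risk_exposed,
        "risk_control": risk_control,
        # Relative risk: how many times larger the exposed group's risk is.
        "relative_risk": risk_exposed / risk_control,
        # Absolute difference: how much risk is actually added.
        "absolute_difference": risk_exposed - risk_control,
    }


# Hypothetical counts, invented for illustration only:
# 3 cases per 1,000 in the exposed group vs. 2 per 1,000 in the comparison group.
summary = risk_summary(events_exposed=3, n_exposed=1000,
                       events_control=2, n_control=1000)

print(f"Relative risk: {summary['relative_risk']:.2f}")
print(f"Absolute difference: {summary['absolute_difference'] * 100:.2f} percentage points")
```

In this made-up example, a headline could truthfully say "50% higher risk" while the absolute increase is one extra case per 1,000 people. Both numbers are real; only one of them is scary.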