

Showing posts from 2019

Data used by historians to defend tobacco companies

I love data-informed opinions and arguments. So, I was fascinated when NPR told me that some academics quietly take side gigs in which they use data to help tobacco companies. Specifically, tobacco companies argue that, over time, people have become more and more aware of the risks associated with smoking. As such, Big Tobacco argues that it should not be held responsible for the harm caused by smoking. From NPR: I went down the rabbit hole to find the original data and more information on Gallup's position, and this is what I found: https://news.gallup.com/poll/1717/tobacco-smoking.aspx So, while Americans had heard about the potential connection between cancer and smoking, not everyone believed that this was true (41%), and many people weren't sure about the link (29%). How to use in class: -Data used in court. -Data used by historians. More here:  http://www.stat.columbia.edu/~gelman/stuff_for_blog/Ethics-of-Consulting-for-the-Tobacco-Industry.p...

Data controversies: A primer

I teach many, many statistics classes. In addition to the core topics typically covered in Introductory Statistics, I think covering real-life controversies involving statistics is vital. Usually, these are stories of large organizations that attempted to bias/PR attack/skew/p-hack/cherry-pick data to serve their own purposes.  I believe that these examples serve to show why data literacy is so critical because data is used in so many fields, AND our students must prepare themselves to evaluate data-based claims throughout their lives. I put out a call on Twitter , and my friends there helped me generate a great list of such controversies. I put this list into a spreadsheet with links to primers on each topic. This isn't an in-depth study of any of these topics, but the links should get you going in the right direction if you would like to use them in class. I hope this helps my fellow stats teachers integrate more applied examples into their classes. If you h...

Pew Research Datasets

Create an account with Pew Research, and you can download some of their data sets, including a) syntax files, b) detailed methodology, and c) codebook, including detailed screenshots of what the survey felt like to participants.  I think there are three ways to use this in class: -Show your students what proper data documentation looks like -Get some data, run some analyses -Get some data, look up Pew's reports based on the data, see if you can replicate the findings. How to Properly Document Your Research Process. Pew documents the hell out of these data sets. Included are: Syntax files: Methodology: Surveys, featuring the questions but also screenshots of the user experience: Get some data, run some analyses. MY FIRST EVER FACTOR ANALYSIS EXAMPLE, y'all. Per the methodology documentation, Pew creates its own scales. Within this data set (American Trends Panel Wave 34), they use several scales to measure attitudes about medical treatments. ...
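If you want to give students a taste of what a factor analysis is looking for before opening the Pew data, here is a minimal numpy sketch of the very first step of an exploratory factor analysis: eigenvalues of the item correlation matrix plus the Kaiser (eigenvalue > 1) criterion. The simulated "survey items" are entirely made up and are not Pew's scales.

```python
import numpy as np

rng = np.random.default_rng(34)  # arbitrary seed, not tied to Wave 34

# Simulate 500 respondents answering 6 Likert-type items that all
# tap one underlying attitude (a single common factor).
latent = rng.normal(size=500)
items = np.column_stack(
    [latent + rng.normal(scale=0.8, size=500) for _ in range(6)]
)

# Eigenvalues of the item correlation matrix, sorted largest first.
# The Kaiser criterion retains factors with eigenvalues greater than 1.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
n_factors = int(np.sum(eigenvalues > 1))

print(n_factors)  # one dominant factor should emerge
```

Because all six items share one latent variable, the scree drops off sharply after the first eigenvalue, which is exactly the pattern you hope to show students.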

Dennis Quaid, the ages of his wives, and regression

This hilarious quip made me think of regression.   So I created a wee data set ( available here ): It features this scatter plot of the data (r = .99). It also includes JASP output of the regression for this data (a person born in 2020 is predicted to marry Dennis Quaid in 2052). 
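If you want students to reproduce this kind of prediction themselves, here is a least-squares sketch using hypothetical (wife birth year, marriage year) pairs. These numbers are illustrative stand-ins with a similar strong positive trend, not the blog's actual data set (which is linked above).

```python
import numpy as np

# Hypothetical (birth year, marriage year) pairs for illustration only.
birth_year = np.array([1950, 1961, 1971, 1993])
marriage_year = np.array([1978, 1991, 2004, 2019])

# Fit marriage_year = slope * birth_year + intercept by least squares.
slope, intercept = np.polyfit(birth_year, marriage_year, 1)
r = np.corrcoef(birth_year, marriage_year)[0, 1]

# Predicted marriage year for a hypothetical person born in 2020.
predicted = slope * 2020 + intercept
print(round(slope, 2), round(r, 2), round(predicted))
```

With any data this strongly linear, r lands near .99 and the regression line hands you a (tongue-in-cheek) prediction for a 2020 baby.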

Judge strikes down Florida ballot law listing candidates from governor’s party first

I love court cases that hinge on statistics, like these two US Supreme Court cases: Hall vs. Florida , Brown vs. Entertainment Merchants Association . Such examples demonstrate the relevance of what students are learning in our class: in Hall vs. Florida, the margin of error saved a criminal from the death penalty. The majority opinion in Brown vs. Entertainment Merchants Association reiterates that correlation does not equal causation and brings up effect sizes. A recent Florida case drew on research showing that candidate order on ballots can unfairly advantage the candidates at the top of the list. Here is a brief summary from the Miami Herald : https://www.miamiherald.com/news/politics-government/state-politics/article237417779.html Here are portions of the actual decision from the Election Law Blog . The highlight in the paragraph below is mine, since the primacy effect is also something we talk about in Intro Psych. Also, note the terrific footnote....
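The Hall vs. Florida margin-of-error point can be shown with nothing but arithmetic. Below is a sketch using an assumed standard error of measurement of about 2.5 IQ points (hypothetical; the actual SEM varies by test), which is enough to show why an observed score of 71 cannot rule out a true score at or below a 70-point cutoff.

```python
# Sketch of the Hall vs. Florida margin-of-error argument.
# An observed IQ score is only an estimate of the true score. With an
# assumed SEM of about 2.5 points (hypothetical -- the real value
# varies by test), a 95% interval spans roughly +/- 2 SEMs.
observed_score = 71
sem = 2.5
lower = observed_score - 2 * sem  # 66.0
upper = observed_score + 2 * sem  # 76.0

# The 70-point cutoff falls inside the interval, so a score of 71
# cannot rule out intellectual disability.
cutoff_inside = lower <= 70 <= upper
print(lower, upper, cutoff_inside)  # 66.0 76.0 True
```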

Pew Research compares forced-choice versus check-all response options.

This is for my psychometrics instructors. (Glorious, beloved) Pew Research Center compared participant behavior when they have to answer the same question in either a) forced-choice or b) check-all format. Here are the links to the short report and to the long report . What did they find? Response options matter, such that more participants agreed with statements when they were in the forced-choice format. See below: So, this is interesting for an RM class. I also like that the short report explained the two different kinds of question responses. The article also explores a variety of reasons for these findings, as well as other biases that participants exhibit when responding to questionnaires:
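If you want students to test a format effect like this themselves, a chi-square test of independence is the natural tool. The counts below are hypothetical, invented to mimic the direction of Pew's finding (more endorsement under forced-choice); the real numbers are in the linked reports.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts (NOT Pew's actual numbers -- see their reports):
# rows = question format, columns = endorsed vs. did not endorse.
table = [
    [620, 380],  # forced-choice: more respondents endorse the statement
    [450, 550],  # check-all
]

chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 1), dof, p < .05)
```

A nice discussion prompt: the two "samples" answered the same question, so any difference is driven by the response format, not by attitudes.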

"The Quest To Create A Better Spy-Catching Algorithm"

"(Algorithms) are used so heavily, they don't just predict the future, they are the future." -Cathy O'Neil ^This quote from this NPR story made me punch the air in my little Subaru after dropping my kid off at school. What a great sentence. There are many great one-liners in this little five-minute review of algorithms. This NPR story by Dina Temple-Raston is a great primer for All The Ethical Issues Related To Algorithms, accessible to non- or novice-statisticians. It clocks in at just under five minutes, perfect as a discussion prompt or quick introduction to the topic. How to use in class: They talk about regression without ever saying "regression": "Algorithms use past patterns of success to predict the future." So, regression, right? Fancy regression, but that one line can take this fancy talk of algorithms and make it more applicable to your students. Sometimes, I feel like I'm just waving my hands when I try to explain thi...

Freakonomics Radio's "America's Math Curriculum Doesn't Add Up"

"I believe that we owe it to our children to prepare them for a world they will encounter, a world driven by data. Basic data fluency is a requirement, not just for most good jobs, but for navigating life more generally." -Steven Levitt Preach it, Steve. This edition of the Freakonomics podcast featured guest host Steven Levitt. He dedicated his episode to providing evidence for an overhaul of America's K-12 math curriculum. He argues that our kids need more information on data fluency. I'm not one to swoon over a podcast dedicated to math curriculums, but this one is about the history of how we teach math, the realities of the pressures our teachers face, and solutions. It is fascinating. You need to sit and listen to the whole thing, but here are some highlights: Our math curriculum was designed to help America fight the Space Race (yes, the one back in the 1960s). For a world without calculators. And not much has changed. Quick idea for teaching regr...

Planet Money's The Modal American

While teaching measures of central tendency in Intro stats, I have shrugged and said: "Yeah, mean and average are the same thing, I don't know why there are two words. Statisticians say mean so we'll say mean in this class." I now have a better explanation than that non-explanation, as verbalized by this podcast: The average is thrown around colloquially and can refer to mode, while mean can always be defined with a formula. This is a fun podcast that describes mode vs. mean, but it also describes the research rabbit hole we sometimes go down when a seemingly straightforward question becomes downright intractable. Here, the question is: What is the modal American? The Planet Money Team, with the help of FiveThirtyEight's Ben Casselman, eventually had to go non-parametric and divide people into broad categories and figure out which category had the biggest N. Here is the description of how they divided up : And, like, they had SO MANY CELLS in their des...
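The "modal American" idea translates into a few lines of Python: reduce each person to a tuple of broad categories, then find the cell with the biggest N. The categories and the mini-sample below are hypothetical, just echoing the spirit of the Planet Money / FiveThirtyEight binning.

```python
from collections import Counter

# Hypothetical mini-sample: each person reduced to a tuple of broad
# categories (age bracket, household type, region).
people = [
    ("35-54", "married with kids", "suburb"),
    ("35-54", "married with kids", "suburb"),
    ("18-34", "single", "city"),
    ("35-54", "married with kids", "suburb"),
    ("55+", "married, no kids at home", "rural"),
]

# Count every combination (cell) and pull out the biggest one.
cells = Counter(people)
modal_cell, n = cells.most_common(1)[0]
print(modal_cell, n)  # the cell with the biggest N is the "modal American"
```

It also makes the podcast's pain point concrete: with real variables, the number of cells explodes, which is exactly why they ended up with SO MANY CELLS.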

Pedagogy article recommendation: "Introducing the new statistics in the classroom."

I usually blog about funny examples for the teaching of statistics, but this example is for teachers teaching statistics. Normile, Bloesch, Davoli, & Scheer's recent publication, "Introducing the new statistics in the classroom" (2019), is very aptly and appropriately titled. It is a rundown on p-values, effect sizes, and confidence intervals. Such reviews exist elsewhere, but this one is just so short and precise. Here are a few of the highlights: 1) The article concisely explains what isn't great or what is frequently misunderstood about NHST. 2) Actual guidelines for how to explain these ideas in Psychological Statistics/Introduction to Statistics, including ideas for doing so without completely redesigning your class. 3) It also highlights one of the big reasons that I am so pro-JASP: easy-to-locate effect sizes.
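The "new statistics" in a nutshell: report an effect size and a confidence interval, not just a p-value. Here is a sketch on simulated two-group data (made up, not from the article) computing Cohen's d and a 95% CI for the mean difference by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical two-group scores, simulated for illustration.
rng = np.random.default_rng(2019)
group_a = rng.normal(loc=52, scale=10, size=40)
group_b = rng.normal(loc=45, scale=10, size=40)

diff = group_a.mean() - group_b.mean()
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = diff / pooled_sd  # Cohen's d: difference in pooled-SD units

# 95% CI for the mean difference (equal-variance t interval).
se = pooled_sd * np.sqrt(1 / 40 + 1 / 40)
t_crit = stats.t.ppf(0.975, df=78)
ci = (diff - t_crit * se, diff + t_crit * se)
print(round(d, 2), [round(x, 1) for x in ci])
```

Handy contrast for class: the CI tells students how big the difference plausibly is, which a bare p-value never does.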

Hurricane Confidence Intervals

UPDATE 10/5/22: A paywall-free article that conveys the same information:  https://www.msn.com/en-us/weather/topstories/cone-of-confusion-why-some-say-iconic-hurricane-map-misled-floridians/ar-AA12Bqyp Did you know that hurricane prediction maps are confidence intervals? This is one of my examples that serves more as a metaphor than a concrete explanation for a statistic, so bear with me. The New York Times created a beautiful, interactive website (it looked exceptionally sharp on my phone). The website attempts to explain what hurricane prediction maps tell us, versus how people interpret hurricane prediction maps. The website is at NYT, so you probably will hit a paywall if you have already viewed three stories on the NYT website in the last month. As such, I've included screenshots here. Here is a map with the projected hurricane path. People think that the white line indicates where the hurricane will go, and the red indicates bad weather. They also think that the broader path...
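To make the metaphor concrete: as I understand the National Hurricane Center's method, the cone's radius is sized so that about two-thirds of recent historical forecasts fell inside it. The errors below are simulated, not real forecast data, but the percentile logic is the idea.

```python
import numpy as np

# Hypothetical historical track-forecast errors (miles) at one lead
# time -- made up for illustration.
rng = np.random.default_rng(7)
historical_errors = np.abs(rng.normal(scale=60, size=200))

# Set the cone radius at the ~66.7th percentile of past errors, so
# roughly two-thirds of past storm centers would have fallen inside.
cone_radius = np.percentile(historical_errors, 200 / 3)
inside = np.mean(historical_errors <= cone_radius)
print(round(cone_radius), round(inside, 2))  # about 2/3 fall inside
```

Which is exactly why the cone is an interval, not a promise: about one storm in three wanders outside it.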

Transforming your data: A historical example

TL;DR: Global water temperature data from before 1940 was collected by sailors hauling buckets of water from the ocean and recording the temperature of their bucket water. But some recorded data was rounded (thanks, Air Force!). Then, researchers had to transform their data. ^Go to the 3-minute mark to see the bucket-boat-water-temperature technique in action Here is the original research,  published in Nature . NPR covered the research article . Reporter Rebecca Hersher didn't discuss the entire research paper. Instead, she told the story of how the researchers discovered and corrected for their flawed ocean water temperature data. This story might be a little beyond Intro Stats, but it a) tells the story of messy, real archival data used to inform global climate change and b) introduces the idea of data transformations. Below, I will highlight some of the teaching items. Systematic bias: The data were all flawed in the same way, as they were transcribed without any da...
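Here is one way to show students how rounded archival data gives itself away. This is a toy illustration of the detection idea, not the Nature paper's actual correction method: honestly recorded continuous readings almost never land exactly on whole degrees, while rounded logs always do.

```python
import numpy as np

# Hypothetical sea-surface temperatures (deg C), simulated to contrast
# precise records with logs rounded to whole degrees.
rng = np.random.default_rng(1940)
honest = rng.uniform(10, 25, size=1000)             # tenths recorded
rounded = np.round(rng.uniform(10, 25, size=1000))  # whole degrees only

def share_whole_degrees(temps):
    """Fraction of readings that are exact whole numbers."""
    return np.mean(temps == np.round(temps))

print(share_whole_degrees(honest) < 0.1)    # True: few exact integers
print(share_whole_degrees(rounded) == 1.0)  # True: every reading rounded
```

Once you can flag which subset was rounded, you can discuss what transformation (or bias adjustment) might put the two sources on the same footing.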

CNN's The most effective ways to curb climate change might surprise you

CNN created an interactive quiz that will teach your students about a) making personal changes to support the environment, b) rank-order data, and c) nominal data. https://www.cnn.com/interactive/2019/04/specials/climate-change-solutions-quiz/ The website leads users through a quiz. For eight categories of environmental crisis solutions, you are asked to rank solutions by their effectiveness. Here are the instructions: Notice the three nominal categories for each solution: What you can do, What industries can do, What policymakers can do. Below, I've highlighted these data points for each of the "Our home and cities" solutions. There are also many, many examples of ordinal data. For each intervention category, the user is presented with several solutions and they must reorder the solutions from most to least effective. How the page looks when you are presented with solutions to rank order: The website then "grades" your respons...
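Since the quiz "grades" a user's ordering against the correct ordering, it is also a nice excuse to introduce Spearman's rho, the go-to statistic for rank-order data. The rankings below are hypothetical (the real effectiveness rankings live behind CNN's interactive page).

```python
from scipy.stats import spearmanr

# Hypothetical "correct" effectiveness ranking for five solutions
# (1 = most effective) versus one user's guess.
correct_rank = [1, 2, 3, 4, 5]
user_rank = [2, 1, 3, 5, 4]

# Spearman's rho is a natural "grade" for ordinal, rank-order data.
rho, p = spearmanr(correct_rank, user_rank)
print(round(rho, 2))  # 0.8: close, but the user swapped two pairs
```

Students can compute the same rho by hand with the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula and confirm the software agrees.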

Mother Jones' mass shooting database

Mother Jones magazine maintains a database of mass shooting events in the United States. MJ collects 25 variables for every shooting. Below, I've included their own description of the purpose of their database: How to use in class: Within this data set is an example for every test we teach in Introduction to Statistics. Correlation/Regression: Fatalities, Injuries, Age of shooter, Year of shooting. Chi-square: Shooter gender, Shooter ethnicity, Mass or Spree shooting, Were the weapons obtained legally? ANOVA: Shooter ethnicity. T-test: Mass or Spree shooting, Were the weapons obtained legally? Data Cleaning:  Some of these columns need some work before analysis. For instance, there are multiple weapons listed under "Weapon Type". That is reasonable, but not helpful for descriptive statistics. You could walk your students through the process of recoding that column into multiple columns. You could also expl...
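The recoding exercise can be sketched in plain Python. The rows below are hypothetical stand-ins (not actual database entries) showing one way to split a multi-value "Weapon Type" column into separate 0/1 indicator columns.

```python
# Hypothetical rows illustrating the recoding, not real database entries.
rows = [
    {"case": 1, "weapon_type": "semiautomatic handgun; rifle"},
    {"case": 2, "weapon_type": "shotgun"},
    {"case": 3, "weapon_type": "semiautomatic handgun"},
]

# Collect every distinct weapon mentioned across all rows.
weapons = sorted({w.strip() for r in rows for w in r["weapon_type"].split(";")})

# Add one 0/1 indicator column per weapon.
for r in rows:
    listed = {w.strip() for w in r["weapon_type"].split(";")}
    for w in weapons:
        r[w] = int(w in listed)

print(weapons)
print(rows[0]["rifle"], rows[1]["rifle"])  # 1 0
```

Once each weapon has its own column, descriptive statistics (and chi-square examples) become straightforward.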

Passion driven statistics

Passion-Driven Statistics is a grant-funded, FREE resource that teaches the basics of statistics, including the basics of all of the stuff you need to know to conduct good research (data management, literature review, etc.). It bills itself as "project-driven" and is super, duper applied, which is an approach I love. You can download the whole stinking book  or view it online. And the PDF is concise, given the amount of material it covers. Why so short? Because it is lousy with links to Youtube videos, mini-assignments, instructions for reporting different statistical tests, etc.  I also love this resource because it contains a lot of good information for novices that I haven't seen packaged this way or in one place: Important lessons pertaining to the research process and data collection: The book is written to take you through a research project, and includes guidance for performing a literature review, writing a sound codebook, data management, etc. ...

The Washington Post, telling the story of the opioid crisis via data

I love dragging on bad science reporting as much as anyone, but I must give All Of The Credit to the Washington Post and its excellent, data-centered reporting on the opioid epidemic . It is a thing of beauty. How to use in class: 1) Broadly, this is a fine example of using data to better understand applied problems, medical problems, drug problems, etc. 2) Specifically, this data can be personalized to your locale via WaPo's beautiful, functional website . 3) After you pull up your localized data, descriptive data abound...# of pills, who provided them, who wrote the scripts (y'all...Frontier Pharmacy is like two miles from my house)...   4) Everyone teaches about frequency tables, right? Here is a good example: 5) In addition to localizing this research via the WaPo website, you can also personalize your class by looking for local reporting that uses this data. For instance, the Erie newspaper reporter David Bruce reported on our local problem ( .pdf of the...
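Frequency tables like the ones in WaPo's reporting are a two-liner with `collections.Counter`. The shipment records below are hypothetical stand-ins, not real figures from the DEA data.

```python
from collections import Counter

# Hypothetical (pharmacy name, pill count) shipment records --
# stand-ins for the kind of rows in the WaPo/DEA data, not real figures.
shipments = [
    ("Pharmacy A", 120_000),
    ("Pharmacy B", 45_000),
    ("Pharmacy A", 80_000),
    ("Pharmacy C", 30_000),
    ("Pharmacy A", 60_000),
]

# Frequency table: how many shipments went to each pharmacy...
freq = Counter(name for name, _ in shipments)
# ...and total pills per pharmacy.
totals = Counter()
for name, pills in shipments:
    totals[name] += pills

print(freq.most_common())
print(totals["Pharmacy A"])  # 260000
```

Students can build the same two tables from their own localized WaPo download.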

Seagull thievery deterrent research provides blog with paired t-test example.

I have spent many a summer day at Rehoboth Beach, DE. The seagulls there were assholes. They would aggressively go after food, especially your bucket of Thrasher's french fries. Apparently, this is a global problem, as a group of stalwart researchers in the UK attempted to dissuade gulls from stealing french fries by staring those sons-of-a-gun down . Researchers Goumas, Burns, Kelley, and Boogert shared their data . And it makes for a nice t-test example. 1. The Method section is hilarious and true. 2. Within-subject design: Each seagull was observed in the stare-down and non-stare-down conditions 3. Their figure is a nice example of the data visualization trend of illustrating individual data points. 4. The researchers shared their data. You can download it here . The Goumas et al. supplemental data can be used as a paired t-test example, t (18) = 3.13, p = .006, d = 0.717.
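If you'd rather demo the analysis before handing students the real download, here is a paired t-test sketch on made-up gull data (in the spirit of the design, not the Goumas et al. numbers — use their shared data set to reproduce the actual t(18) = 3.13).

```python
import numpy as np
from scipy import stats

# Hypothetical latencies (seconds before a gull approaches the food)
# for the same 19 gulls under two conditions -- made-up numbers, not
# the shared data set.
rng = np.random.default_rng(19)
looking_away = rng.normal(loc=13, scale=8, size=19).clip(min=1)
stared_at = looking_away + rng.normal(loc=6, scale=7, size=19)

# Paired (within-subject) t-test: each gull is its own control.
t, p = stats.ttest_rel(stared_at, looking_away)

# Cohen's d for paired data: mean difference / SD of the differences.
diffs = stared_at - looking_away
d = diffs.mean() / diffs.std(ddof=1)
print(round(t, 2), round(p, 3), round(d, 2))
```

With 19 gulls the test has 18 degrees of freedom, matching the df reported for the real data.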