Showing posts with the label fivethirtyeight

Aschwanden's "You Can’t Trust What You Read About Nutrition"

Fivethirtyeight provides lots of beautiful pictures of spurious correlations found by their own in-house study. At the heart of this article are the limitations of a major tool used in nutritional research, the Food Frequency Questionnaire (FFQ). The author does a mini-study, enlisting the help of several co-workers and fivethirtyeight.com readers. They track their own food for a week and reflect on how difficult it is to properly estimate and recall food (perhaps a mini-experiment you could do with your own students?). And she shares the spurious correlations she found in her own mini-research. Aschwanden also discusses how much noise and lack of consensus there is in real, published nutritional research (a good argument for why we need replication!): http://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/ How to use in class: -Shortcomings of survey research, especially survey research that relies on accurate memories -...
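If you want to show students why a questionnaire with lots of items is almost guaranteed to cough up a "significant" correlation or two, here is a minimal sketch. It assumes the tests are independent and uses the conventional alpha of .05; the function name and the numbers of tests are my own illustration, not figures from Aschwanden's article.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out 'significant' purely by chance when every null is true."""
    return 1 - (1 - alpha) ** num_tests


# Hypothetical numbers of food-by-outcome comparisons (not from the article)
for num_tests in (1, 5, 20, 100):
    chance = chance_of_false_positive(num_tests)
    print(f"{num_tests:>3} tests: {chance:.0%} chance of at least one false positive")
```

Real questionnaire items are correlated with one another, so the independence assumption is a simplification, but the direction of the lesson holds: the more comparisons you run, the more spurious hits you should expect.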

u/dat data's "Why medians > averages [OC] "

Unsettling. But I bet your students won't forget this example of why the mean isn't always the best measure of central tendency. While the reddit user labeled this as an example of the median's superiority, you could also use it as an example of when the mode is useful. As statisticians, we often fall back on the mode when we have categories and the median when we have outliers, but sometimes either median or mode can be useful when decimal points don't make a lot of sense. Here is the image and commentary from reddit. And this is an Instagram post about the data from the same user, Mona Chalabi from fivethirtyeight. I included the Instagram post because Chalabi expands a bit more upon the original data she used. https://www.instagram.com/p/BIVKJrcgW51/
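For a quick in-class demonstration of the same point, here is a small sketch using Python's built-in statistics module. The counts are made up for illustration (they are not Chalabi's data): most values are small whole numbers, and one extreme value drags the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical count data with one extreme outlier (not Chalabi's data)
counts = [0, 1, 1, 1, 2, 2, 3, 4, 40]

summary = {
    "mean": mean(counts),      # pulled upward by the outlier
    "median": median(counts),  # middle value, resistant to the outlier
    "mode": mode(counts),      # most common value, always a real count
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Only one person in this made-up sample has a count above the mean, which is a nice way to start a discussion about what "typical" should mean.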

Oster's "Everybody Calm Down About Breastfeeding"

I just had a baby. Arthur Francis joined our family last week. Don't mind the IV line on his head, he is a happy, chubby little boy. Now, I am the mother of a newborn and a toddler. And I have certainly been inundated by the formula versus breastfeeding debate. In case you've missed out on this, the debate centers around piles and piles of data that indicate that breastfed babies enjoy a wealth of developmental advantages denied to their formula-fed peers. Which means there is a lot of pressure to breastfeed (and some women feel a lot of guilt when they can't/do not want to breastfeed). However, the data that supports breastfeeding also finds that breastfeeding is much more common among educated, wealthy white women with high IQs. And being born to such a woman probably affords a wealth of socioeconomic advantages beyond simply breast milk. These issues, as well as mixed research findings, are reviewed in Emily Oster's "Everybody calm down about brea...

Hickey's "The 20 Most Extreme Cases Of ‘The Book Was Better Than The Movie’"

Data has been used to learn a bit more about the age-old observation that books are always better than the movies they inspire. Fivethirtyeight writer Walt Hickey gets down to the brass tacks of this relationship by exploring linear relationships between book ratings and movie ratings. The biggest discrepancies between movie and book ratings were for "meh" books made into beloved movies (see "Apocalypse Now"). How to use in class: -Hickey goes into detail about his methodology and use of archival data. The movie ratings came from Metacritic, the book ratings came from Goodreads. -He cites previous research that cautions against putting too much weight into Metacritic and Goodreads. Have your students discuss the fact that Metacritic data is coming from professional movie reviewers and Goodreads ratings can be created by anyone. How might this affect ratings? -He transforms his data into z-scores. -The films that have the biggest movie:book rati...
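To show students why z-scores are needed before comparing a 0-100 movie scale with a 1-5 book scale, here is a rough sketch. The titles and ratings are hypothetical, and I am assuming a population standard deviation; I don't know exactly how Hickey computed his, so treat this as an illustration of the idea rather than a reproduction of his analysis.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical ratings, not Hickey's data
titles = ["Title A", "Title B", "Title C", "Title D"]
movie_ratings = [90, 60, 75, 45]      # critic-style scores on a 0-100 scale
book_ratings = [3.2, 4.1, 3.9, 4.4]   # reader-style scores on a 1-5 scale

movie_z = z_scores(movie_ratings)
book_z = z_scores(book_ratings)

# Positive gap: the movie is rated better, relative to other movies,
# than the book is rated relative to other books
gaps = {t: mz - bz for t, mz, bz in zip(titles, movie_z, book_z)}
biggest = max(gaps, key=lambda t: abs(gaps[t]))

for title, gap in gaps.items():
    print(f"{title}: movie z minus book z = {gap:+.2f}")
print("Largest discrepancy:", biggest)
```

Once both sets of ratings are on the same standardized scale, "this movie is 1.3 standard deviations above the average movie" and "this book is 1.6 below the average book" can be compared directly.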

Barry-Jester, Casselman, & Goldstein's "Should prison sentences be based on crimes that haven't been committed yet?"

This article describes how the Pennsylvania Department of Corrections is using risk assessment data in order to predict recidivism, with the hope of using such data in order to guide parole decisions in the future. So, using data to predict the future is very statsy, demonstrates multivariate modeling, and is a good example for class, full stop. However, this article also contains a cool interactive tool, entitled "Who Should Get Parole?", that you could use in class. It demonstrates how increasing/decreasing alpha and beta changes the likelihood of committing Type I and Type II errors. The tool allows users to manipulate the amount of risk they are willing to accept when making parole decisions. As you change the working definition of a "low" or "high" risk prisoner, a visualization will start up, and it shows you whether your parolees stay out of prison or come back. From a statistical perspective, users can adjust the definition of a low, medium, and h...
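If you can't get the interactive tool working in your classroom, the trade-off can be sketched in a few lines. Everything here is hypothetical: the risk scores, the outcomes, and the cutoffs are made up, and I am treating "will not reoffend" as the null hypothesis, so denying parole to someone who would have stayed out counts as the Type I error and paroling someone who returns counts as the Type II error.

```python
def error_counts(scores, reoffended, threshold):
    """Count both kinds of mistakes when prisoners at or above the
    threshold are labeled high risk and denied parole."""
    # Labeled high risk, but would not have reoffended (Type I here)
    false_positives = sum(
        1 for s, r in zip(scores, reoffended) if s >= threshold and not r
    )
    # Labeled low risk and paroled, but reoffended (Type II here)
    false_negatives = sum(
        1 for s, r in zip(scores, reoffended) if s < threshold and r
    )
    return false_positives, false_negatives


# Hypothetical risk scores (1 = lowest risk) and outcomes, not real data
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reoffended = [False, False, False, True, False, False, True, True, False, True]

for threshold in (3, 6, 9):
    fp, fn = error_counts(scores, reoffended, threshold)
    print(f"cutoff {threshold}: {fp} wrongly denied, {fn} wrongly paroled")
```

Raising the cutoff shrinks one kind of error and grows the other, which is the same tension students see when they drag the sliders in the tool.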

Christie Aschwanden's "The Case Against Early Cancer Detection"

I love counterintuitive data that challenges commonly held beliefs. And there is a lot of counterintuitive health data out there (for example, data questioning the health benefits associated with taking vitamins or data that led to a revolution in how we put our babies to sleep AND cut incidents of SIDS in half). This story by Aschwanden for fivethirtyeight.com discusses efficacy data for various kinds of cancer screening. Short version of this article: Early cancer screening detects non-cancerous lumps and abnormalities in the human body, which in turn leads to additional and invasive tests and procedures in order to ensure that an individual really is cancer-free or to remove growths that are not life-threatening (but expose an individual to all the risks associated with surgery). Specific Examples: 1) Diagnosis of thyroid cancer in South Korea has increased because it is being tested for more often. However, death due to thyroid cancer has NOT increased (see figure below)...

Emily Oster's "Don't take your vitamins"

My favorite data is data that is both counter-intuitive and tests the efficacy of commonly held beliefs. Emily Oster (writing for 538) presents such data in her investigation of vitamin efficacy. The short version of this article: Data that associates vitamins with health gains are based on crap observational research. More recent and better research throws lots of shade on vitamin usage. Specific highlights that could make for good class discussion: -This article explains the flaws in observational research as well as an example of how to do observational research well (via The Physician's Health Study, with large samples of demographically similar individuals as described in the portion of the article featuring the Vitamin E study). This point provides an example of why controlled, double-blind lab research is the king of all the research. -This is an accessible example as most of your students took their Flintstones. -The article also demonstrates The Thir...