
Posts

John Bohannon's "I fooled millions into thinking chocolate helps weight loss. Here's how."

http://io9.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800 This story demonstrates how easy it is to do crap science, get it published in a pay-to-play journal, and market your research to a global audience. Within this story, there are some good examples of Type I error, p-hacking, sensationalist science reporting, and, frankly, our obsession with weight and fitness and easy fixes (also, chocolate). Here is the original story, as told to io9.com by the perpetrator of this very conscientious fraud, John Bohannon. Bohannon ran this con to expose just how open to corruption and manipulation the whole research publication process can be (the BioMed Central scandal, for another example), especially when it is just the kind of research that is bound to get a lot of media attention (the LaCour scandal, for another example). Bohannon set out to "demonstrate" that dark chocolate can contribute to weight loss. He ran an actual study (n = 26). He went on a ...
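If you want to show students the machinery behind the con, here is a minimal simulation of my own (hypothetical numbers; the 18-outcome count is my assumption for the demo, not a figure from the excerpt above) of how measuring lots of outcomes on a tiny sample practically guarantees a "significant" result somewhere:

```python
# Illustrative p-hacking simulation: two groups, no true effect anywhere,
# but many outcome measures are tested. The 18-outcome count is an
# assumption for this demo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, n_outcomes = 13, 18  # n = 26 subjects total, many measures

false_positives = 0
for _ in range(n_outcomes):
    chocolate = rng.normal(0, 1, n_per_group)  # no real effect in either group
    control = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(chocolate, control)
    if p < 0.05:
        false_positives += 1

print(f"'Significant' outcomes by chance alone: {false_positives} of {n_outcomes}")
```

With 18 independent tests at the .05 level, the chance of at least one false positive is 1 - .95^18, or about 60%. That is Type I error inflation in a nutshell.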

TED talks about statistics and research methods

There are a number of TED talks that apply to research methods and statistics classes. First, there is this TED playlist entitled The Dark Side of Data. This one may not be applicable to a basic stats class but does address broader ethical issues of big data, widespread data collection, and data mining. These videos are also a good way of conveying how data collection (and, by extension, statistics) is a routine and invisible part of everyday life. This talk by Peter Donnelly discusses the use of statistics in court cases and the importance of explaining statistics in a manner that laypeople can understand. I like this one as I teach my students how to create APA results sections for all of their statistical analyses. This video helps to explain WHY we need to learn to report statistics, not just perform statistics. Hans Rosling has a number of talks (and he has been mentioned previously on this blog, but bears being mentioned again). He is a physician and conveys his passion...

A request from the blogger

I am going up for both Rank and Tenure this fall. Within my applications for both, I will argue that this blog constitutes service to my profession. I have evidence of this: The blog has 50,000+ page views from 115 countries. I have 271 Twitter followers. So, I can successfully argue that someone other than my dad is reading the blog (Hi, Dad!). However, I think that more compelling evidence of service to my profession would come in the form of brief testimonials from its readers. If you have a few free moments, please consider writing a brief email that describes, maybe, your favorite blog post, why you enjoy this blog, how you think this blog contributes to the teaching of statistics and research methods, or a specific blog post or two that you've integrated into your own class. Do your students seem to enjoy any of the materials I've shared here? Have you recommended the blog to peers? You get the idea. Think you can help me out? If so, please shoot me an emai...

Randy McCarthy's "Research Minutia"

This blog posting by Dr. Randy McCarthy discusses best practices in organizing/naming conventions for data files. These suggestions are probably more applicable to teaching graduate students than undergraduates. They are also the sorts of tips and tricks we use in practice but rarely teach in the classroom (but maybe we should). Included in Randy's recommendations: 1) Maintain consistent naming conventions for frequently used variables (like scale items or compiled scales that you use over and over again in your research). Then create and run the same syntax for this data for the rest of your scholarly career. If you are very, very consistent in the scales you use and the data analyses you run, you can save yourself time by showing a little forethought. 2) Keep and guard a raw version of all data sets. 3) Annotate your syntax. I would change that to HEAVILY annotate your syntax. I even put the dates upon which I write code so I can follow my own logic if I have to let a d...
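Randy's post has SPSS-style syntax in mind, but the habits carry over to any analysis language. Here is a minimal sketch of recommendations 1 through 3 in Python/pandas, with hypothetical file and variable names:

```python
# 2015-07-14: scoring script for a (hypothetical) five-item satisfaction scale.
# Per recommendation 1, the raw items are ALWAYS named lifesat_01 ... lifesat_05
# and the compiled scale is ALWAYS lifesat_total, in every project.
import pandas as pd

RAW_FILE = "study42_raw.csv"  # per recommendation 2: read-only, never overwritten
ITEMS = [f"lifesat_{i:02d}" for i in range(1, 6)]

df = pd.read_csv(RAW_FILE)

# 2015-07-14: item 3 is reverse-keyed on a 1-7 response scale.
df["lifesat_03"] = 8 - df["lifesat_03"]

# Compiled scale score: mean of the five items.
df["lifesat_total"] = df[ITEMS].mean(axis=1)

# Scored data goes to a NEW file; the raw file stays untouched.
df.to_csv("study42_scored.csv", index=False)
```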

Chris Wilson's "Find out what your name would be if you were born today"

This little questionnaire will provide you with a) the ordinal value of your name for your sex/year of birth and b) a bunch of other names from various decades that share your name's ordinal. Not the most complex example, but it does demonstrate ordinal data. Me and all the other 4th most popular names for women over the years. Additionally, this data is pulled from Social Security records, which opens up the conversation for how we can use archival data for...not super important interactive thingies from Time Magazine? Also, you could pair up this example with other interactive ways of studying baby name data (predicting a person's age if you know their name, illustrating different kinds of data distributions via baby name popularity trends) in order to create a themed lesson that would correspond nicely to that first/second chapter of most undergraduate stats textbooks in which you learn about data distributions and different types of data.
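If you want students to see exactly where an ordinal value like "4th most popular" comes from, a tiny sketch with made-up birth counts (not the real Social Security data) does the trick:

```python
# Ordinal data demo with made-up birth counts for one year.
counts = {"Emma": 20799, "Olivia": 19674, "Sophia": 18490, "Isabella": 16950}

ranked = sorted(counts, key=counts.get, reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(f"{rank}. {name}")
# The rank (1st, 2nd, 3rd, ...) is the ordinal value; it preserves the
# order of the counts but discards the distances between them.
```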

Thomas B. Edsall's "How poor are the poor?"

How do we count the number of poor people in America? How do we operationalize "poor"? That is the psychometric topic of this opinion piece from the New York Times (.pdf of same here). This article outlines several ways of defining poor in America, including: 1) "Jencks's methodology is simple. He starts with the official 2013 United States poverty rate of 14.5 percent. In 2013, the government determined that 45.3 million people in the United States were living in poverty, or 14.5 percent of the population. Jencks makes three subtractions from the official level to account for expanded food and housing benefits (3 percentage points); the refundable earned-income tax credit and child tax credit (3 points); and the use of the Personal Consumption Expenditures index instead of the Consumer Price Index to measure inflation (3.7 percentage points)." 2) "Other credible ways to define poverty paint a different picture. One is to count all those living ...
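The arithmetic behind Jencks's adjustment is simple enough to hand to students as a worked example (this sketch uses only the percentages quoted above):

```python
# Jencks's adjustment, using only the percentages quoted in the article.
official_rate = 14.5  # official 2013 U.S. poverty rate, in percent
adjustments = {
    "food and housing benefits": 3.0,
    "EITC and child tax credit": 3.0,
    "PCE index instead of CPI": 3.7,
}
adjusted_rate = official_rate - sum(adjustments.values())
print(f"Adjusted poverty rate: {adjusted_rate:.1f}%")  # 14.5 - 9.7 = 4.8%
```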

Scott Janish's "Relationship of ABV to Beer Scores"

Scott Janish loves beer, statistics, and blogging (a man after my own heart). His blog discusses home brewing as well as data related to beer. One of his statsy blog posts looked at the relationship between average alcohol by volume for a beer style (below, on the x-axis) and the average rating (from beeradvocate.com, y-axis). He found, perhaps intuitively, a positive correlation between the average review score for a beer style and the average alcohol content of that style. Scott was kind enough to provide us with his data set, turning this into a most teachable moment. http://scottjanish.com/relationship-of-abv-to-beer-scores/ How to use it in class: 1) Scott provides his data. The r is .418, which isn't mighty impressive. However, you could teach your students about influential observations/outliers in regression/correlation by asking them to return to the original data, eliminate the 9 data points inconsistent with the larger pattern, and reanalyze th...
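Here is a sketch of that exercise with made-up (ABV, rating) pairs standing in for Scott's actual numbers, which you should grab from his post:

```python
# Influential-observation demo: made-up (ABV, rating) pairs standing in for
# Scott's real data. Two off-pattern points swamp an otherwise strong trend.
import numpy as np
from scipy import stats

abv = np.array([4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.0, 10.0, 11.0, 4.0, 12.0])
rating = np.array([3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 3.2])

r_all, _ = stats.pearsonr(abv, rating)             # all 12 points
r_trim, _ = stats.pearsonr(abv[:10], rating[:10])  # drop the 2 off-pattern points

print(f"r with all points: {r_all:.3f}")
print(f"r without the off-pattern points: {r_trim:.3f}")
```

Students can then argue about whether dropping those points is principled data cleaning or p-hacking, which is a discussion worth having on its own.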

Richard Harris' "Why Are More Baby Boys Born Than Girls?"

51% of the babies born in the US are male. Why? For a long time, people just assumed that the skew started at conception. Then Steven Orzack decided to test this assumption. He (and colleagues) collected sex data from abortions, miscarriages, live births (30 million records!), fertility clinics (140,000 embryos!), and different fetal screening tests (90,000 medical records!) to really get at the root of the sex skew/conception assumption. And the assumption didn't hold up: The sex ratio is pretty close to 50:50 at conception. Further analysis of the data found that female fetuses are more likely to be lost during pregnancy. Original research article here. Richard Harris' (reporting for NPR) radio story and interview with Orzack here. Use this story in class as a discussion piece about long-held (but never empirically supported) assumptions in the sciences and why we need to conduct research in order to test such assumptions. For example: 1) Discuss the weaknesses of previo...

National Geographic's "Are you typical?"

This animated short from National Geographic touches on averages (mean, median, and mode), sampling, and the need for cross-cultural research. When defining the typical (modal) human, the video provides good examples of when to use the mode (when determining which country has the largest population) and when to use the median (median age in the world). It also illustrates the need to collect cross-cultural data before making any broad statements about typicality (when describing how "typical" is relative to a population).
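The same distinction takes about five lines to demonstrate in class (toy numbers, not National Geographic's data):

```python
# Mode vs. median with toy numbers.
from statistics import median, mode

# "Most typical" category -> mode: the value that occurs most often.
nationalities = ["Chinese", "Indian", "Chinese", "American", "Chinese"]
print(mode(nationalities))  # Chinese

# Skewed numeric data -> median: half the values above, half below.
ages = [1, 4, 9, 15, 28, 30, 31, 45, 80]
print(median(ages))  # 28
```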

Healey's "Study finds a disputed Shakespeare play bears the master's mark"

This story describes how psychologists used content analysis to provide evidence that Shakespeare indeed authored the play Double Falsehood. The play in question has been the subject of literary dispute for hundreds of years. It was originally published by Lewis Theobald in 1727. Theobald claimed it was based on unpublished works by Shakespeare. And literary scholars have been debating this claim ever since. Enter two psychology professors, Boyd and Pennebaker. They decided to tackle this debate via statistics. They conducted a content analysis of Double Falsehood as well as of confirmed work by Shakespeare. What they tested for: "Under the supervision of University of Texas psychology professors Ryan L. Boyd and James W. Pennebaker, machines churned through 54 plays -- 33 by Shakespeare, nine by Fletcher and 12 by Theobald -- and tirelessly computed each play's average sentence-length, quantified the complexity and psychological valence of its language, and sussed out the ...
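Their actual pipeline is far more sophisticated, but two features of this kind, average sentence length plus the function-word counting Pennebaker is known for, are easy to sketch (illustrative code, text, and word list of my own, not theirs):

```python
# Toy versions of two stylometric features: average sentence length and
# function-word rate. Illustrative only, not the Boyd/Pennebaker pipeline.
import re

FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "that", "it", "is"}

def text_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / len(words),
    }

sample = ("To be, or not to be, that is the question. "
          "Whether 'tis nobler in the mind to suffer.")
print(text_features(sample))
```

Run the same features over texts of known and disputed authorship and you have the skeleton of an authorship-attribution lesson.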

The Onion's "Study finds those with deceased family members at high risk of dying themselves"

http://www.theonion.com/articles/study-finds-those-with-deceased-family-members-at,38463/ Man, I love the fact that when The Onion does an article about pretend research, they don't skimp on the details. This story includes the journal (NEJM), sample size (n = 85,000), research design (longitudinal), a lead author, covariates (race, nationality, etc.), as well as a replication study. I like to think that the person who writes these articles paid really close attention in a statistics class or a research methods class and remembers just enough to be a smart ass about the research process. I used this just last week as a review example of correlation =/= causation. The students liked it. Mission accomplished.

Paul Basken's "When the Media Get Science Research Wrong, University PR May Be the Culprit"

Here is an article from the Chronicle of Higher Education (.pdf in case you hit the paywall) about what happens when university PR promotes research findings in a way that exaggerates or completely misrepresents the findings. Several examples of this are included (Smelling farts cures cancer? What?), including an empirical study of how health-related research is translated into press releases (Sumner et al., 2014). The Sumner et al. piece found, among other things, that 40% of the press releases studied contained exaggerated advice based upon research findings. I think that this is an important topic to address as we teach our students not simply to perform statistical analyses, but to be savvy consumers of statistics. This may be a nice reading to couple with the traditional research methods assignment of asking students to find research stories in popular media and compare and contrast the news story with the actual research article. If you would like more di...

/rustid's "What type of Reese's has the most peanut butter?"

Rustid, a redditor, performed a research study in order to determine the proportions of peanut butter contained in different types of Reese's Peanut Butter candies. For your perusal, here is the original reddit thread (careful about sharing this with students: there is a lot of talk about how the scales Rustid used are popular with drug dealers), photo documentation via Imgur, and a Buzzfeed article about the experiment. Rustid documented the process by which he carefully extracted and measured the peanut butter content of nine different varieties of Reese's peanut butter and chocolate candies. See below for an illustration of how he extracted the peanut butter with an X-Acto knife and used electronic scales for measurements. http://imgur.com/a/wN6PH#SUhYBPx Below is a graph of the various proportions of peanut butter contained within each version of the Reese's Peanut Butter Cup. http://imgur.com/a/wN6PH#SUhYBPx This example...
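The measurement itself is just mass arithmetic, which makes it a friendly first example of operationalizing a variable. A short sketch with made-up masses (not Rustid's actual numbers):

```python
# Peanut butter proportion = peanut butter mass / total candy mass.
# All masses here are made up, not Rustid's measurements.
candies = {  # name: (total grams, peanut butter grams) -- hypothetical
    "Regular Cup": (17.0, 7.3),
    "Miniature": (7.9, 2.8),
    "Big Cup": (34.0, 16.1),
}
for name, (total, pb) in candies.items():
    print(f"{name}: {pb / total:.0%} peanut butter")
```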