Monday, June 29, 2015

Scott Keeter's "Methods can matter: Where web surveys produce different results than phone interviews"

Pew recently revisited the question of how survey modality can influence survey responses. In particular, this study used both web- and telephone-based surveys to ask participants about their attitudes towards politicians, perceptions of discrimination, and their satisfaction with life.

As summarized in the article, the big differences are:

"1) People expressed more negative views of politicians in Web surveys than in phone surveys." 

"2) People who took phone surveys were more likely than those who took Web surveys to say that certain groups of people – such as gays and lesbians, Hispanics, and blacks – faced “a lot” of discrimination." 

"3) People were more likely to say they are happy with their family and social life when asked by a person over the phone than when answering questions on the Web."  

The social psychologist in me likes this as an example of social desirability bias. When speaking directly to another human being, we report greater life satisfaction, are less critical of politicians, and express more sympathy towards members of minority groups.

The statistician in me thinks this is a good example for discussing sources of error in research. Even a completely conscientious researcher using valid, reliable measures may have their data affected by how it is collected. It might be interesting to ask students to generate lists of research topics (say, market research about cereal preference versus opinions about abortion) and whether students think you could get "true" answers via telephone or web surveys. What is a "true" answer, and how could we evaluate or measure it? How could we come up with an implicit or behavioral measure of something like satisfaction with family life, then test which survey modality is most congruent with that implicit or behavioral measure? What do students think would happen if you used face-to-face interviews or paper-and-pencil surveys in a classroom of people completing surveys?

Additionally, you can't call yourself a proper stats geek unless you follow Pew Research Center on either Twitter (@pewresearch) or Facebook. So many good examples of interesting data!

Wednesday, June 24, 2015

Statsy pictures/memes for not awful PowerPoints

I take credit for none of these. A few have been posted here before.

By Raymond Biesinger

Creator unknown, usually attributed to clipart?

Psychometrics: Interval scale with proper anchors


"Symbols that math urgently needs to adopt"


By Javier Saldeña

By Pedro Velica, @pedrovelica

From Brooklyn 99, but my screen shots
From The Lego Batman Movie

Monday, June 22, 2015

John Bohannon's "I fooled millions into thinking chocolate helps weight loss. Here's how."
This story demonstrates how easy it is to do crap science, get it published in a pay-to-play journal, and market your research (to a global audience). Within this story, there are some good examples of Type I error, p-hacking, sensationalist science reporting, and, frankly, our obsession with weight and fitness and easy fixes. Also, chocolate.

Here is the original story, as told by the perpetrator of this very conscientious fraud, John Bohannon. Bohannon ran this con to expose just how open to corruption and manipulation the whole research publication process can be (see the BioMed Central scandal for another example), especially when it is just the kind of research that is bound to get a lot of media attention (see the LaCour scandal for another example).

Bohannon set out to "demonstrate" that dark chocolate can contribute to weight loss. He ran an actual study (n = 26). He went on a fishing expedition, measuring 18 different markers of health, and did find a significant relationship between chocolate and lower cholesterol (a good example of a likely Type I error).
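The math behind that fishing expedition makes a nice in-class demo. Here is a minimal sketch (only the 18-outcome count and alpha = .05 come from the story; the simulation itself is my own illustration):

```python
import random

# With 18 outcome measures each tested at alpha = .05, the chance of
# at least one "significant" result when NO effect is real is high.
alpha, n_outcomes = 0.05, 18
p_any = 1 - (1 - alpha) ** n_outcomes
print(round(p_any, 2))  # 0.6

# Monte Carlo check: simulate null studies where every p-value
# is just uniform noise, and count studies with at least one "hit".
random.seed(1)
n_studies = 10_000
hits = sum(
    any(random.random() < alpha for _ in range(n_outcomes))
    for _ in range(n_studies)
)
print(hits / n_studies)  # lands close to the analytic answer
```

In other words, a "significant" finding was more likely than not, even if chocolate does nothing at all.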

So a manuscript was created. Bohannon describes how quickly the manuscript was accepted at several pay-to-play journals, with no peer review either (opening up a class discussion about peer review as well as the gradient of pay-to-play journals, some of which are peer reviewed, many of which are not).

Bohannon then describes how he created a website called "The Institute of Diet and Health" to legitimize his research (to be clear, this Institute does not exist) as well as a press release for his study. Then the media did his work for him. Once one outlet picked up the story, so did hundreds of others. One glimmer of hope: while the media just ran with the story, Bohannon notes that the discussion boards associated with the different media outlets actually yielded intelligent discussions that picked apart the flaws of the study.

So, I think that the whole piece would be a good reading assignment for a statistics or research methods class. Additionally, if you are looking to use this story in class, here is an NPR interview with Bohannon.

Monday, June 15, 2015

TED talks about statistics and research methods

I love TED talks. They are licensed to be used for teaching. They come with closed captioning and transcripts. They bring experts into your classroom.

There are a number of TED talks that apply to research methods and statistics classes.

First, there is a whole TED playlist entitled The Dark Side of Data. This one may not be applicable to a basic stats class but does address broader ethical issues of big data, widespread data collection, and data mining. These videos are also a good way of conveying how data collection (and, by extension, statistics) are a routine and invisible part of everyday life.

This talk by Peter Donnelly discusses the use of statistics in court cases, and the importance of explaining statistics in a manner that laypeople can understand. I like this one as I teach my students how to create APA results sections for all of their statistical analyses. This video helps to explain WHY we need to learn to report statistics, not just perform statistics.

Hans Rosling has a number of talks (and he has been mentioned previously on this blog, but bears being mentioned again). He is a physician and conveys his passion for data-driven decisions regarding global public health issues via several talks. In addition to his talks, you and your students can play with his public health statistical visualization software online.

This short video describes gold-standard research for the pharmaceutical industry but the research lessons apply across disciplines. It is also a nice way to introduce the reasoning behind independent and paired t-tests (the video emphasizes control vs. experimental research).

This video describes equal likelihood outcome probability and instances when outcomes are counter-intuitive. It includes probability calculations and touches on the fact that outcome frequencies that are dictated by mathematics only reveal themselves over MULTIPLE iterations of the event (for instance .5 probability of getting tails when you flip a coin).
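That "only over MULTIPLE iterations" point is easy to show with a quick simulation (a hypothetical sketch of my own, not taken from the video):

```python
import random

random.seed(2015)

# A fair coin has a .5 probability of tails, but short runs of flips
# wander; the proportion only settles near .5 as the flips pile up.
for n_flips in (10, 100, 1_000, 10_000):
    tails = sum(random.random() < 0.5 for _ in range(n_flips))
    print(f"{n_flips:>6} flips: proportion tails = {tails / n_flips:.3f}")
```

Students can re-run this a few times and watch the short runs bounce around while the long runs barely move.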

For other TED and non-TED statistics videos, check out the video label for this blog.

UPDATE: 10/16/15

This talk by Dr. Alyson McGregor discusses how prescription drug trials typically use only male samples (due to fluctuations in female hormones over the course of a month). However, 80% of drugs that are recalled are recalled due to negative side effects in women. I think this is a good example for research methods, efficacy research, and sampling error.

UPDATE: 1/20/16

This TED-Ed talk illustrates Simpson's Paradox.

UPDATE: 2/3/16

Data-driven decisions! This video begins softly with examples of data-driven decisions made by Netflix in order to figure out which shows to develop, but gets real with examples of using data to make parole decisions and the Google Flu debacle.

UPDATE: 2/24/17

New playlist from TED, entitled "Statistically speaking..."

Monday, June 8, 2015

A request from the blogger

I am going up for both Rank and Tenure this fall.

Within my applications for both, I will argue that this blog constitutes service to my profession.

I have evidence of this: The blog has 50,000+ page views from 115 countries. I have 271 Twitter followers. So, I can successfully argue that someone other than my dad is reading the blog (Hi, Dad!).

However, I think that more compelling evidence of service to my profession would come in the form of brief testimonials from its readers. If you have a few free moments, please consider writing a brief email that describes, maybe, your favorite blog post, why you enjoy this blog, how you think this blog contributes to the teaching of statistics and research methods, or a specific blog post or two that you've integrated into your own class. Do your students seem to enjoy any of the materials I've shared here? Have you recommended the blog to peers? You get the idea.

Think you can help me out? If so, please shoot me an email. You will be rewarded in...Rank and Tenure karma? My gratitude? Definitely my gratitude.

Thanks for reading,


Randy McCarthy's "Research Minutia"

This blog post by Dr. Randy McCarthy discusses best practices for organizing and naming data files. These suggestions are probably more applicable to teaching graduate students than undergraduates. They are also the sorts of tips and tricks we use in practice but rarely teach in the classroom (but maybe we should).

Included in Randy's recommendations:

1) Maintain consistent naming conventions for frequently used variables (like scale items or compiled scales that you use over and over again in your research). Then create and run the same syntax for these data for the rest of your scholarly career. If you are very, very consistent in the scales you use and the data analyses you run, you can save yourself time with a little forethought.

2) Keep and guard a raw version of all data sets.

3) Annotate your syntax. I would change that to HEAVILY annotate your syntax. I even date my code so I can follow my own logic if I have to set a data set aside for a few weeks or months.

My main addition to the list would be a trick I learned in graduate school: every time I compile a scale, I name it @scalename. The @ sign sticks out among variable names and reminds me that this is a compiled scale (and potentially flawed). And every time I compile a scale in SPSS, I do so using syntax and I save my work (just to ensure that I have a record of what I did).
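For what it's worth, the same habits translate outside SPSS. Here is a hypothetical pandas sketch (all column and scale names invented) of a reusable scoring function combined with the flagged-name trick:

```python
import pandas as pd

def score_scale(df, items, name, reverse=None, max_val=5):
    """Average a scale's items into one score, reverse-keying where
    needed, and store it under an '@'-prefixed name so compiled
    (and potentially flawed) scores stand out from raw items."""
    scored = df[items].copy()
    for col in reverse or []:
        scored[col] = (max_val + 1) - scored[col]  # reverse-key item
    df["@" + name] = scored.mean(axis=1)
    return df

# Hypothetical data: three satisfaction items rated 1-5
df = pd.DataFrame({"sat1": [4, 2], "sat2": [5, 1], "sat3": [3, 3]})
df = score_scale(df, ["sat1", "sat2", "sat3"], "satisfaction")
print(df["@satisfaction"].tolist())  # [4.0, 2.0]
```

Because the scoring lives in one function, re-running it on next semester's data is a single line, which is exactly the kind of forethought Randy is recommending.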

Also, have a mindful backup system for your data sets. I use and love Dropbox. I also like Google Drive, but I use Dropbox the way most people use My Computer on their hard drives, so it is easier for me to use.

Does anyone else have similar tips? Feel free to comment or email me if you have any ideas and I'll include them in this post.

Monday, June 1, 2015

Chris Wilson's "Find out what your name would be if you were born today"

This little questionnaire will provide you with a) the ordinal value of your name for your sex/year of birth and then generate b) a bunch of other names from various decades that share your name's ordinal. Not the most complex example, but it does demonstrate ordinal data.

Me and all the other 4th most popular names for women over the years.
Additionally, this data is pulled from Social Security, which opens up the conversation about how we can use archival data for...not-super-important interactive thingies from Time Magazine? Also, you could pair this example with other interactive ways of studying baby name data (predicting a person's age if you know their name, illustrating different kinds of data distributions via baby name popularity trends) in order to create a themed lesson that corresponds nicely to the first or second chapter of most undergraduate stats textbooks, in which students learn about data distributions and different types of data.
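If you want a quick in-class companion to the name tool, here is a tiny sketch (with made-up counts) showing that a popularity rank is just an ordinal transformation of the underlying counts:

```python
# Hypothetical birth counts for a few names in one year
counts = {"Emma": 20_000, "Olivia": 19_500, "Ava": 16_700, "Mia": 13_100}

# Ordinal rank throws away the distances between the counts and
# keeps only their order -- that's what makes it ordinal data.
ranked = sorted(counts, key=counts.get, reverse=True)
ordinals = {name: rank for rank, name in enumerate(ranked, start=1)}
print(ordinals)  # {'Emma': 1, 'Olivia': 2, 'Ava': 3, 'Mia': 4}
```

Emma and Olivia differ by 500 births while Ava and Mia differ by 3,600, yet both pairs sit one rank apart, which is a nice hook for discussing what ordinal scales do and do not tell you.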