Monday, April 27, 2015

Paul Basken's "When the Media Get Science Research Wrong, University PR May Be the Culprit"

Here is an article from the Chronicle of Higher Education (.pdf in case you hit the paywall) about what happens when university PR promotes research findings in a way that exaggerates or completely misrepresents them. Several examples of this are included (Smelling farts cures cancer? What?), along with an empirical study of how health-related research is translated into press releases (Sumner et al., 2014). The Sumner et al. piece found that, among other things, 40% of the press releases studied contained exaggerated advice based upon the research findings.

I think that this is an important topic to address as we teach our students not simply to perform statistical analyses, but to be savvy consumers of statistics. This may be a nice reading to couple with the traditional research methods assignment of asking students to find research stories in popular media and compare and contrast the news story with the actual research article.

If you would like more discussion prompts like this one, here are two additional examples of PR and researchers not working well together. This old post describes what happened when PR overstated research findings (here, the safety of artificial sweeteners), leading the researchers to publicly correct their own PR department. Another gem is this blog post, which describes how the media (not a PR department) got their hands on data related to personal fitness trackers and totally misrepresented the findings.

Monday, April 20, 2015

/rustid's "What type of Reese's has the most peanut butter?"

Rustid, a redditor, performed a research study in order to determine the proportion of peanut butter contained in different types of Reese's Peanut Butter candies. For your perusal, here are the original reddit thread (careful about sharing this with students; there is a lot of talk about how the scales Rustid used are popular with drug dealers), photo documentation via Imgur, and a Buzzfeed article about the experiment.

Rustid documented the process by which he carefully extracted and measured the peanut butter content of nine different varieties of Reese's peanut butter and chocolate candies. See below for an illustration of how he extracted the peanut butter with an X-Acto knife and used electronic scales for measurements.

Below is a graph of the various proportions of peanut butter contained within each version of the Reese's Peanut Butter Cup.

This example illustrates methodology and bar graphs.

If you are feeling especially generous, you could let your students replicate this project in class. If you really want amazing teaching evaluations, you could tell your students that you require n = 30 for each kind of candy, and your students can eat the leftovers.

Other points of discussion:

1) Why did Rustid use proportions of peanut butter, rather than amounts of peanut butter, in his graph? Did he use proportions for the same reason that we create relative frequency graphs?

2) What other qualities of a Reese's Peanut Butter Cup could we evaluate in a quantitative manner?

3) Why isn't his graph APA style compliant?

4) How could we conduct an experiment in order to determine which ratio is the ideal ratio?

5) If you look through the pictures, Rustid only measured the peanut butter content of one half of each symmetrical candy. What problems may arise from this? What data should he have provided to justify this methodological decision?
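If your students also do a little scripting, the proportion calculation behind Rustid's graph reduces to a few lines of Python. A minimal sketch, with the caveat that the candy names and masses below are made-up placeholder values, not Rustid's actual measurements:

```python
# Hypothetical masses in grams -- placeholder values for illustration,
# NOT Rustid's actual measurements.
candies = {
    "Regular Cup": {"total_mass": 17.0, "pb_mass": 6.8},
    "Miniature":   {"total_mass": 9.0,  "pb_mass": 2.7},
    "Big Cup":     {"total_mass": 39.0, "pb_mass": 19.5},
}

# Proportion = peanut butter mass / total candy mass, so candies of
# very different sizes can be compared on the same scale.
proportions = {name: m["pb_mass"] / m["total_mass"]
               for name, m in candies.items()}

# A crude text bar graph, sorted from most to least peanut butter:
# one '#' per 2 percentage points of peanut butter.
for name, p in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {p:6.1%} {'#' * round(p * 50)}")
```

This also makes the answer to the first discussion question concrete: dividing by total mass puts a 9-gram Miniature and a 39-gram Big Cup on the same 0-to-1 scale, exactly as a relative frequency graph does.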

Thursday, April 16, 2015

Reddit for Statistics Class

I love reddit. I really love the subreddit r/dataisbeautiful, where various redditors contribute interesting graphs and charts from all over the interwebz.

I leave you to figure out how to use these data visualizations in class. If nothing else, they are highly interesting examples of a wide variety of graphing techniques applicable to different sorts of data sets. In addition to interesting data visualizations, there are usually good discussions (yes, good discussion on the internet!) among redditors about what is driving the presented findings.

Another facet of these posts is the sources of the data. There are many examples using archival data, like this chart that used social media to estimate sports franchise popularity.

Users also share interesting data from more traditional sources, like APA data on the rates of master's degrees/doctorates awarded over time and user rating data generated by IMDb (here, look at the gender/age bias in ratings of the movie Fifty Shades of Grey).

Other statsy subreddits:

r/samplesize: Here, you can post your own online research projects in order to (hopefully!) up your sample size. You can also earn research participation karma by participating in others' research.

r/HITsWorthTurkingFor: Redditors share links to Amazon's Mechanical Turk (MTurk) tasks that are particularly profitable. There is all manner of work available here (not all of it research based), but as MTurk is increasingly popular with academic researchers, I'm adding it to this list. I think this may be a valuable discussion piece if you talk about MTurk in your research methods classes and want to discuss the variables that influence participation (and how this may affect your research) as well as whether or not MTurk really provides a random sample of humanity.

/r/statistics: A place for all of your statistics questions.

/r/dataisugly: The opposite of r/dataisbeautiful. Can be used to teach students how to create good graphs by showing them how not to create good graphs.

Monday, April 13, 2015

Applied statistics: Introduction to Statistics at the ballpark

This semester (SP 15), I taught an Honors section of Psychological Statistics for the first time. In this class, I decided to take my students to a minor league baseball game (the Erie Seawolves, the Detroit Tigers' AA affiliate) in order to teach them a bit about 1) applied statistics and data collection as well as 2) selecting the proper operationalized variable when answering a research question.

Students prepared for the game day activity via a homework assignment they completed prior to the game.

For this assignment, students learned about a few basic baseball statistics: batting average (AVG), slugging percentage (SLG), and on-base plus slugging (OPS). They looked up these statistics for a randomly chosen Seawolves player (based on 2014 data) and learned how to interpret these data points.
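For students who want to check their interpretation (or their hand-collected game-day numbers), these three statistics reduce to simple arithmetic on a player's stat line. A short Python sketch, using a made-up stat line rather than any real Seawolves player's numbers:

```python
# Hypothetical season stat line -- illustrative values only,
# not a real player's numbers.
at_bats, hits = 200, 50
singles, doubles, triples, home_runs = 30, 10, 2, 8  # sums to 50 hits
walks, hit_by_pitch, sac_flies = 20, 2, 3

# AVG: hits per at-bat.
avg = hits / at_bats

# SLG: total bases per at-bat, which weights extra-base hits.
total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
slg = total_bases / at_bats

# OBP: how often the player reaches base at all (hits, walks, HBP).
obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

# OPS: on-base plus slugging.
ops = obp + slg

print(f"AVG {avg:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Working through the arithmetic also previews the opinion piece's point: AVG treats a single and a home run identically and ignores walks entirely, while SLG and OPS fold both back in.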

They also read an opinion piece on why batting average is not the most informative piece of data when trying to determine the merit of a given player. The opinion piece tied this exercise into the larger statistical question of how we figure out which data points best reflect an underlying phenomenon. Here, the question is, "How do we use data to identify a good player?"

On the day of the game, I gave them a worksheet so that they could pick one player and collect his data (AVG, SLG, OPS). Not the most rigorous of activities, but enough to get them to follow the game and not so much that they would feel overwhelmed (most of the students knew little about baseball scoring going into the exercise).

So, that is a description of my first attempt at a statistics field trip. There is room for growth and change if I try this again in future semesters. If anyone has additional ideas or content for a simple exercise suitable for college students who may know very little about baseball, feel free to share in the comments section or send me an email.

A few logistical notes:


1) My university (Gannon University) is a short walk from the Erie Seawolves' park, so transportation wasn't an issue. You could always eliminate the need for transportation by using this activity with your university's baseball/softball team (or pick another sport altogether).

2) I took my Honors section, so only 11 students. Additionally, we were able to get money from my university's Honors program to pay for the tickets, so money wasn't an issue for me or my students.

3) Minor league = much cheaper tickets than a major league game. Our tickets included free baseball caps, ridiculous entertainment between innings, harassment at the hands of the team mascot, our class name on the Jumbotron, AND a post-game autograph session, which several of my students attended. It was a fun, goofy day and a nice way to spend time with my students outside of the classroom after a long semester and very cold winter.

4) I don't know a lot about the nitty-gritty of baseball. Neither did most of my students. My wonderful husband knows A LOT about baseball (especially the Mets) and came along to help students follow their players and track their progress. He was also handy to have around for some odd situations that came up (for instance, if a player is at bat and then a teammate makes the third out of the inning while stealing a base, what do you put down for that "at bat"?). So, having a resident baseball expert along for the fun is a good idea.

5) We were in a section that was getting a lot of foul balls and, I swear, I had a heart attack every time one came our way out of fear of a student getting hurt on a field trip.

6) My students are mostly sophomores, so booze wasn't anything I had to address (the food vendors always check IDs), but it could be something to consider if you took juniors or seniors to the ballpark.

Monday, April 6, 2015

Using data to inform debate: Free-range parenting

One way to engage students in the classroom is by bringing in debates and real-world examples. Sometimes such debates take place largely over social media. A Facebook question du jour: Is "free-range" parenting (letting your kids go outside, walk to the store, etc., without supervision) a good way to build independence, or is it child neglect? Anecdotes abound, but how safe are your kids when they are out on their own? What kind of data could help us answer this question objectively?

The first piece of information comes from an opinion piece by Clemens Wergin in the New York Times (.pdf in case of paywall). Wergin describes how free-range parenting is the norm in Germany and contrasts American attitudes with German attitudes, providing a quick example of multiculturalism (and why we should never assume that the American attitude toward something is the only opinion). He then provides data showing that children are far more likely to be killed while riding in a car than while walking by themselves. Additionally, he cites data demonstrating how rare stranger kidnapping really is.

Related to this is an older piece from Meagen Voss for NPR. This one is a great example of the availability heuristic, as well as a how-to guide for using data to support a logical argument. The news story contrasts the fears that parents have about their children's safety with health and safety data detailing what actually hurts and kills children. See the list in the story for the details.

How to use in class:

1) Good examples of the proper use of data to make an argument, offer a counterargument, or shed light on a debate that can be emotional.

2) Spark a discussion about how data can be used to provide evidence and insight in our legal process (here, whether or not free range parenting is the same thing as neglect).

3) Availability heuristic

4) Multiculturalism/parenting/developmental psychology

5) Ask your students to generate a list of other data points that might be useful in this debate. Is kidnapping the only fear parents have when thinking about their free-range children? Do you think that different neighborhoods might have different dangers (bullying, drug dealing, really bad drivers, severe weather) parents need to consider? How about the parents of children with developmental or cognitive problems? How does child maturity fit into the mix? Do you think the data presented are actually sufficient evidence (for instance, if children spend far more time riding in cars than walking, could that exposure difference alone explain the greater number of deaths in cars)?