Monday, May 25, 2015

Scott Janish's "Relationship of ABV to Beer Scores"

Scott Janish loves beer and statistics and blogging (a man after my own heart). His blog discusses home brewing as well as data related to beer. One of his statsy blog posts took a look at the relationship between the average alcohol by volume for a style of beer (below, on the x-axis) and the average rating for that style (on the y-axis). He found, perhaps intuitively, that there is a positive correlation between the average Beer Style review for a type of beer and the average alcohol content for that type of beer. Scott was kind enough to provide us with his data set, turning this into a most teachable moment.
How to use in class:
1) Scott provides his data. The r is .418, which isn't mightily impressive. However, I think you could teach your students about influential observations/outliers in regression/correlation by asking them to return to the original data, eliminate the 9 data points that are inconsistent with the larger pattern, and reanalyze the data to see the effect on r and p. Heck, just remove one or two inconsistent data points and let your students see what that does to r.
2) Linear relationships. Correlations. Regressions. Have students design an experiment to test the hypothesis that beer snobs just really like getting drunk (which would explain this relationship).
3) For more beer figures, see here.
4) Take a look at that sample size (see the top of the figure). How does it make the data more reliable?
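If you want to demo the "remove a point or two" exercise before students touch the real spreadsheet, here is a minimal sketch. It assumes nothing about Scott's file: the eight "styles" below are hypothetical toy numbers, not his data, and `pearson_r` and `drop_points` are illustrative helper names. It computes Pearson's r from its definition, drops one inconsistent point, and recomputes.

```python
import math

# Hypothetical toy data, NOT Scott Janish's data set: average ABV (%) and
# average rating for eight made-up beer styles. The last style is a
# deliberately inconsistent point (high ABV, low rating).
abv = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.0]
score = [3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.2, 3.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def drop_points(xs, ys, indices):
    """Return copies of xs and ys with the points at `indices` removed."""
    skip = set(indices)
    kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in skip]
    return [x for x, _ in kept], [y for _, y in kept]


r_full = pearson_r(abv, score)
abv_clean, score_clean = drop_points(abv, score, [7])  # drop the odd style out
r_clean = pearson_r(abv_clean, score_clean)
print(f"r with all points:     {r_full:.3f}")
print(f"r without the outlier: {r_clean:.3f}")
```

In this contrived example a single point is enough to drag r from a perfect positive relationship down to a negative one, which makes the idea of an influential observation hard to miss. With the real data the effect will be smaller, and students can swap in Scott's columns to see by how much.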

Why, no, I'm not above pandering to undergraduates.

Monday, May 18, 2015

Richard Harris' "Why Are More Baby Boys Born Than Girls?"

51% of the babies born in the US are male. Why? For a long time, people just assumed that the skew started at conception. Then Steven Orzack decided to test this assumption. He (and colleagues) collected sex data from abortions, miscarriages, live births (30 million records!), fertility clinics (140,000 embryos!), and different fetal screening tests (90,000 medical records!) to really get at the root of the sex skew/conception assumption. And the assumption didn't hold up: The sex ratio is pretty close to 50:50 at conception. Further analysis of the data found that female fetuses are more likely to be lost during pregnancy. Original research article here. Richard Harris' (reporting for NPR) radio story and interview with Orzack here.

Use this story in class as a discussion piece about long-held (but never empirically supported) assumptions in the sciences and why we need to conduct research in order to test such assumptions. For example:

1) Discuss the weaknesses of previous attempts to answer the question of sex differences in birth rates.
2) Explain why samples matter/why sex-selective abortion in two very large countries could skew this data and why it was important to use US/Canada data.
3) Discuss the fact that people kind of accepted the previous explanation in lieu of proper research methods to answer this question.
4) I could see how you could use the basic premises in order to introduce the concepts behind chi-square tests... Expected data: 50:50; observed data: 51:49.
5) What further questions does this research raise? (For example, male fetuses are especially vulnerable during the first trimester due to genetic abnormalities. But why do female fetuses become more vulnerable during the second trimester?)
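For the chi-square idea in item 4, a minimal goodness-of-fit sketch could look like the following. The counts are hypothetical, chosen only to mirror the 51:49 split; they are not Orzack's data. `chi_square_gof` is an illustrative helper name, and 3.841 is the standard chi-square critical value for df = 1 (two categories minus one) at alpha = .05.

```python
# Chi-square goodness-of-fit sketch. Counts below are hypothetical, chosen
# only to mirror the 51:49 split; they are not Orzack's data.
CRITICAL_DF1_ALPHA05 = 3.841  # chi-square critical value for df = 1, alpha = .05


def chi_square_gof(observed, expected_props):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed)
    stat = 0.0
    for obs, prop in zip(observed, expected_props):
        expected = total * prop  # expected count under the null proportions
        stat += (obs - expected) ** 2 / expected
    return stat


# Same 51:49 split (boys:girls), two very different sample sizes.
small = chi_square_gof([51, 49], [0.5, 0.5])
large = chi_square_gof([510_000, 490_000], [0.5, 0.5])

for label, stat in [("n = 100", small), ("n = 1,000,000", large)]:
    verdict = "reject" if stat > CRITICAL_DF1_ALPHA05 else "fail to reject"
    print(f"{label}: chi-square = {stat:.2f} -> {verdict} the 50:50 null")
```

The same 51:49 split is nowhere near significant with 100 births but overwhelmingly significant with a million, which is one way to show students why record counts in the tens of millions matter for detecting a one-percentage-point skew.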

Friday, May 15, 2015

National Geographic's "Are you typical?"

This animated short from National Geographic touches on averages (including the median and mode), sampling, and the need for cross-cultural research.

When defining the typical (modal) human, the video provides good examples of when to use mode (when determining which country has the largest population) and when to use median (median age in the world). It also illustrates the need to collect cross-cultural data before making any broad statements about typicality (when describing how "typical" is relative to a population).
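To pair the video with a quick in-class demo, here is a small sketch using Python's standard `statistics` module. The ten "people" and their countries are hypothetical, made up for illustration. It shows why the mode is the natural average for a categorical variable like country, and why the median describes a skewed variable like age better than the mean does.

```python
import statistics

# Hypothetical sample of 10 people; all values are made up for illustration.
ages = [2, 5, 9, 14, 22, 30, 31, 45, 70, 90]
countries = ["Country A", "Country B", "Country A", "Country C",
             "Country A", "Country B", "Country D", "Country A",
             "Country C", "Country B"]

# Mode: the most frequent category; mean and median make no sense for country.
modal_country = statistics.mode(countries)

# Median: the middle value (average of the 5th and 6th sorted ages here).
median_age = statistics.median(ages)

# Mean: pulled upward by the long right tail (70 and 90).
mean_age = statistics.mean(ages)

print(modal_country, median_age, mean_age)
```

Students can add one more very old person to `ages` and watch the mean jump while the median barely moves.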