Friday, May 29, 2015

Thomas B. Edsall's "How Poor Are the Poor?"

How do we count the number of poor people in America? How do we operationalize "poor"? That is the measurement question at the heart of this opinion piece from the New York Times (.pdf of same here).

This article outlines several ways of defining poor in America, including:
1) "Jencks’s methodology is simple. He starts with the official 2013 United States poverty rate of 14.5 percent. In 2013, the government determined that 45.3 million people in the United States were living in poverty, or 14.5 percent of the population. Jencks makes three subtractions from the official level to account for expanded food and housing benefits (3 percentage points); the refundable earned-income tax credit and child tax credit (3 points); and the use of the Personal Consumption Expenditures index instead of the Consumer Price Index to measure inflation (3.7 percentage points)."
2) "Other credible ways to define poverty paint a different picture. One is to count all those living with less than half the median income as poor."
3) "Timothy Smeeding, a professor of public affairs and economics at the University of Wisconsin-Madison, notes in an email that 'the official poverty line was about half of median income in 1963, but is less than 30 percent of median now because of general economic growth.'"
4) The article also debates how to measure and apply inflation in order to understand how far a dollar stretches at different points in time.
This article also discusses the relative success of different governmental programs in combating poverty, and how these must be taken into account when selecting the best way to measure poverty.
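The absolute-versus-relative distinction above is easy to make concrete. This is a toy sketch with made-up incomes (not Census data, and the thresholds are illustrative): it counts the "poor" under a fixed dollar line and under the half-of-median definition from the article.

```python
from statistics import median

# Hypothetical annual incomes (in dollars) for a tiny toy "population".
incomes = [12_000, 18_000, 25_000, 32_000, 40_000, 55_000, 70_000, 95_000]

absolute_line = 24_000               # stand-in for an official poverty threshold
relative_line = median(incomes) / 2  # "half the median income" definition

poor_absolute = sum(1 for x in incomes if x < absolute_line)
poor_relative = sum(1 for x in incomes if x < relative_line)
print(poor_absolute, poor_relative)  # the two definitions count different people
```

Even in this tiny example, switching definitions changes who counts as poor, which is exactly the rhetorical leverage the article describes.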

In addition to providing a real-life example of how one goes about operationalizing a variable, I think this article demonstrates how we can fib with statistics in a manner that doesn't require dirty data collection or outright lying: we make a logical argument for a certain dependent variable (typically, one that supports our cause) and roll with it. For example, an up-and-coming presidential candidate may be inclined to use a poverty rate that inflates the number, while an incumbent president is aided by a more conservative estimate.

Monday, May 25, 2015

Scott Janish's "Relationship of ABV to Beer Scores"

Scott Janish loves beer and statistics and blogging (a man after my own heart). His blog discusses home brewing as well as data related to beer. One of his statsy blog posts took a look at the relationship between the average alcohol by volume for a style of beer (below, on the x-axis) and the average rating for that style (y-axis). He found, perhaps intuitively, that there is a positive correlation between the average review for a beer style and the average alcohol content for that style. Scott was kind enough to provide us with his data set, turning this into a most teachable moment.
How to use in class:
1) Scott provides his data. The r is .418, which isn't mightily impressive. However, I think you could teach your students a bit about influential observations/outliers in regression/correlation by asking them to return to the original data, eliminate the nine data points that are inconsistent with the larger pattern, and reanalyze the data to see the effect on r and p. Heck, just remove one or two inconsistent data points and let your students see what that does to the results.
2) Linear relationships. Correlations. Regressions. Generate an experiment to test the assumption that beer snobs just really like getting drunk (and, hence, this relationship).
3) For more beer figures, see here.
4) Take a look at that sample size (shown at the top of the figure). How does this make the data more reliable?

Why, no, I'm not above pandering to undergraduates.

Monday, May 18, 2015

Richard Harris' "Why Are More Baby Boys Born Than Girls?"

51% of the babies born in the US are male. Why? For a long time, people just assumed that the skew started at conception. Then Steven Orzack decided to test this assumption. He (and colleagues) collected sex data from abortions, miscarriages, live births (30 million records!), fertility clinics (140,000 embryos!), and different fetal screening tests (90,000 medical records!) to really get at the root of the sex-skew/conception assumption. And the assumption didn't hold up: the sex ratio is pretty close to 50:50 at conception. Further analysis of the data found that female fetuses are more likely to be lost during pregnancy. Original research article here. Richard Harris' (reporting for NPR) radio story and interview with Orzack here.

Use this story in class as a discussion piece about long-held (but never empirically supported) assumptions in the sciences and why we need to conduct research in order to test such assumptions. For example:

1) Discuss the weaknesses of previous attempts to answer the question of sex differences in birth rates.
2) Explain why samples matter/why sex selective abortion in two very large countries could skew this data and why it was important to use US/Canada data.
3) Discuss the fact that people kind of accepted the previous explanation in lieu of proper research methods to answer this question.
4) I could see how you could use the basic premise in order to introduce the concepts behind chi-square tests... Expected data: 50:50; observed data: 51:49.
5) What further questions does this research raise? (For example, male fetuses are especially vulnerable during the first trimester due to genetic abnormalities, but why do female fetuses become more vulnerable during the second trimester?)
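Point 4 can be sketched directly: a goodness-of-fit chi-square computed by hand on hypothetical counts (1,000 imaginary births, not Orzack's records).

```python
# Hypothetical counts for 1,000 births; expected split at conception is 50:50.
observed = {"male": 510, "female": 490}
expected = {"male": 500, "female": 500}

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print(chi_sq)  # → 0.4, well under the df = 1 critical value of 3.841
```

A nice talking point falls out of this: at n = 1,000 a 51:49 split isn't statistically significant, which helps students see why Orzack's 30-million-record sample matters for detecting a skew this small.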

Friday, May 15, 2015

National Geographic's "Are you typical?"

This animated short from National Geographic touches on the mean, median, mode, sampling, and the need for cross-cultural research.

When defining the typical (modal) human, the video provides good examples of when to use mode (when determining which country has the largest population) and when to use median (median age in the world). It also illustrates the need to collect cross-cultural data before making any broad statements about typicality (when describing how "typical" is relative to a population).
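The video's mode-versus-median distinction is easy to demo with the standard library; the ages below are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical ages in a small "population", to show why we reach for
# different measures of central tendency in different situations.
ages = [4, 9, 15, 22, 22, 28, 35, 41, 67]

print(mean(ages))    # 27 — pulled upward by the oldest member
print(median(ages))  # 22 — the middle value
print(mode(ages))    # 22 — the single most common ("typical") age
```

Swapping the 67 for a 95 changes the mean but leaves the median and mode alone, which mirrors the video's point that "typical" depends on which summary you pick.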

Monday, May 11, 2015

Healey's "Study finds a disputed Shakespeare play bears the master's mark"

This story describes how psychologists used content analysis to provide evidence that Shakespeare indeed authored the play Double Falsehood.

The play in question has been the subject of literary dispute for hundreds of years. It was originally published by Lewis Theobold in 1727. Theobold claimed it was based on unpublished works by Shakespeare. And literary scholars have been debating this claim ever since.

Enter two psychology professors, Boyd and Pennebaker. They decided to tackle this debate via statistics. They conducted a content analysis of Double Falsehood as well as of confirmed work by Shakespeare. What they tested for:

"Under the supervision of University of Texas psychology professors Ryan L. Boyd and James W. Pennebaker, machines churned through 54 plays -- 33 by Shakespeare, nine by Fletcher and 12 by Theobold -- and tirelessly computed each play's average sentence-length, quantified the complexity and psychological valence of its language, and sussed out the frequent use of unusual words."
Boyd and Pennebaker's work ended up in Psychological Science (as their research used content analysis to seek "psychological signatures" in the text). I think this is an interesting example of a) content analysis, b) statistics and literature coming together, and c) using the rigors of the scientific method to inform a debate and help solve a mystery that exists outside the realm of science.
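A minimal stylometry sketch in the spirit of the features described above (this is my own toy illustration, not Boyd and Pennebaker's actual pipeline): compute average sentence length and the relative frequency of a handful of function words for a passage of text.

```python
import re

# A small, arbitrary set of English function words for illustration.
FUNCTION_WORDS = {"the", "and", "of", "to", "a", "in", "that", "i"}

def style_features(text: str) -> dict:
    """Return two crude stylometric features for a passage of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    avg_sentence_len = len(words) / len(sentences)
    func_rate = sum(1 for w in words if w in FUNCTION_WORDS) / len(words)
    return {"avg_sentence_len": avg_sentence_len, "func_word_rate": func_rate}

sample = ("To be, or not to be, that is the question. "
          "Whether 'tis nobler in the mind to suffer.")
print(style_features(sample))
```

Run over 54 plays, feature vectors like this one are what let the machines "churn through" the corpus and compare Double Falsehood against each candidate author's signature.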

Monday, May 4, 2015

The Onion's "Study finds those with deceased family members at high risk of dying themselves"

Man, I love the fact that when The Onion does an article about pretend research, they don't skimp on the details. This story includes the journal (NEJM), n-size (85,000), research design (longitudinal), a lead author, covariates (race, nationality, etc.), as well as a replication study. I like to think that the person who writes these articles paid really close attention in a statistics class or a research methods class and remembers just enough to be a smart ass about the research process.

I used this just last week as a review example of correlation =/= causation. The students liked it. Mission accomplished.