Thursday, April 16, 2015

Reddit for Statistics Class

I love reddit. I really love the sub-reddit r/dataisbeautiful. Various redditors contribute interesting graphs and charts from all over the interwebz.



I leave you to figure out how to use these data visualizations in class. If nothing else, they are highly interesting examples of a wide variety of graphing techniques applicable to different sorts of data sets. In addition to interesting data visualizations, there are usually good discussions (yes, good discussion on the internet!) among redditors about what is driving the presented findings.

Another facet of these posts is the sources of the data. There is a lot of archival data being used, like this example that uses social media to estimate sports franchise popularity.

Users also share interesting data from more traditional sources, like APA data on the rates of Masters/Doctorates over time and user rating data generated by IMDB (here, look at the gender/age bias in ratings of the movie Fifty Shades of Grey).


Other statsy subreddits:

r/samplesize: Here, you can post your own online research projects in order to (hopefully!) up your sample size. You can also work on your own research participation karma by participating in research.

r/HITsWorthTurkingFor: Redditors share links for Amazon's mTurk projects that are particularly profitable. There is all manner of work available here, but mTurk is increasingly popular with academic researchers, so I'm adding it to this list. I think this may be a valuable discussion piece if you talk about mTurk in your research methods classes and want to discuss the variables that influence participation (and how this may affect your research) as well as whether or not mTurk is really a random sampling of humanity.

/r/statistics: A place for all of your statistics questions, from queries from undergraduates to folks with advanced degrees and big questions about statistical analysis.

/r/dataisugly: The opposite of r/dataisbeautiful. Can be used to teach students how to create good graphs by showing them how not to create good graphs.

Monday, April 13, 2015

Applied statistics: Introduction to Statistics at the ballpark

This semester (SP 15), I taught an Honors section of Psychological Statistics for the first time. In this class, I decided to take my students to a minor league baseball game (The Erie Seawolves, the Detroit Tigers' AA affiliate) in order to teach them a bit about 1) applied statistics and data collection as well as 2) selecting the proper operationalized variable when answering a research question.


Students prepared for the game day activity via a homework assignment they completed prior to the game.

For this assignment, students learned about a few basic baseball statistics (batting average (AVG), slugging (SLG), and on-base plus slugging (OPS)). They looked up these statistics for a random Seawolves' player (based on 2014 data) and learned how to interpret these data points.
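For readers who know as little about baseball as most of my students did, here is a rough sketch of how these three statistics are usually calculated from a player's tallies. The formulas are the standard ones (OPS is on-base percentage plus slugging), but the BattingLine class, its field names, and the numbers in the example are made up for illustration; they are not data from the game or from my worksheet.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical tally of one player's results (names are illustrative)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int = 0
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0

    @property
    def hits(self) -> int:
        # Every single, double, triple, and home run counts as one hit.
        return self.singles + self.doubles + self.triples + self.home_runs

    def avg(self) -> float:
        # Batting average: hits divided by at bats.
        return self.hits / self.at_bats if self.at_bats else 0.0

    def slg(self) -> float:
        # Slugging: total bases divided by at bats, so extra-base hits count more.
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats if self.at_bats else 0.0

    def obp(self) -> float:
        # On-base percentage: times on base divided by qualifying plate appearances.
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks
                   + self.hit_by_pitch + self.sacrifice_flies)
        return reached / chances if chances else 0.0

    def ops(self) -> float:
        # On-base plus slugging: the sum of the two rates above.
        return self.obp() + self.slg()


# Made-up numbers: 4 at bats, one single, one double, plus one walk.
line = BattingLine(at_bats=4, singles=1, doubles=1, triples=0, home_runs=0, walks=1)
print(f"AVG {line.avg():.3f}  SLG {line.slg():.3f}  OPS {line.ops():.3f}")
# prints: AVG 0.500  SLG 0.750  OPS 1.350
```

Notice that the walk does not change AVG or SLG at all but does raise OPS, which is one small way to see why batting average alone can undersell a player.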

They also read an opinion piece on why batting averages are not the most informative piece of data when trying to determine the merit of a given player. The opinion piece tied this exercise into the larger statistical question of how we figure out which data points best reflect an underlying phenomenon. Here, the question is, "How do we use data to identify a good player?"

On the day of the game, I gave them a worksheet so that they could pick one player and collect their data (SLG, OPS, AVG). Not the most rigorous of activities, but enough to get them to follow the game and not so much that they would feel overwhelmed (most of the students knew little about baseball scoring going into the exercise).

So, there is a description of my first attempt at a statistics field trip. There is room for growth and change if I try this out in future semesters. If anyone has any additional ideas or content for a simple exercise suitable for college students who may know very little about baseball, feel free to share in the comments section or shoot me an email (hartnett004@gannon.edu).

Errata:

1) My university (Gannon University) is a short walk from the Erie Seawolves' park. So, transportation wasn't an issue. You could always eliminate the need for transportation by using this activity with your university's baseball/softball team (or pick another sport altogether).

2) I took my Honors section, so only 11 students. Additionally, we were able to get money from my university's Honors program to pay for the tickets. So, money wasn't an issue for me or my students.

3) Minor league = much cheaper tickets than a major league game. Our tickets included free baseball caps, ridiculous entertainment between innings, harassment at the hands of the team mascot, our class name on the Jumbotron, AND a post-game autograph session, which several of my students attended. It was a fun, goofy day and a nice way to spend time with my students outside of the classroom after a long semester and very cold winter.

4) I don't know a lot about the nitty gritty of baseball. Neither did most of my students. My wonderful husband knows A LOT about baseball (especially the Mets) and came along to help students follow their player and track their progress. He was also handy to have around for some odd situations that came up (for instance, if a player is at bat and then their teammate gets the third out for the inning while stealing a base, what do you put down for their "at bat"?). So, having a resident baseball expert along for the fun is a good idea.

5) We were in a section that was getting a lot of foul balls and I swear, I had a heart attack every time one went our way out of fear of a student getting hurt on a field trip.

6) My students are mostly Sophomores, so booze wasn't anything I had to address (the food vendors always check IDs) but could be something to consider if you took Juniors or Seniors to the ballpark.

Monday, April 6, 2015

Using data to inform debate: Free-range parenting

One way to engage students in the classroom is by bringing in debates and real world examples. Sometimes, such debates take place largely over social media. One Facebook question du jour: Is "free-range" parenting (letting your kids go outside, walk to the store, etc. without supervision) a good way to build independence or is it child neglect? Anecdotes abound, but how safe is your kid when they are out on their own? What kind of data could help us answer this question?

http://www.nytimes.com/2015/03/20/opinion/the-case-for-free-range-parenting.html

The first is an opinion piece by Clemens Wergin from the New York Times (.pdf in case of paywall). Wergin describes how free-range parenting is the norm in Germany and contrasts American attitudes with German attitudes, providing a quick example of multiculturalism (and why we should never assume that the American attitude towards something is the only opinion). He then provides data showing that children are far more likely to be killed when in a car than when walking by themselves. Additionally, he brings up data demonstrating how rare stranger kidnapping really is.

Related to this is an older piece from Meagen Voss for NPR. This one is a great example of the availability heuristic, as well as a how-to guide for using data to support a logical argument. This news story contrasts the fears that parents have about their children's safety versus health and safety data that details what actually hurts and kills children. See the list below:


http://www.npr.org/blogs/health/2010/08/30/129531631/5-worries-parents-should-drop-and-5-they-should


I think that these examples are good for:

1) Proper use of data to make an argument/counter argument/shed light on a debate that can be quite emotional for some people.

2) Spark a discussion about how data can be used to provide evidence and insight in our legal process (here, whether or not free range parenting is the same thing as neglect).

3) Availability heuristic

4) Multiculturalism/parenting/developmental psychology

5) Ask your students to generate a list of other data points that might be useful in this debate. Is kidnapping the only fear parents have when thinking about their roaming children? Do you think that different neighborhoods might have different dangers (bullying, drug dealing, really bad drivers) parents need to consider? How about the parents of children with developmental or cognitive problems? How does child maturity fit into the mix? Do you think that the data presented are actually sufficient evidence (for instance, if children spend more time in cars than walking, are the greater number of deaths via car simply a result of that greater exposure?).
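If you want to make that last question concrete for students, one option is to have them compare raw counts against rates per hour of exposure. The sketch below does this with a small helper I am calling deaths_per_million_hours (a hypothetical name). Every number in it is invented purely for the exercise and does NOT come from the Wergin or Voss articles or any real data set; the point is only that the ranking can flip once you account for how much time children spend doing each activity.

```python
def deaths_per_million_hours(deaths: int, exposure_hours: float) -> float:
    """Convert a raw death count into a rate per one million hours of exposure."""
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    return deaths * 1_000_000 / exposure_hours


# INVENTED numbers for classroom illustration only; not real statistics.
car_deaths, car_hours = 100, 50_000_000      # many deaths, but lots of time in cars
walk_deaths, walk_hours = 20, 5_000_000      # fewer deaths, but far less time walking

car_rate = deaths_per_million_hours(car_deaths, car_hours)
walk_rate = deaths_per_million_hours(walk_deaths, walk_hours)

print(f"Raw counts: car={car_deaths}, walking={walk_deaths}")
print(f"Per million hours: car={car_rate:.1f}, walking={walk_rate:.1f}")
```

With these made-up figures, cars look five times deadlier by raw count but walking has twice the rate per hour. Real data could easily point the other way, which is exactly why students should ask what the denominator is before accepting either side of the argument.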