Wednesday, November 26, 2014

Facebook Data Science's "What are we most thankful for?"

Recently, a Facebook craze asked users to list three things they were thankful for, for five days. Data scientists Winter Mason, Funda Kivran-Swaine, Moira Burke, and Lada Adamic at Facebook have analyzed this data to better understand the patterns of gratitude publicly shared by Facebook users.

The data analysts broke down data by most frequently listed gratitude topic:

Most frequently "liked" gratitude posts: (lots of support for our friends in recovery, which is nice to see).

Gender differences in the data: the wine gratitude finding for women was not present in the data for men. Ha.

Idiosyncratic data by state. I would say that Pennsylvania's fondness for country music rings true for me.

How to use in class: This example provides several interesting, easy-to-read graphs, and the graphs show how researchers can break down a single data set in a variety of ways (by gender, by age, by state). Additionally, this data strikes me as a replication of Seligman's gratitude exercise from the positive psychology literature. You could use this example to discuss how students think this data (specifically, the things Facebook users say they are grateful for) would differ when it is collected via Facebook's public forum versus when participants keep a private journal. How might a public display of gratitude differ from a private reflection upon gratitude?

Monday, November 24, 2014

Diane Fine Maron's "Tweets identify food poisoning outbreaks"

This Scientific American podcast by Diane Fine Maron describes how the Chicago Department of Public Health (CDPH) used Twitter data to shut down restaurants with health code violations. Essentially, the CDPH monitored tweets in Chicago, searching for the words "food poisoning". When such a tweet was identified, an official at CDPH messaged the Twitterer in question with a link to an official complaint form website.

The results of this program?

"During a 10-month stretch last year, staff members at the health agency responded to 270 tweets about “food poisoning.” Based on those tweets, 193 complaints were filed and 133 restaurants in the city were inspected. Twenty-one were closed down and another 33 were forced to fix health violations. That’s according to a study in the journal Morbidity and Mortality Weekly Report. [Jenine K. Harris et al, Health Department Use of Social Media to Identify Foodborne Illness — Chicago, Illinois, 2013–2014]"

I think this is a good example for using big data/new media data/archival data in a cheap and novel manner for the public good. I think it would also be interesting to ask your students to see how such a method could be employed within their community or campus. What are the big problems at your campus? How could you a) monitor Twitter/Facebook/YikYak/IG accounts and b) come up with a fast solution (like the online complaint forms) in order to follow up less formal data collection with more formal data collection?
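
As a quick classroom exercise, the counts in the quoted summary above can be turned into simple proportions. The sketch below is only an illustration: the variable names are my own, and the ratios are descriptive (the summary does not say that each complaint maps to exactly one tweet, so treat "complaints per tweet" loosely).

```python
# Counts taken from the quoted summary of the CDPH program.
tweets_responded = 270
complaints_filed = 193
restaurants_inspected = 133
closed = 21
forced_to_fix = 33


def proportion(part, whole):
    """Return part / whole rounded to three decimal places."""
    return round(part / whole, 3)


# Descriptive ratios between the stages of the program.
complaints_per_tweet = proportion(complaints_filed, tweets_responded)
closed_rate = proportion(closed, restaurants_inspected)
fix_rate = proportion(forced_to_fix, restaurants_inspected)
any_action_rate = proportion(closed + forced_to_fix, restaurants_inspected)

print("Complaints filed per tweet responded to:", complaints_per_tweet)
print("Share of inspected restaurants closed:", closed_rate)
print("Share of inspected restaurants forced to fix violations:", fix_rate)
print("Share of inspected restaurants with either outcome:", any_action_rate)
```

Students could then discuss which of these proportions says the most about whether the program was worth the effort.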

(PS: HAPPY THANKSGIVING! Don't forget to properly reheat your leftovers!)

Monday, November 17, 2014

Free stats/methods textbooks via OpenStax

OpenStax CNX "is a dynamic non-profit digital ecosystem serving millions of users per month in the delivery of educational content to improve learning outcomes." In other words, free textbooks that can be easily downloaded, including nearly 7,000 free statistics textbooks as well as over 1,500 research methods texts.

How OpenStax works (via

I like this format because it is free but also because it is flexible enough that you can pick and choose chapters from different textbooks to use in a class. Additionally, if you are feeling generous, you can upload your own content to share.

Monday, November 10, 2014

Geoff Cumming's "The New Statistics: Estimation and Research Integrity"

Geoff Cumming gave a talk at APS 2014 about the "new statistics" (reduced emphasis on p-values, greater emphasis on confidence intervals and effect sizes, for starters).

This workshop is now available, online and free, from APS. The three-hour talk has been divided into five sections, and each section comes with a "Table of Contents" to help you quickly navigate all of the information contained in the talk.

While some of this talk is too advanced for undergraduates, I think that there are portions, like his explanations of why p-values are so popular, of p-hacking, and of confidence intervals, that can be nice additions to an Introduction to Statistics class.
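
If you want a small hands-on companion to the confidence interval portion, here is a minimal sketch of a 95% confidence interval for a mean. It is not taken from the talk: the scores are made up, the function name is my own, and it uses a normal approximation (with a sample this small, a t critical value would be the more appropriate choice).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, confidence=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD divided by sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error


# Made-up exam scores, for illustration only.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77]

low, high = mean_ci(scores)
print("Sample mean:", mean(scores))
print("Approximate 95% CI:", round(low, 2), "to", round(high, 2))
```

Rerunning it with confidence=0.99 shows students that asking for more confidence widens the interval.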

Monday, November 3, 2014

John Venn's Google Doodle

Make pretty Venn diagrams via this archived version of the Google Doodle that celebrated John Venn's 180th birthday.

A good example of a Venn diagram as well as a way to (approximately) illustrate shared variance.
The overlap between vegetation and things that can fly
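
To put a number on the shared variance idea, the overlap between two circles can be tied to r squared, the proportion of variance two variables share. The sketch below is only an illustration: the data are made up and the function name is my own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Made-up data, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]

r = pearson_r(hours_studied, exam_score)
shared_variance = r ** 2  # the "overlap" region in the Venn diagram

print("r:", round(r, 3))
print("Proportion of shared variance (r squared):", round(shared_variance, 3))
```

As the post says, the Venn picture is only an approximate illustration, so the circles should not be read as an exact geometric model.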