Monday, May 22, 2017

Trendacosta's Mathematician Boldly Claims That Redshirts Don't Actually Die the Most on Star Trek

io9 recaps a talk given by mathematician James Grime, who addressed the long-running Star Trek joke that the first people to die are the redshirts. Using resources that detail the ins and outs of Star Trek, he determined that redshirts don't actually fare the worst.

This makes for a good example of absolute vs. relative risk. Sure, more redshirts may die in absolute terms, but proportionally? Only about 10% of redshirts die. Also, I think this is a funny example of using archival data to understand an ongoing Star Trek joke.
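The arithmetic behind that distinction fits in a few lines of Python. The crew sizes and death counts below are invented for illustration, not Grime's actual figures:

```python
# Hypothetical crew sizes and death counts -- NOT Grime's actual figures.
crew = {"red": 240, "gold": 55, "blue": 135}
deaths = {"red": 24, "gold": 9, "blue": 7}

for shirt in crew:
    relative = deaths[shirt] / crew[shirt]  # risk of dying, given your shirt color
    print(f"{shirt}: {deaths[shirt]} deaths (absolute); {relative:.1%} of that crew (relative)")
```

With numbers like these, redshirts top the absolute body count simply because there are more of them, while an individual redshirt's relative risk can be lower than a goldshirt's.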

For more math/Star Trek links, go to io9's treatment of the speech.

Monday, May 15, 2017

Pew Research Center's Methods 101 Video Series

Pew Research Center is an excellent source for data to use in statistics and research methods classes. I have blogged about them before (look under the Label pew-pew!) and I'm excited to share that Pew is starting up a series of videos dedicated to research methods. The new series will be called Methods 101.

The first describes sampling techniques in which weighting is used to adjust imperfect samples so that they better mimic the underlying population. I like that this is a short video focused on one specific aspect of polling. I hope they continue this trend of creating short videos covering specific topics.
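A bare-bones sketch of what that weighting does (all numbers are hypothetical, and this is the generic post-stratification idea rather than Pew's actual procedure): each respondent is weighted by population share divided by sample share, so over-represented groups count less.

```python
# Hypothetical shares: the sample over-represents 65+ and under-represents 18-29.
population_share = {"18-29": 0.20, "30-64": 0.60, "65+": 0.20}
sample_share     = {"18-29": 0.10, "30-64": 0.50, "65+": 0.40}

# Hypothetical approval rate observed within each age group
approval = {"18-29": 0.70, "30-64": 0.55, "65+": 0.40}

unweighted = sum(sample_share[g] * approval[g] for g in approval)

# weight = (population share) / (sample share)
weights = {g: population_share[g] / sample_share[g] for g in approval}
weighted = sum(sample_share[g] * weights[g] * approval[g] for g in approval)

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

The weighted estimate (0.550) differs from the raw sample mean (0.505) because the skewed sample leaned on the least-approving group.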

Looking for more videos? Check out Pew's YouTube Channel. Also, I have a video tag for this blog.

Monday, May 8, 2017

Daniel's "Most timeless songs of all time"

This article, written by Matt Daniels for The Pudding, allows you to play around with a whole bunch of Spotify user data in order to generate visualizations of song popularity over time. You can generate custom visualizations using the very interactive sections on this website. For instance, there is a special visualization that allows you to finally quantify the Biggie/Tupac Rivalry.

So, data and pop culture are my two favorite things. I could play with these different interactive pieces all day long. But there are also some specific ways you could use this in class.

1) Generate unique descriptive data for different musicians and then ask your students to create visualizations using the software of your choosing. Below, I've queried Dixie Chicks play data. Students could enter their own favorite artist. Note: The data only runs through 2005.

2) Sampling errors: Here is a description of the methodology used for this data:

Is this representative of all music listeners? What does he mean by "normalize the data" as a way to correct it? Where could we collect data to get a more representative sample? Would Sirius skew older? What about iTunes?

3) Using data mining/archival data to generate insights into research questions.

Here, the question explored in this article is, "What is the difference between a flash-in-the-pan song and a song for the ages?"

Here, data from 2013 hits has been tracked, and it finds that the post-hit plateau is a good indicator of music with longer staying power: even though Daft Punk's "Get Lucky" peaked much higher than OneRepublic's "Counting Stars," "Counting Stars" has a higher plateau. Also, note that with this interactive piece, students could select any number of songs to compare.

Monday, May 1, 2017

"Student life summarized using graphs" video

I found this video at the Student Problems Page on Facebook. I don't know who to attribute it to, but it was probably a smart, sarcastic Intro Stats student.

Monday, April 24, 2017

NYT's "You Draw It" series

As I've discussed in this space before, I think that it is just as important to show our students how to use statistics in real life as it is to show our students how to conduct an ANOVA.

The "You Draw It" series from the New York Times provides an interactive, personalized example of using data to prove a point and challenge assumptions. Essentially, the series asks you to predict data trends for various social issues, then shows you how the data actually looks. So far, there are three of these features: 1) one that challenges assumptions about Obama's performance as president, 2) one that illustrates the impact of SES on college attendance, and 3) one that illustrates just how bad the opioid crisis has become in our country.

Obama Legacy Data

This "You Draw It" asks you to predict Obama's performance on a number of measures of success. Below, the dotted yellow line represents my estimate of national debt under Obama. The blue line shows true national debt under Obama. Note: With this tool, you trace your trend line on the graph, press a button, and then the actual data pops up, as well as discussion about the actual data.

We can use this data to see how political affiliation influences assumptions about the Obama presidency. This one cuts both ways: Right-leaning users may assume the worst while left-leaning users assume the best.

How Family Income Affects Children's College Chances

This example uses data to touch on a social justice issue: Whether or not a college education is really accessible to everyone. After you enter your estimate and see the real data, the website returns normative data about performance on the task and how you compare to other users. Below, the dotted line represents the actual data, and my guess was the solid line.

I think this would be useful in a class on poverty and as an example of a linear relationship.

Drug Overdose Epidemic

This example would be good for a clinical psychology, addiction, criminal justice, or public health class. It asks the user to guess the number of deaths in the US due to car accidents, guns, and HIV. Finally, it asks you to estimate deaths due to drug overdoses, which have skyrocketed in the last 20 years (see below).

Then it contrasts drug overdose deaths with car accidents, guns, and HIV. This example may also be useful for social psychology, as it hints at the availability heuristic.

How to use in class:
1) Non-statisticians using statistics to tell a story.
2) Using clever visualization to tell a story.
3) The interactive piece here really forces you to connect to the data and be proven right or wrong.

Monday, April 17, 2017

Sense about Science USA: Statistics training for journalists

In my Honors Statistics class, we have days devoted to discussing thorny issues surrounding statistics. One of these days is dedicated to the disconnect between science and science reporting in popular media.

I have blogged about this issue before and use many of those posts to guide the discussion: This video by John Oliver is hilarious and touches on p-hacking in addition to more obvious problems in science reporting. This story from NPR demonstrates what happens when a university's PR department does a poor job of interpreting research results. The Chronicle covered the issue using the example of mis-shared research claiming that smelling farts can cure cancer (a student favorite). And this piece describes a hoax that one "researcher" pulled to demonstrate how quickly the media will pick up and disseminate bad-but-pleasing research to the masses.

When my students and I discuss this, we usually try to brainstorm ways to fix the problem. Proposed solutions: public shaming of bad journalists, better editing of news stories before publication, a prestigious award for accurate science writing. And another idea my students usually arrive at? Better training for journalists.

So, you can imagine how pleased I was to discover that such classes already exist via Sense about Science USA.

Their mission:

They support this mission in a few different ways. They advocate for registering all medical trials conducted on humans. They are training scientists to more effectively communicate their findings to the public. And, apropos of this blog, they are also training journalists to better understand statistics AND offer one-on-one consulting to journalists trying to understand data.

Here is their description of why it is important to better train journalists.

How to use in class:

1) Instead of just showing students the problems associated with poor science writing, let's show them a possible solution as well.
2) Statistics isn't just for statisticians; it's for anyone who wants to better understand policy issues, emerging research, and evidence-based practices in their field.
3) Show your students some examples of poor science writing. Have them develop a brief presentation that would address the most common statistical mistakes made by science writers.

Monday, April 10, 2017

Reddit's data_irl subreddit

You guys, there is a new subreddit just for sharing silly stats memes. It is called r/data_irl/.

The origin story is pretty amusing.

I have blogged about the subreddit r/dataisbeautiful previously. The point of this sub is to share useful and interesting data visualizations. The sub has a hard and fast rule about only posting original content or well-cited, serious content. It is a great sub.

But it leaves something to be desired. That something is my deep desire to see stats jokes and memes.

On April Fool's Day this year, they got rid of their strict posting rules for a day and the dataisbeautiful crowd provided lots of hilarious stats jokes, like these two I posted on Twitter:

The response was so strong, because so many people love stats memes, that a new sub, data_irl, was started JUST TO SHARE SILLY STATS GRAPHICS. It feels like coming home to my people.

Monday, April 3, 2017

Day's Edge Production's "The Snow Guardian"

A pretty video featuring Billy Barr, a gentleman who has been recording daily weather data in his corner of Gothic, Colorado for the last 40 years.

This brief video highlights his work. And his data provides evidence of climate change. I like this video because it shows how ANYONE can be a statistician, as long as...

They use consistent data collection tools...

They are fastidious in their data entry techniques...

They are passionate about their research. Who wouldn't be passionate about Colorado?

Monday, March 27, 2017

Shameless Self Promotion: I wrote a chapter in a book about Open Educational Resources!

Let's make the academy better for science and better for our students, and let's make it better for free.

Want to learn how? I recommend Open: The Philosophy and Practices that are Revolutionizing Education and Science, edited by Rajiv Jhangiani and Robert Biswas-Diener.

In the spirit of open resources, it is totally free.

In the spirit of open pedagogy and quick sharing of teaching ideas, I wrote a chapter for the book about how I've gone about sustaining a blog dedicated to teaching for the last four years. The basic message of my chapter: I blog about teaching, and you can, too!  Here are all the chapters from the book:

Johnson's "The reasons we don’t study gun violence the same way we study infections"

This article from The Washington Post is a summary of an article from the Journal of the American Medical Association. Both are simple, short pieces that demonstrate how to use statistics to make an argument. Here, the argument is made via regression, which reveals the paucity of funding and publications for research on gun-related deaths.

What did the researchers do? Regression. A regression line was generated to predict how much money is spent studying common causes of death. We see that deaths by firearms aren't receiving funding proportional to the deaths they cause. See the graph below.

How to use in class:

1) How is funding meted out by our government in order to better understand problems that plague our country? Well, it isn't being given to researchers studying gun violence because of the Dickey Amendment. I grew up in a very hunting friendly/gun friendly part of Pennsylvania. I've been to the shooting range. And it upsets me that we can't better understand and study best practices for safe gun ownership.

2) Another issue: We don't talk about suicide enough. Half of the gun deaths were suicides.

3) There seems to be under-funding of possible accidents, as opposed to diseases, that cause death (shooting, motor vehicle, falls, and asphyxia). Why might this be?

4) The above image demonstrates correlation/linear relationships as well as gun violence as an influential observation.

5) Regression, y'all. 

The WP article states, 

"If public health issues were funded based on their death toll, gun violence injuries would have been expected to receive about $1.4 billion in federal research funding over about a decade — compared with the $22 million that it actually got, the study found." 

They predicted Y (research funding) based on X (death toll) and found a discrepancy, and that discrepancy is used to make an argument about the funding shortfall. If you go to the JAMA article, they describe a research publication shortfall as well: according to that regression equation, there should be over 38K articles published about gun deaths. Instead, there are 1,738.
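The logic of the argument can be sketched with a toy regression. Everything below is invented for illustration except the $22 million observed-funding figure, which comes from the article; the point is only the mechanics of predicting Y from X and inspecting the residual:

```python
# Hypothetical cause-of-death data: (annual deaths in thousands, funding in $millions)
causes = {
    "heart disease": (600, 10000),
    "cancer":        (580, 9500),
    "diabetes":      (80, 1200),
    "falls":         (35, 500),
}
xs = [d for d, f in causes.values()]
ys = [f for d, f in causes.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Ordinary least-squares slope and intercept
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

gun_deaths, gun_funding = 36, 22   # ~36k deaths (hypothetical X); $22M observed funding
predicted = intercept + slope * gun_deaths
print(f"predicted: ${predicted:.0f}M, observed: ${gun_funding}M")
```

The gap between the predicted value on the regression line and the observed funding is the shortfall the authors are pointing at.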

Monday, March 20, 2017

Retraction Watch's "Study linking vaccines to autism pulled following heavy criticism"

This example from Retraction Watch illustrates how NOT to do research. It is a study that was accepted by, and then retracted from, Frontiers in Public Health. It purported to find a link between childhood vaccination and a variety of childhood illnesses. This would be a good case study for Research Methods. In particular, this example illustrates:

1) Retraction of scientific studies
2) The problems with self-report surveys
3) Sampling and trying to generalize from a biased sample
4) How what constitutes a small sample size depends on the research you are conducting
5) Conflict of interest

This study, since retracted, studied unvaccinated, partially vaccinated, and fully vaccinated children.

And the study found "Vaccinated children were significantly less likely than the unvaccinated to have been diagnosed with chickenpox and pertussis, but significantly more likely to have been diagnosed with pneumonia, otitis media, allergies and NDDs (defined as Autism Spectrum Disorder, Attention Deficit Hyperactivity Disorder, and/or a learning disability)."

But the study surveyed moms who homeschool their children, a group that is historically, but not exclusively, anti-vaccination. From the study:

"Homeschool organizations in four states (Florida, Louisiana, Mississippi, and Oregon) were asked to forward an email to their members, requesting mothers to complete an anonymous online questionnaire on the vaccination status and health outcomes of their biological children ages 6 to 12."

And money to conduct this study was crowdsourced via a website promoting the supposed autism/vaccination link. There are other problems with this study, as noted by Retraction Watch as well as the summary piece below: the sample size was relatively small for this kind of research, no one verified the various diagnoses via medical records, etc.

So, there is plenty for your students to consider with this study. Maybe you could have them design a methodology that would fix the current, flawed one. Or you could just give your students the summary of the study and ask them to find the problems.

Further treatment (and deconstruction) of the study can be found here.

Monday, March 13, 2017

I've tracked all my son's first words since birth [OC]

Reddit user jonjiv conducted a case study in human language development. He carefully monitored his son's speaking ability, and here is what he found. Go to this link for a clearer picture of the chart!

How to use in class:
1) Good for Developmental Psychology. Look at that naming explosion!
2) Good to demonstrate how nerdy data collection can happen in our own lives.
3) Within versus between subjects design. Instead of sampling separate 10-, 11-, and 12-month-old children, we have real-time data collected from one child. AND this isn't retrospective data, either.
4) Jonjiv even briefly describes his "research methodology" in the original post. The word had to be used in a contextually appropriate manner AND observed by both him and his wife (inter-rater reliability!). He also stored his data in a Google sheet because of convenience/ease of tracking via cell phone.
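That two-observer rule is inter-rater reliability in miniature. A tiny sketch with invented ratings (jonjiv's actual data lives in his Google sheet): each parent independently judges whether an utterance counted as a word, and a word is logged only when both agree.

```python
# Hypothetical yes/no judgments for eight utterances (1 = "that was a real word")
dad = [1, 1, 0, 1, 0, 1, 1, 0]
mom = [1, 1, 0, 0, 0, 1, 1, 1]

# Percent agreement: how often the two raters gave the same judgment
agreement = sum(d == m for d, m in zip(dad, mom)) / len(dad)

# A word is logged only when BOTH raters said yes
logged = sum(d == m == 1 for d, m in zip(dad, mom))
print(f"percent agreement: {agreement:.0%}, words logged: {logged}")
```

Here the parents agree on 6 of 8 utterances (75%) and would log 4 words. Percent agreement is the simplest reliability index; Cohen's kappa corrects it for chance agreement.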

Monday, March 6, 2017

Annenberg Learner's "Against All Odds"

Holy smokes. How am I just learning about this amazing resource (thanks, Amy Hogan, for the lead) now?

The folks over at Annenberg, famous for Zimbardo's Discovering Psychology series, also have an amazing video collection about statistics, called "Against All Odds".

Each video couches a statistical lesson in a story.

1) In addition to the videos, there are student and faculty guides to go along with every video/chapter. I think that, using these guides, an instructor could go textbook free.
2) The topics listed approximate an Introduction to Statistics course.

Monday, February 27, 2017

rStats Institute's "Guinness, Gossett, Student, and t Tests"

This is a nice video for introducing t-tests AND finally getting the story straight regarding William Gossett, Guinness Brewery, and why Gossett published under the famous "Student" pseudonym. What did I learn? Apparently, Gossett DID have Guinness' blessing to publish. Also, this story demonstrates statisticians working in quality assurance, as the original t-tests were designed to determine the consistency of the hops used in the brewing process. Those jobs are still available in industry today.
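For the curious, the pooled two-sample t statistic that grew out of Gossett's brewery problem is only a few lines. The hop measurements below are invented to keep the arithmetic clean:

```python
import statistics as st

# Hypothetical hop-consistency check in the Gossett spirit:
# do two batches differ on some quality measure?
batch_a = [4.2, 4.5, 4.1, 4.4, 4.3]
batch_b = [4.0, 3.9, 4.1, 3.8, 4.2]

na, nb = len(batch_a), len(batch_b)
ma, mb = st.mean(batch_a), st.mean(batch_b)

# Pooled variance, then the t statistic with na + nb - 2 degrees of freedom
sp2 = ((na - 1) * st.variance(batch_a) + (nb - 1) * st.variance(batch_b)) / (na + nb - 2)
t = (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5
print(f"t({na + nb - 2}) = {t:.2f}")
```

With these numbers the output is t(8) = 3.00; in practice you would hand t and its degrees of freedom to a t distribution for a p-value.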

Credit goes to the RStats Institute at Missouri State University. This group has created a bunch of other tutorial videos for statistics as well.

Monday, February 20, 2017

Raff's "How to read and understand a scientific paper: a guide for non-scientists"

Jennifer Raff is a geneticist, professor, and enthusiastic blogger. She created a useful guide for how non-scientists (like our students) can best approach and make sense of research articles.

The original article is very detailed and explains how to go about making sense of expert writing. Personally, I appreciate that this guide was born out of her experience debating research with non-scientists. She wants everyone to benefit from science and be able to make informed decisions based on research. I think that is great.

In the classroom, I think this would be a good way to introduce your undergraduates to research articles.

I especially appreciated this summary of her steps (see below). This could be turned into a worksheet with ease. Note: I still think your students should chew on the full article before they would be ready to answer these eleven questions.

If you are looking for a more psychology-specific guide for learning how to read research, I also love this perennially popular piece by Jordan and Zanna. It may be entitled "How to read an article in social psychology", but it is a good guide to reading research in any psychology discipline. I teach two research-reading heavy psychology electives (Positive and Motivation and Emotion) and I assign this article, and a quiz about this article, during the first week of both classes.

Anyone else have suggestions for guides to reading research? Lemme know and I'll add them to this post.

Monday, February 13, 2017

NY Magazine's "Finally, Here’s the Truth About Double Dipping"

Yes, it includes the Seinfeld clip about George double dipping.

The video provides a brief example of how to test a research hypothesis by operationalizing it, collecting data, and analyzing the results. Here, the abstract question is how dirty double dipping really is. And they operationalized this question:

Research design: The researchers used a design that, conceptually, demonstrates ANOVA logic (the original article contains an ANOVA, the video itself makes no mention of ANOVA). The factor is "Dips" and there are three levels of the factor:

Before they double dipped, they took a baseline bacterial reading of each dip. Good science, that.
They display the findings in table form (again, no actual ANOVA).

I am totally horrified by this salsa data.

However...the acidity of the salsa seems to help out in terms of killing bacteria after two hours. So, dig into that bowl of salsa two hours after your last guests go home? Still ew.

Because of the re-testing, using 1) baseline, 2) Time 1, and 3) Time 2, this now becomes a good example of a repeated measures ANOVA.
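The ANOVA logic the video gestures at can be shown numerically. The bacteria counts below are invented, and this computes a plain one-way between-groups F across the three dips (not the repeated measures version), but the intuition is the same: F compares between-group variability to within-group variability.

```python
import statistics as st

# Invented bacteria counts for three dips (three readings each)
groups = {
    "salsa":     [1000, 1100, 900],
    "cheese":    [150, 200, 175],
    "chocolate": [200, 250, 225],
}
k = len(groups)
n_total = sum(len(v) for v in groups.values())
grand = st.mean([x for v in groups.values() for x in v])

# Between-groups sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(v) * (st.mean(v) - grand) ** 2 for v in groups.values())
# Within-groups sum of squares: noise inside each group
ss_within = sum((x - st.mean(v)) ** 2 for v in groups.values() for x in v)

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(f"F({k - 1}, {n_total - k}) = {f_stat:.1f}")
```

With made-up means this far apart, F is enormous; the dips differ far more from each other than readings within a dip differ.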

How to use in class:

1) How do we go from a research question to research?
2) Repeated measures design

Monday, February 6, 2017

Refutations to Anti-Vaccine Memes' Vaccination rates vs. infection rates

Refutations to Anti-Vaccine Memes came up with this nice illustration to explain why anti-vaxxers shouldn't claim a "win" just because more vaccinated people than unvaccinated people get sick during an outbreak.

I feel that this example has a bit more credence if paired with actual immunization-rate/infection-rate data: for instance, a case where an outbreak occurred and the majority of the infected were immunized, but some unimmunized individuals were still infected.

To further this case: yes, most people in America are immunized. Here is an example of an outbreak that has been linked to unvaccinated folks.

How to use in class:
-Base rate fallacy (which DOES matter when making an argument with descriptive stats!)
-Relative v. absolute risk.
-Making sense of and contextualizing descriptive statistics.
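The base-rate point is easy to demonstrate with a toy calculation (population size, vaccination rate, and attack rates are all hypothetical): when nearly everyone is vaccinated, even a highly effective vaccine produces more vaccinated cases than unvaccinated ones.

```python
# Hypothetical town: 95% vaccinated, vaccine cuts infection risk by 90%
population = 10_000
vax_rate = 0.95
attack_rate_unvax = 0.40                       # infection risk if unvaccinated
attack_rate_vax = attack_rate_unvax * (1 - 0.90)  # 10x lower risk if vaccinated

vax_cases = population * vax_rate * attack_rate_vax
unvax_cases = population * (1 - vax_rate) * attack_rate_unvax
print(f"vaccinated cases: {vax_cases:.0f}, unvaccinated cases: {unvax_cases:.0f}")
```

Here 380 of the sick are vaccinated versus 200 unvaccinated, yet each vaccinated person's risk is ten times lower. Counting cases without the base rates is exactly the fallacy.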

Thursday, February 2, 2017

Southern Poverty Law Center's Hate Map

The Southern Poverty Law Center has used mapping software in order to illustrate the location of different hate groups in the US.

How to use in class:

I think this demonstrates how good old descriptive data collection plays a valuable role in law enforcement, social justice, etc.

I think this demonstrates why well-visualized data may be a more compelling way of sharing information than data in tables.

Another way to use this is for your students to create a methods section based upon the data collection information provided on the website:

You can make the data more personalized for your class by digging down to state-wide data.

In addition to the maps, the website includes various other descriptive data quantifying different hate groups in the US.

I used this in class along with other examples of how data can be mixed with maps to provide information on regions and states.

This could also be used in a Social Psychology class in order to illustrate the presence of organized, deliberate prejudice in our society.

Monday, January 30, 2017

Shaver's Female dummy makes her mark on male-dominated crash tests

Here is another example of why representative sampling MUST include women. For years and years, car crash test dummies for adults were all based on the 50th percentile male. As such, even in vehicles with high safety ratings, women still had higher rates of certain injuries (head, neck, pelvis) than men. In fact, the article cites research finding that belted female car occupants in accidents have a 47% higher chance of suffering a serious injury and a 71% higher chance of suffering a moderate injury than men.

I wrote a previous blog post about this video that outlines how using only male rats for pharmaceutical research led to disproportionately high numbers of side effects in female humans. And this NPR story details changes to federal rules intended to correct for this issue in animal testing.

How to use in class:

-Inappropriate sampling is hurting and killing women.
-Many test dummies are constructed using descriptive statistics to create an "average" human. The most often used dummy represents the 50th percentile male.
-Pair it with my articles/stories/videos about the lack of female representation in pharmaceutical testing and you've got yourself a nice class discussion about representative sampling and subtle but dangerous forms of sexism (Why is an average male considered an average human? How can this problem be addressed?).
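Relative risk itself is simple arithmetic. A sketch with invented injury counts chosen to reproduce the article's 47% figure (these are not the study's actual counts):

```python
# Hypothetical counts of serious injuries among belted occupants in crashes
female_injured, female_total = 147, 10_000
male_injured, male_total = 100, 10_000

risk_f = female_injured / female_total
risk_m = male_injured / male_total

# Relative risk: how many times higher the female risk is than the male risk
relative_risk = risk_f / risk_m
print(f"relative risk = {relative_risk:.2f} ({relative_risk - 1:.0%} higher)")
```

A relative risk of 1.47 is the "47% higher chance" phrasing from the article: the ratio of the two groups' risks, minus one, expressed as a percentage.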

Monday, January 23, 2017

DeBold & Friedman's "Battling Infectious Diseases in the 20th Century: The Impact of Vaccines"

The folks at the Wall Street Journal took CDC disease data (by state, by year, courtesy of Project Tycho) as well as information on when various vaccines were introduced to the public. The data tells a compelling story about the importance of vaccination. Below, the story of measles.

How to use in class:
-Using archival data to educate and make a point (here, vaccine efficacy)
-Visualizing many data points (infections x state x year) effectively
-Interactive: You can cursor over any cube to see the related data. Below, I've highlighted Pennsylvania data from 1957.

-Since you can cursor over any data point to see the data, you can ask your students to pull data for use in class.
-The present data was drawn from Project Tycho, a University of Pittsburgh initiative to better share public health data. This resource may be useful for your classes as well.
-This data is good for Stats class, as well as Developmental, Health, Public Health, etc.

Monday, January 16, 2017

Our World in Data website

Our World in Data is an impressive, creative-commons licensed site managed by Max Roser.

And it lives up to its name. The website provides all kinds of international data, divided by country, topic (population, health, food, growth & inequality, work and life, etc.), and, when available, year. It contains its own data visualizations, which typically feature international data for a topic. You can customize these visualizations by nation. You can also DOWNLOAD THE DATA that has been visualized for use in the classroom.

Much of the data can be visualized as a map and progress, year by year, through the data, like this data on international human rights.

There are also plenty of topics of interest to psychologists who aren't teaching statistics.

For example, international data on suicide:

Data for psychology courses...

Working hours for I/O psychologists:

Data on specific hate crimes (here, lynching) for social psychology:

How to use in class:
-Not all graphs are appropriate for all data and all of the ways we use data. When might the mapping of data work well? When would it be better to show changes in data per country over time?
-For each of the visualizations, you can also click on "DATA" if you want your students to work with the data on their own.
-The website beautifully demonstrates how to tell a story and build an argument using descriptive data. I know I emphasized the data visualization/data download piece, but for each of the subtopics, a story is told. In addition to its own visualizations, the website frequently references and cites outside data sources and visualizations.

Monday, January 9, 2017

Parents May Be Giving Their Children Too Much Medication, Study Finds

Factorial ANOVA example ahead! With a lovely interaction. And I have a one-year-old and a 4.5-year-old, and they are sickly day-care kids, so this example really spoke to me.

NPR did a story about a recent publication that studied how we administer medicine to our kids and provides evidence for a few things I've suspected: Measuring cups for kid medicine are a disaster AND syringes allow for more accurate dosing, especially if the dose is small.

The researchers wanted to know if parents properly dosed liquid medicine for their kids.

The researchers used a 3 (dose: 2.5, 5.0, 7.5 ml) x 3 (modality: small syringe, big syringe, medicine cup) design. While they didn't use a factorial ANOVA in their analysis, the example can still be used to conceptually explain factorial ANOVA.

Their findings:

How to use in class:

-An easy to follow conceptual example of factorial ANOVA (again, they didn't use that analysis in the original paper, but the table above illustrates factorial ANOVA beautifully).
-An easy to follow example of what an interaction can look like.
-An example that is medical in nature
-An example that might reach your non-traditional and student parents
-Interesting methodology: They used for reals parents using for reals medical implements.
-How should doctors use this data? How about pharmacists and parents? What sort of implement is associated with the least overall error?
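A conceptual sketch of the 3 x 3 design with invented cell means (these are NOT the study's results, just numbers shaped to show the idea): marginal means give you the main effects, and an interaction appears when the effect of the implement depends on the dose.

```python
# error[dose][implement] = hypothetical mean % dosing error
error = {
    "2.5 ml": {"small syringe": 2, "big syringe": 6, "cup": 20},
    "5.0 ml": {"small syringe": 3, "big syringe": 4, "cup": 12},
    "7.5 ml": {"small syringe": 5, "big syringe": 4, "cup": 8},
}

# Marginal means for each implement (main effect of modality)
implements = list(next(iter(error.values())))
for imp in implements:
    marginal = sum(error[d][imp] for d in error) / len(error)
    print(f"{imp}: mean error {marginal:.1f}%")

# Interaction check: the cup's disadvantage over the small syringe
# shrinks as the dose grows -- the implement effect depends on dose.
cup_gap = {d: error[d]["cup"] - error[d]["small syringe"] for d in error}
print(cup_gap)
```

With these made-up cells, the cup is worst overall (main effect), but its penalty falls from 18 points at 2.5 ml to 3 points at 7.5 ml (interaction), which mirrors the "cups are worst for small doses" pattern in the story.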

Sunday, January 1, 2017

Pokemon Go and physical activity

In honor of the New Year, a post about health. 

A team of researchers from Harvard made a brief video that describes their recent publication. The video includes discussion about their hypothesis generation, methodology, and research findings. 

Their research question: Does the game Pokemon Go actually improve the health of users?

How to use this video in your class:

-This is an easily understood research project to share with your RM students. It also goes into detail about the statistics used for analysis.

-And the researchers, from fancy-pants Harvard, aren't afraid of being a bit silly and having fun as researchers. As demonstrated by the below images from the video:

This guy. Seriously. I hope to some day love my data as much as he loves his data.

And they made graphs using Pokemon balls

-How do we get our research ideas? Sometimes from observations about everyday living. This research was inspired by the Pokemon Go phenomenon. I try to convince my students that many research ideas are the result of genuine curiosity about the world, and I think this video does a good job of illustrating that.

-One of the co-authors describes the methodology: how they recruited and chose participants, the regression model they used, and how they factored out seasonal differences that could affect how much time people spend outside.

-It is a video.

-Accessible example of quasi-experimental methods used to create a control group and an experimental group.

-You could use this video as the Cliff Notes to accompany the actual research paper. For students who are new to reading source material, this might be a soft step into navigating publications.

-Could be good for human factors or health psychology classes in addition to stats/RM.