Tuesday, June 30, 2020

Lessons I made that you can throw in your class, easy-peasy.

I started this blog to make life easier for my fellow stats instructors. I share examples and ideas from my own classes in the hope that other instructors out there can incorporate them into theirs.

As we crash-landed into the online transition last Spring, I took some of my blog posts and turned them into lengthier class lessons, including Google Slides and, when applicable, data sets shared via my Google Drive. I ended up with four good lessons about the four big inferential tests typically covered in Psych Stats/Intro Stats: t-test, ANOVA, chi-square, and regression.

I think these examples serve as great reviews, homework assignments, or extra examples for your students as they try to wrap their brains around statistical thinking.

As we prepare for the Fall, and whatever the Fall brings, I wanted to re-share all of those examples in one spot.




I love this example, and I think my students do, too. It is interactive, uses an actual data set, and teaches your students about the Big Five.

A team of researchers collected Big Five data from hundreds of thousands of Americans, then divided the data by state and by region to look for personality trends. It was written up in Time Magazine. The Time piece even features the short-form version of the Big Five and will match your students to the state that best fits their Big Five scores.

They made their data available in the original publication, and I made their data available on Google Drive. Since there are five personality traits, this example provides data for five one-way ANOVAs. Region of the United States is the factor.
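If you want students to see what those five ANOVAs look like in code, here is a minimal Python/SciPy sketch. The scores below are invented stand-ins, not the real state-level data shared on Google Drive.

```python
# One one-way ANOVA per Big Five trait, with US region as the factor.
# The scores below are made-up stand-ins for the real data set.
from scipy import stats

# Hypothetical Extraversion scores, grouped by region:
regions = {
    "Northeast": [3.2, 3.5, 3.1, 3.4],
    "Midwest":   [3.6, 3.8, 3.7, 3.5],
    "South":     [3.9, 4.0, 3.8, 4.1],
    "West":      [3.3, 3.1, 3.4, 3.2],
}

# f_oneway takes one sequence per group; repeat once per trait.
f_stat, p_value = stats.f_oneway(*regions.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

Running the same call four more times, once per remaining trait, gives you the full set of five one-way ANOVAs.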


I'm equally proud and ashamed of this example. It uses an itty-bitty data set (n = 5: Dennis Quaid's four ex-wives and current wife) to explain regression, using each woman's birth year to predict the year she married Dennis Quaid.
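For instructors who want to show the computation, here is a minimal Python/SciPy sketch of the same idea. The birth and marriage years below are invented for illustration, not the actual years from the lesson.

```python
# Simple linear regression with a tiny n = 5 data set:
# predict marriage year from birth year. All numbers are hypothetical.
from scipy import stats

birth_year    = [1950, 1958, 1960, 1971, 1983]
marriage_year = [1978, 1991, 1993, 2004, 2016]

result = stats.linregress(birth_year, marriage_year)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.1f}")
print(f"r = {result.rvalue:.3f}")

# Plug a new birth year into the regression equation:
predicted = result.intercept + result.slope * 1990
```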


This example is more theory than doing. I found a bunch of clips from the show Mythbusters, clips in which the crew is testing their hypotheses. In these specific examples, the crew is using methodology that could be analyzed using the different t-tests. 

So, this example could be used to review the application of the three different t-tests, depending on what your research design looks like. There is one example that provides the actual (small) data set that the Mythbusters crew collected, and I asked my students to analyze that data. Otherwise, this example focuses on research design for different t-tests.
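If you also want students to run the numbers, the three t-tests map onto three SciPy functions. The data below are invented; only the three designs (one-sample, independent-samples, paired-samples) matter.

```python
# The three t-tests, matched to three research designs. Data are invented.
from scipy import stats

# One-sample: does a sample mean differ from a known population value?
sample = [5.1, 4.8, 5.5, 5.0, 4.9]
t1, p1 = stats.ttest_1samp(sample, popmean=5.0)

# Independent-samples: do two separate groups differ?
group_a = [5.1, 4.8, 5.5, 5.0, 4.9]
group_b = [5.9, 6.1, 5.7, 6.0, 5.8]
t2, p2 = stats.ttest_ind(group_a, group_b)

# Paired-samples: the same participants measured twice.
before = [5.1, 4.8, 5.5, 5.0, 4.9]
after  = [5.4, 5.0, 5.9, 5.2, 5.3]
t3, p3 = stats.ttest_rel(before, after)
```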


This presentation contains two examples: one serious, one fun. The more serious example involves an NPR story about a developmental psychologist who found evidence that children who dance in sync with a stranger (or don't) are more (or less) likely to offer that stranger assistance later, making it a 2x2 test of independence. The fun example tests whether the over-worked, under-paid Taco Bell employee gives you a giant fistful of sauce packets at random, using a goodness-of-fit test.
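Both chi-square flavors are easy to demonstrate in code. Here is a minimal Python/SciPy sketch; the counts (and the sauce categories) are invented, not the numbers from the slides.

```python
# Two chi-square tests, with made-up counts.
from scipy import stats

# 2x2 test of independence: danced in sync (yes/no) x offered help (yes/no).
observed = [[30, 10],   # in sync:     helped / didn't help
            [15, 25]]   # out of sync: helped / didn't help
chi2, p, dof, expected = stats.chi2_contingency(observed)

# Goodness-of-fit: are four sauce packets handed out in equal proportions?
sauce_counts = [18, 22, 30, 10]                  # hypothetical counts
gof_chi2, gof_p = stats.chisquare(sauce_counts)  # expects uniform by default
```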

Sunday, June 28, 2020

Florida, COVID-19, and several examples for stats class

People I love very much live in Florida. My very favorite academic conference is held in Florida. I want Florida to flatten the curve.

But Florida is not flattening the curve. Believe me when I say that I'm not trying to dunk on Florida, but Florida has provided me with prime material for teaching statistics: timely material that illustrates weaponized data.

Some examples are more straightforward, like the median and poor data visualization. Others illustrate a theme that I cover in my own stats class, a theme we should all be discussing in our stats classes: Data must be very, very powerful if so many large organizations work so hard to discredit it, manipulate it, and fire the people who refuse to do so. You should also point out to your students that the data these organizations are working so hard to discredit is typically straightforward descriptive data, not graduate-level data analysis.

1. Measures of Central Tendency

As of June 23, the median age of people newly diagnosed with COVID-19 in Florida had dropped to 35, which provides an example of the median, and of why a simple measure of central tendency can be very important.
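A quick way to show students why that matters: a handful of new, younger cases can pull the median down sharply. The ages below are invented for illustration.

```python
# Toy illustration: younger new cases pull the median age down.
import numpy as np

march_ages = [25, 40, 48, 55, 61, 67, 72]
print(np.median(march_ages))   # 55.0

# Add a wave of younger diagnoses:
june_ages = march_ages + [19, 22, 24, 28, 31, 33]
print(np.median(june_ages))    # 33.0
```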

If you want to do a deep dive, check out this Twitter thread that offers suggestions for what this may mean:
More reporting on this data:

2. Bad data visualization example, presented without commentary.

h/t: https://twitter.com/DannyPage/status/1275257081675698178

3. Hiding data you don't like, Part 1

This sounds so dramatic, but books will be written about how Florida lied about their data and attacked people trying to report good data. Specifically, the story of Rebekah Jones (follow her on Twitter: @GeoRebekah). Jones worked for the state of Florida, using GIS to map out and track the COVID outbreak. For more information, check out Jones's website. She was fired. She claims that she was fired because she refused to misrepresent the data. More information on Jones's exact allegations has been reported elsewhere. Since she was fired, she has used Twitter to share her criticism of how Florida has handled their data, and she has lent her voice to whistle-blowers who state that more data manipulation continues to happen behind the scenes in Florida. She has also created her own dashboard to illustrate data from Florida.

For more famous instances of large organizations hiding, lying about, or having conflicts of interest around their data, see this blog post.

4. Hiding data you don't like, Part 2
Lessons: Operationalizing a variable, lying with data

Related to Ms. Jones's claims, there is further evidence that Florida wants to manipulate its COVID-19 data: the state announced a change that would potentially undercount people in ICUs. Moving the goalposts: hospitals would no longer report everyone in their ICUs. Instead, they were only to report ICU patients who actually needed intensive care. And not all the lazy lay-abouts who go to the ICU for kicks, I suppose.

This image is text from the Florida Politics article: https://floridapolitics.com/archives/342565-florida-changes-icu-reporting

Thursday, June 25, 2020

Using Pew Research Center Race and Ethnicity data across your statistics curriculum

In our stats classes, we need MANY examples to convey both the theory behind and the computation of statistics. These examples should be memorable. Sometimes they can make our students laugh, and sometimes they can be couched in research. They should always make our students think.

In this spirit, I've collected three small examples from the Pew Research Center's Race and Ethnicity archive (I hope to update with more examples as time permits). I don't know if any data collection firm is above reproach, but Pew Research is pretty close. They are non-partisan, they share their research methodology, and they ask hard questions about ethnicity and race. If you use these examples in class, I think that it is crucial to present them within context: They illustrate statistical concepts, and they also demonstrate outcomes of racism. 

Lessons: Racism, ANOVA theory: between-group differences, post-hoc tests

While this is descriptive data, I think that it illustrates several ANOVA concepts. Participants reported whether or not they had experienced several different situations. It demonstrates how life experiences in America are influenced by race. It can also be used to get your students thinking about ANOVA.

You can think about ethnicity as the factor with four levels (Asian, Black, Hispanic, White). 
If you are teaching ANOVA, this data visualization illustrates between-group differences, both in terms of statistics and life in America. You can ask your students to speculate on which item has the most between-group variance and which has the least. You could also think about post-hoc results for these findings. For the question "Been unfairly stopped by the police," Asian and Hispanic respondents may not differ significantly from one another, but Black and White respondents differ considerably from each other and from the Asian and Hispanic groups.
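If you want to make the post-hoc discussion concrete, SciPy (1.8+) includes Tukey's HSD. The group scores below are invented to mirror the speculated pattern: Asian and Hispanic respondents similar to each other, Black and White respondents far apart.

```python
# Post-hoc sketch: pairwise Tukey HSD comparisons after an ANOVA.
# Scores are invented to mirror the speculated pattern above.
from scipy import stats

asian    = [2.1, 2.3, 2.0, 2.4]
black    = [4.5, 4.8, 4.6, 4.7]
hispanic = [2.2, 2.5, 2.1, 2.3]
white    = [1.2, 1.0, 1.3, 1.1]

result = stats.tukey_hsd(asian, black, hispanic, white)
# result.pvalue[i][j] is the p-value for comparing groups i and j;
# e.g., result.pvalue[0][2] compares the Asian and Hispanic groups.
```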

Data from Pew: "Most blacks say someone has acted suspicious of them or as if they aren't smart."

Lessons: Racism, categorical data, nominal data, free-response options, research methods, psychometrics

This article describes the options provided by the US Census Bureau for reporting race (their term), and how far those options have evolved since the first US census (1790). It also illustrates how racist attitudes can be seen in the way the government elected to describe the race of American citizens. It gives a little hope that things are changing (at least at the Census Bureau).

In 1790, the only options for race were: 1) free white males and females, 2) all other free persons, and 3) slaves.

Until 1960, the person recording your census data would pick your race. 

Americans couldn't describe their backgrounds by selecting multiple races until 2000.

Regardless, the Census Bureau is trying harder in 2020, and the new way of recording race provides examples of categorical variables, qualitative data, and free-response items.


Furthermore, if you want to discuss politicized data collection, you could also dig into Hansi Lo Wang's reporting for NPR on this topic. It is excellent.

Lessons: Racism, Chi-Square theory: observed data, expected data

Pew Research created a graph to illustrate one facet of systematic racism in health care: Black people are dying of COVID-19 in numbers that are disproportionate and alarming. (Aside: I also teach my students about this, framing it in data-driven ML, using this article from Nature.)

I think this graph does an excellent job of conveying a concept at the core of chi-square: the distance between the expected data (the Black share of the population) and the observed data (the Black share of COVID-19 deaths), illustrated state by state.
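To connect the graph to an actual test statistic, here is a minimal sketch of a goodness-of-fit comparison for one hypothetical state; both the population share and the death counts below are invented.

```python
# Chi-square goodness-of-fit: observed deaths vs. deaths expected
# from population share alone. All numbers are hypothetical.
from scipy import stats

deaths_total = 1000
black_pop_share = 0.16                 # hypothetical population share

observed = [230, deaths_total - 230]   # Black deaths / all other deaths
expected = [deaths_total * black_pop_share,
            deaths_total * (1 - black_pop_share)]

chi2, p = stats.chisquare(observed, f_exp=expected)
```

A large chi-square here means the observed share of deaths is far from the share we would expect if deaths tracked population alone, which is exactly what the Pew graph shows visually.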