

Showing posts with the label archival data

Full Discussion Board Idea #2: Trends in love songs, as illustrated by The Pudding

You aren't a proper stats nerd if you haven't scrolled for an hour through all of The Pudding's content. Thank goodness for The Pudding, which helped me liven up the discussion boards in my online stats class. For a long time, I emphasized rigor over wonder: my stats class had functional but not terribly engaging topics for class discussion. That changed last semester, when I added some of my favorite data visualizations to my discussion board. Examples include this one about using a fast food app to track power outages after a natural disaster, and this one that presents data on the efficacy of nutritional supplements in a beautiful and functional way. Here is another that lets students look at trends in art and wonder how they may reflect cultural shifts in courtship and romantic relationships. TL;DR: The Pudding recently shared a post about trends in love songs from 1958 through 2023. The whole interactive is very engaging and lets yo...

Full Discussion Board Idea #1: Repurposing gently-used, second-hand data during times of crisis

I can't be the only one teaching online statistics this spring. Last fall, I refreshed ALL of the discussion boards for my online Psychological Statistics course. I hadn't done so since 2020, and my students responded well to the new discussion topics, all of which center on statistical literacy and on improving problem-solving with data. My first one is based on this old blog post about how Houston residents used a Whataburger location map to figure out which parts of the city were without electricity following Hurricane Beryl. Here is how I presented it to my students: You never know where valuable data visualizations will come from! For instance, following Hurricane Beryl, Texans used the Whataburger app to track power outages across Houston. Whataburger is a popular restaurant chain in the South. Its app has a feature where users can quickly see which locations are open or closed. Normally, hungry people use this to find the closest open location. HOWEVER: There are S...
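If you want students to go one step further, here is a minimal Python sketch of the underlying idea. It assumes you had a snapshot of each location's open/closed status tagged by area. The area labels and statuses are invented, and nothing here reflects how the Whataburger app actually stores its data. The point is only that the share of closed locations per area can serve as a rough proxy for outages.

```python
from collections import defaultdict

# Invented snapshot: (area label, is the location open?). Not real app data.
snapshot = [
    ("Area A", False), ("Area A", False), ("Area A", True),
    ("Area B", False), ("Area B", True),
    ("Area C", True), ("Area C", True),
]


def share_closed(rows):
    """Return the proportion of locations in each area that are closed."""
    totals = defaultdict(int)
    closed = defaultdict(int)
    for area, is_open in rows:
        totals[area] += 1
        if not is_open:
            closed[area] += 1
    return {area: closed[area] / totals[area] for area in totals}


for area, share in sorted(share_closed(snapshot).items()):
    # A high share of closed locations hints at a possible outage in that area.
    print(f"{area}: {share:.0%} closed")
```

A good discussion question: what else besides a power outage could close a restaurant, and how would that bias the proxy?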

Statistical thinking: What data would you need to collect to disprove the predictive power of astrological signs?

Okay. I haven't used this in class yet because it is July, and I just found it. However, I will open the Fall 2024 semester with this example. It is fun and accessible, and it shows how research can test whether personality varies by astrological sign. I will start by showing students a bunch of funny astrology memes (see above). Then, I'll ask them to think of ways to design a study to prove that astrology is (or is not) bunk. What sort of data would they need to collect to do this? Then, I'm going to show them this study (Joshanloo, 2024): https://onlinelibrary.wiley.com/doi/epdf/10.1111/kykl.12395?domain=author&token=BKSRDREAX9F3BKAWGVBD Statsy things to share with your students: 1. Archival data: The author used repurposed, vintage, federal data. The General Social Survey, to be specific. Data scientists are trained to see the potential of random data sets. The horoscope sign was simple to determine since the GSS collects birthday data. The author was...
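For instructors who want to show the recoding step, here is a minimal Python sketch that turns a birth month and day into a sun sign. It is an illustration only, not the study's code. The sign boundary dates below are an assumption (published tables disagree by a day at some cusps), and you would need to map your data set's own birthday variables onto the `month` and `day` arguments.

```python
from datetime import date

# Assumed start dates (month, day, sign) for Western sun signs.
# Sources differ by a day at some boundaries; adjust to your preferred table.
SIGN_STARTS = [
    (1, 20, "Aquarius"), (2, 19, "Pisces"), (3, 21, "Aries"),
    (4, 20, "Taurus"), (5, 21, "Gemini"), (6, 21, "Cancer"),
    (7, 23, "Leo"), (8, 23, "Virgo"), (9, 23, "Libra"),
    (10, 23, "Scorpio"), (11, 22, "Sagittarius"), (12, 22, "Capricorn"),
]


def zodiac_sign(month: int, day: int) -> str:
    """Map a birth month and day to a sun sign."""
    date(2000, month, day)  # raises ValueError for impossible dates; 2000 allows Feb 29
    sign = "Capricorn"  # dates before January 20 belong to Capricorn
    for start_month, start_day, name in SIGN_STARTS:
        if (month, day) >= (start_month, start_day):
            sign = name  # keep the latest sign whose start date has passed
    return sign


# Invented respondents' birthdays as (month, day).
respondents = [(3, 25), (7, 4), (12, 30)]
print([zodiac_sign(m, d) for m, d in respondents])  # ['Aries', 'Cancer', 'Capricorn']
```

Recoding a continuous-ish variable (birthday) into 12 categories is itself a nice talking point about operational definitions.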

Transforming your data: A historical example

TL;DR: Global ocean temperature data from before 1940 was collected by sailors who hauled buckets of water up from the ocean and recorded the temperature of the water in the bucket. But some of the recorded data was rounded (thanks, Air Force!), so researchers had to transform their data. (Go to the 3-minute mark of the video to see the bucket-boat-water-temperature technique in action.) Here is the original research, published in Nature. NPR covered the research article. Reporter Rebecca Hersher didn't discuss the entire research paper. Instead, she told the story of how the researchers discovered and corrected for their flawed ocean water temperature data. This story might be a little beyond Intro Stats, but it a) tells the story of messy, real archival data used to inform climate change research and b) introduces the idea of data transformations. Below, I will highlight some of the teaching items. Systematic bias: The data were all flawed in the same way as they were transcribed without any da...
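To make systematic bias concrete, here is a small Python simulation. It is a hypothetical case, not the correction used in the Nature paper. It assumes every reading lost its decimal places (truncation), which pushes every value in the same direction, and then shows that a single transformation (adding 0.5 degrees) removes most of the average error. The temperatures are randomly generated.

```python
import random
import statistics

random.seed(42)  # make the simulation repeatable

# Invented "true" ocean temperatures in degrees C.
true_temps = [random.uniform(10.0, 25.0) for _ in range(10_000)]

# Hypothetical transcription flaw: the decimal part of every reading is dropped.
transcribed = [float(int(t)) for t in true_temps]

# Every record errs in the same direction, so the mean is biased (about -0.5).
bias = statistics.mean(transcribed) - statistics.mean(true_temps)

# One transformation applied to all records: add back half a degree.
corrected = [t + 0.5 for t in transcribed]
corrected_error = statistics.mean(corrected) - statistics.mean(true_temps)

print(f"mean error before: {bias:.3f} C")
print(f"mean error after:  {corrected_error:.3f} C")
```

Note that the transformation fixes the average, not each individual reading, which is a useful distinction to discuss with students.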

Smart's "The differences in how CNN, MSNBC & FOX cover the news"

https://pudding.cool/2018/01/chyrons/ This example doesn't demonstrate a specific statistical test. Instead, it demonstrates how data can be used to answer a hotly contested question: Are certain media outlets biased? How can we answer this? Charlie Smart, working for The Pudding, addressed this question via content analysis. Here is how he did it: And here are some of his findings: Yes, Fox News was talking about the Clintons a lot, while over at MSNBC, they discussed the investigation into Russia and the 2016 election more frequently. While kneeling during the anthem was featured on all networks, it was featured most frequently on Fox. And context matters: What words are associated with "dossier"? How do the different networks contextualize President Trump's tweets? Another reason I like this example: It points out the trends for all three big networks. So, we aren't a bunch of Marxist professors ragging on FOX, and we ar...
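The core of a content analysis like this is counting. Below is a minimal Python sketch with invented chyron text for three unnamed networks (not Smart's data or method) that computes the share of each network's chyrons mentioning a term. Dividing by the number of chyrons matters because networks may air different amounts of text.

```python
from collections import Counter

# Invented chyron text for three unnamed networks (not Smart's data).
chyrons = {
    "Network A": ["Russia probe update", "Senate budget vote"],
    "Network B": ["New Russia details", "Russia probe widens"],
    "Network C": ["Clinton emails revisited", "Clinton foundation questions"],
}
TERMS = ["russia", "clinton"]


def term_rates(headlines, terms):
    """Share of a network's chyrons that mention each term (case-insensitive)."""
    counts = Counter()
    for text in headlines:
        words = set(text.lower().split())
        for term in terms:
            if term in words:
                counts[term] += 1
    return {term: counts[term] / len(headlines) for term in terms}


for network, headlines in chyrons.items():
    print(network, term_rates(headlines, TERMS))
```

This matches whole words only, so "Russia's" would be missed. Real content analyses need coding rules for plurals, possessives, and synonyms, which is where the interesting methodological decisions live.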

Izadi's "Black Lives Matter and America’s long history of resisting civil rights protesters"

Elahe Izadi, writing for The Washington Post, shared polling data from the 1960s. The data focused on public opinion about different aspects of the civil rights movement (the March on Washington, the Freedom Riders, etc.). The old data was used to draw parallels between the mixed support for the Civil Rights Movement of the 1960s and the mixed support for current civil rights protests, specifically Black Lives Matter. Here is the Washington Post story on the polling data, the civil rights movement, and Black Lives Matter. The story is the source of all the visualizations below. Here is the original polling data. https://img.washingtonpost.com/wp-apps/imrs.php?src=https://img.washingtonpost.com/blogs/the-fix/files/2016/04/2300-galluppoll1961-1024x983.jpg&w=1484 https://img.washingtonpost.com/wp-apps/imrs.php?src=https://img.washingtonpost.com/blogs/the-fix/files/2016/04/2300-galluppoll1963-1024x528.jpg&w=1484 I think this is timely data. And...

Hedonometer.org

The Hedonometer measures the overall happiness of tweets on Twitter. It provides a simple, engaging example for Intro Stats since the data is graphed over time, color-coded by day of the week, and interactive. I think it could also be a much deeper example for a Research Methods class, as the "About" section of the website reads like the methods section of a journal article: the Hedonometer creators describe their entire process for rating tweets. This is what the basic graph looks like. You can drill into the data by picking a year or a day of the week to highlight. You can also use the sliding scale along the bottom to specify a time period. The website is kept very up to date, so it is also a very topical resource. (Pictured: data for the white supremacist attack in VA.) In the page's "About" section, they address many methodological questions your students might raise about this tool. It is a good example for the process researchers go ...
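For a Research Methods class, a simplified version of the scoring idea can be sketched in a few lines of Python. The word ratings below are made up, and the 4-to-6 "neutral" band that gets excluded is an assumption modeled loosely on the approach the Hedonometer documents. Check the site's About section for the actual word list and parameters.

```python
# Made-up happiness ratings on a 1-9 scale (not the Hedonometer's word list).
WORD_SCORES = {"love": 8.4, "happy": 8.3, "the": 5.0, "sad": 2.4, "war": 1.8}


def average_happiness(text, scores, lens=(4.0, 6.0)):
    """Frequency-weighted mean rating of the rated words in `text`.

    Each occurrence counts once, so frequent words weigh more. Words rated
    strictly inside `lens` are treated as neutral and skipped, and unrated
    words are ignored. Returns None if no usable words remain.
    """
    low, high = lens
    rated = [
        scores[word]
        for word in text.lower().split()
        if word in scores and not (low < scores[word] < high)
    ]
    if not rated:
        return None
    return sum(rated) / len(rated)


print(average_happiness("I love the happy sad war", WORD_SCORES))
```

Students can debate the design choices: why drop neutral words, and what happens to sarcasm or negation ("not happy") under a word-counting approach?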

Sonnad and Collin's "10,000 words ranked according to their Trumpiness"

I finally have an example of Spearman's rank correlation to share. This is a political example, looking at how Twitter language use differs across US counties based on the proportion of votes that Trump received. This example was created by Jack Grieve, a linguist who uses archival Twitter data to study how we speak. Previously, I blogged about his work analyzing which obscenities are used in different zip codes in the US. He created maps of his findings, color-coded by the z-score for the frequency of each word. So, z-score example. Southerners really like to say "damn." On Twitter, at least. But on to the Spearman's example. More recently, he conducted a similar analysis, this time looking for trends in word usage based on the proportion of votes Trump received in each US county. NOTE: The screenshots below don't do justice to the interactive graph. You can hover your cursor over any dot to view the word as well as the cor...

Daniels' "Where Slang Comes From"

I think that language is fascinating. Back when I taught developmental psychology, I always liked to teach how babies learn to talk in much the same way all across the world. I like regional differences in American English (for example, swearing and regional colloquialisms). So, I really like this research that investigates the rise and fall of slang in America. And I think it could be used in a statistics class. How to use in class? 1. A funny list of descriptive statistics. 2. Research methodology for using Google searches to answer a question: a good opening for discussion of archival data, data mining, and creating inclusion criteria for research methodology. 3. Using graphs to illustrate trends across time. This feature is interactive. 4. Further interactive features showing how heat maps can illustrate state-by-state popularity over time. Here, "dank memes" peaked in April 2016 in Montana. 5. The author eyeballed the data and came up ...
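Point 5 mentions eyeballing the data. As a contrast, here is a minimal Python sketch of a rule-based alternative: find each state's peak month for a term and report which state peaked first. This is not the author's method, and the search-interest numbers are invented (loosely echoing the Montana example).

```python
# Invented monthly search interest (0-100) for one slang term in two states.
interest = {
    "Montana": {"2016-02": 40, "2016-03": 75, "2016-04": 100, "2016-05": 60},
    "Ohio": {"2016-02": 55, "2016-03": 90, "2016-04": 70, "2016-05": 50},
}


def peak_month(series):
    """Month with the highest interest for one state."""
    return max(series, key=series.get)


def earliest_peak(data):
    """State whose peak comes first; 'YYYY-MM' strings sort chronologically."""
    return min(data, key=lambda state: peak_month(data[state]))


for state, series in interest.items():
    print(state, peak_month(series))
print("Earliest peak:", earliest_peak(interest))
```

An earliest peak is only a rough hint about where a term started. Students could discuss what might mislead a rule like this, such as small state populations, ties, or noisy months.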

Trendacosta's "Mathematician Boldly Claims That Redshirts Don't Actually Die the Most on Star Trek"

http://gazomg.deviantart.com/art/Star-Trek-Redshirt-6-The-Walking-Dead-483111105 io9 recaps a talk given by mathematician James Grime. He addressed the long-running Star Trek joke that the red shirts are the first to die. Using resources that detail the ins and outs of Star Trek, he determined that: This makes for a good example of absolute vs. relative risk. Sure, more red shirts may die in absolute terms, but proportionally? Only about 10% of red shirts die. Also, I think this is a funny example of using archival data to understand an actual, ongoing Star Trek joke. For more math/Star Trek links, go to space.com's treatment of the speech.
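Here is a small Python sketch of the absolute vs. relative risk distinction. The crew and death counts are invented round numbers, not Grime's figures. They are chosen so that red shirts have the most deaths in absolute terms while another color has the higher death rate.

```python
# Invented crew sizes and deaths by shirt color (NOT Grime's actual figures).
crew = {"red": 200, "gold": 50, "blue": 100}
deaths = {"red": 20, "gold": 10, "blue": 5}


def summarize(crew, deaths):
    """Absolute deaths, share of all deaths, and death rate for each color."""
    total_deaths = sum(deaths.values())
    return {
        color: {
            "deaths": deaths[color],                    # absolute count
            "share_of_deaths": deaths[color] / total_deaths,
            "death_rate": deaths[color] / crew[color],  # risk within the group
        }
        for color in crew
    }


summary = summarize(crew, deaths)
for color, row in summary.items():
    print(color, row)

# Relative risk: a red shirt's risk compared with a gold shirt's.
print("relative risk, red vs. gold:", summary["red"]["death_rate"] / summary["gold"]["death_rate"])
```

With these made-up numbers, red shirts account for most of the deaths simply because there are so many of them, while an individual gold shirt is twice as likely to die.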

Daniels' "Most timeless songs of all time"

This article, written by Matt Daniels for The Pudding, allows you to play around with a whole bunch of Spotify user data to generate visualizations of song popularity over time. You can generate custom visualizations using the very interactive sections of this website. For instance, there is a special visualization that allows you to finally quantify the Biggie/Tupac rivalry. Data and pop culture are my two favorite things, and I could play with these interactive pieces all day long. But there are also some specific ways you could use this in class. 1) Generate unique descriptive data for different musicians, and then ask your students to create visualizations using the software of your choosing. Below, I've queried Dixie Chicks play data. Students could enter their own favorite artist. Note: The data only runs through 2005. 2) Sampling errors: Here is a description of the methodology used for this data: Is this representative of all data...
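Before students build a chart, they could summarize an artist's numbers. Here is a minimal Python sketch using invented play counts (not the Spotify data behind the article) to compute basic descriptive statistics.

```python
import statistics

# Invented play counts (millions) for five songs by one artist.
plays = {"Song A": 120.0, "Song B": 45.5, "Song C": 30.2, "Song D": 12.8, "Song E": 9.1}


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    values = list(values)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }


print(describe(plays.values()))
```

With one big hit in the list, the mean lands well above the median, which is a nice opening for talking about skew.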

Our World in Data website

Our World in Data is an impressive, Creative Commons-licensed site managed by Max Roser. And it lives up to its name. The website provides all kinds of international data, divided by country, topic (population, health, food, growth & inequality, work & life, etc.), and, when available, year. It contains its own original data visualizations, which typically feature international data for a topic. You can customize these visualizations by nation. You can also DOWNLOAD THE DATA behind the visualizations for use in the classroom. Much of the data can be displayed as a map that you can step through, year by year, like this data on international human rights. https://ourworldindata.org/human-rights/ There are also plenty of topics of interest to psychologists who aren't teaching statistics. For example, international data on suicide: Data for psychology courses...https://ourworldindata.org/suicide/ Work...
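If students download a CSV, a few lines of Python can pull out one country's series. This sketch assumes the Entity/Code/Year/value column layout that Our World in Data downloads commonly use. The numbers and the `score` column name are invented, so check the header of the file you actually download.

```python
import csv
import io

# Invented rows in an assumed Entity/Code/Year/value layout.
SAMPLE = """Entity,Code,Year,score
Chile,CHL,2000,1.5
Chile,CHL,2010,2.1
Norway,NOR,2000,3.9
Norway,NOR,2010,4.0
"""


def series_for(csv_text, entity, value_column):
    """Return {year: value} for one country, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = {}
    for row in reader:
        if row["Entity"] == entity and row[value_column]:
            out[int(row["Year"])] = float(row[value_column])
    return out


chile = series_for(SAMPLE, "Chile", "score")
print(chile)                                # {2000: 1.5, 2010: 2.1}
print(round(chile[2010] - chile[2000], 2))  # change over the decade: 0.6
```

For a real download, replace `SAMPLE` with the file's contents (for example, `open(path).read()`) and use the file's actual value-column name.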

Kevin McIntyre's Open Stats Lab

Dr. Kevin McIntyre from Trinity University has created the Open Stats Lab. OSL provides users with research articles, data sets, and worksheets for studies that illustrate statistical tests commonly taught in Introduction to Statistics. (Pictured: topics covered, illustrated beautifully by Natalie Perez.) All of his examples come from Open Science Framework-compliant publications in Psychological Science. McIntyre presents the OSF data (SPSS, R, and .CSV files are available), the original research article, AND a worksheet to accompany each article. (Pictured: the layout for each article/data set/activity. This article demonstrates one-way ANOVA.) I know. It can be challenging to find 1) research an undergraduate can follow that 2) contains simple data analyses. And here, McIntyre presents it all. This project was funded by a grant from APS.
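To show students what the F statistic in those worksheets actually is, here is a minimal Python sketch of a one-way between-subjects ANOVA computed from sums of squares. The three groups are invented toy data, not an Open Stats Lab data set, and the sketch returns only F and its degrees of freedom (no p-value).

```python
import statistics


def one_way_anova(*groups):
    """F statistic and degrees of freedom for a one-way between-subjects ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented scores for three conditions.
f, df_b, df_w = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df_b}, {df_w}) = {f:.2f}")  # F(2, 6) = 27.00
```

Once students see F as "between-group variance over within-group variance," the SPSS or R output in the OSL worksheets becomes much less mysterious.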

Collin's "America’s most prolific wall punchers, charted"

Collin gleaned some archival data about ER visits in America from the US Consumer Product Safety Commission. For each ER visit, there is a brief description of the reason for the visit. Collin queried punching-related injuries. His Method section, below, describes how he set the parameters for his operationalized variable. With a bit of explaining, you could also describe how Collin took qualitative data (the written description of the injury) and converted it into quantitative data: http://qz.com/582720/americas-most-prolific-wall-punchers-charted/ Then he made some charts. The age distribution of wall punchers is right-skewed. This chart could probably also be used in a Developmental Psychology class to illustrate poor judgment in adolescents, as well as the maturation of the prefrontal cortex/executive thinking skills in one's early 20s. http://qz.com/582720/americas-most-prolific-wall-punchers-charted/ The author looked at wall punching by month of the year and uncovered a fairly uniform d...
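Here is a minimal Python sketch of both teaching points. It turns free-text ER narratives into a yes/no variable with an explicit inclusion rule, then checks the skew of the resulting ages. The narratives, ages, and keyword rule are all invented for illustration. Collin's actual parameters are in his Method section.

```python
import re
import statistics

# Invented ER records: (patient age, narrative). Not CPSC data.
records = [
    (17, "PT PUNCHED A WALL, FX HAND"),
    (19, "PUNCHED WALL AT HOME, CONTUSION"),
    (22, "HIT WALL WITH FIST DURING ARGUMENT"),
    (45, "FELL OFF LADDER"),
    (24, "PUNCHED DOOR, LACERATION"),
    (58, "PUNCHED WALL, SPRAIN"),
]

# Hypothetical inclusion rule: mentions punching (or a fist) AND a wall.
PUNCH = re.compile(r"\bPUNCH(ED|ING)?\b|\bFIST\b")
WALL = re.compile(r"\bWALL\b")


def is_wall_punch(narrative):
    """Code a qualitative narrative as 1 (include) or 0 (exclude)."""
    return bool(PUNCH.search(narrative) and WALL.search(narrative))


ages = [age for age, text in records if is_wall_punch(text)]


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness; positive means a right tail."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


print(ages)
print(round(skewness(ages), 2))
```

Changing the rule (say, dropping "FIST" or adding "DOOR") changes who gets counted, which makes the link between operational definitions and results very concrete.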