Showing posts with the label sampling

Sampling Error (Taylor's Version)

Friends. You don't know what finding fun stats blog content has been like over the last few years. All of the data writers/websites I followed were always writing about, explaining, and visualizing COVID or political data (rightfully so). I prefer examples about puppies, lists of songs banned from wedding receptions, and ghosts. Memorable examples stick in my students' heads and don't presuppose any knowledge about psychological theory. Due to the lack of silly data and my own life as a professor, mom of two, wife, and friend, my number of posts during The Rona definitely dipped. But now, as the crocuses bloom in Erie, PA, the earth and I are finding new life and new examples. Nathaniel Rakich, writing for FiveThirtyEight, wrote a whole piece USING TAYLOR SWIFT TO EXPLAIN POLLING/SAMPLING ERRORS. Specifically, this article tackles three different polling firms and how they went about asking Americans which Taylor Swift album is their favorite Taylor Swift album....

Mona Chalabi's 100 New Yorkers: Data art

I've been a fan of Chalabi's work for years (here is my favorite example of mean vs. median). She makes beautiful, hand-drawn data visualizations. She created a beautiful mural that represents New Yorkers. And when I say "represents," I mean that this image is a representative sample of New Yorkers. https://monachalabi.com/product/100-new-yorkers/ Her sample of 100 New Yorkers was not drawn (Drawn? Get it?) at random. Below, in her own words, Chalabi describes her work and what it means: https://www.absolutart.com/us/artist/mona-chalabi/artwork/100-new-yorkers-ii/ This is a novel way to talk about sampling, representative samples, weighting survey responses to make a sample more representative, etc. You could even get into sampling error and other problems created by non-representative samples.

A quick NPR video describes random sampling in order to better understand the spread of COVID-19

This brief video from NPR (they make videos, what?) describes how the CDC will be randomly sampling Atlanta residents to test for COVID-19 antibodies. The effort aims to provide a better estimate of the spread of the disease. H/t to Sy Islam for sharing this with me. I think you could use this in class as a super-fast example of how we use samples to generalize about larger populations. From NPR's tweet (@NPR, April 29, 2020, pic.twitter.com/mXqznHUJmV): "The CDC is sending out employees to conduct antibody tests on a random sample of Atlanta residents. The tests are meant to show how many people have been infected with the coronavirus."
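If you want students to see the sample-to-population logic in action, here is a minimal sketch. Everything in it is made up for illustration: the city size, the 5% antibody rate, and the `estimate_prevalence` helper are my assumptions, not CDC figures or methods.

```python
import random

def estimate_prevalence(population, sample_size, seed=0):
    """Estimate the share of positives from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the class demo is repeatable
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size

# Hypothetical city: 100,000 residents, 5% with antibodies (1 = positive).
population = [1] * 5_000 + [0] * 95_000

estimate = estimate_prevalence(population, sample_size=1_000)
print(f"True rate: 0.050, sample estimate: {estimate:.3f}")
```

Change the seed or the sample size and have students watch the estimate wobble around the true rate. That wobble is sampling error.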

My favorite real world stats examples: The ones that mislead with real data.

This is a remix of a bunch of posts. I brought them together because they fit a common theme: Examples that use actual data that researchers collected but still manage to lie or mislead with real data. So, lying with facts. These examples hit upon a number of themes in my stats classes: 1) Statistics in the wild 2) Teaching our students to sniff out bad statistics 3) Vivid examples are easier to remember than boring examples. Here we go: Making Graphs: Fox News using accurate data and inaccurate charts to make unemployment look worse than it is. Misleading with Central Tendency: The mean cost of a wedding in 2004 might have been $28K...if you assume that all couples used all possible services, and paid for all of the services. Also, maybe the median would have been the more appropriate measure to report. Don't like the MPG for the vehicles you are manufacturing? Try testing your cars under ideal, non-real world conditions to fix that. Then get fined by the EPA. Mis...

'Nowhere To Sleep': Los Angeles Sees Increase In Young Homeless

Anna Scott, reporting for NPR, described changes to the homeless census in LA. It applies to stats/RM because an improvement in survey methodology led to a big change in the city's estimate of the number of homeless young adults. I also think this is a good piece for teaching because the story keeps coming back to Japheth Greg Dyer, a homeless college student who aged out of foster care and was sort of tossed into the world on his own. Straight from NPR: Homelessness hasn't necessarily increased dramatically. Instead, these findings seem to indicate that they finally have a reliable way to count young adult homelessness due to a better understanding of young adults. The dramatic increase is methodological.

Logical Fallacy Ref Meme

So, I love me some good statsy memes. They make a brief, important point that sticks in the heads of students. I've recently learned of the Logical Fallacy Ref meme. Here are a couple that apply to stats class:

Pew Research Center's Methods 101 Video Series

Pew Research Center is an excellent source for data to use in statistics and research methods classes. I have blogged about them before (look under the Label pew-pew!) and I'm excited to share that Pew is starting up a series of videos dedicated to research methods. The new series will be called Methods 101. The first describes sampling techniques in which weighting is used to adjust imperfect samples so that they better mimic the underlying population. I like that this is a short video that focuses on one specific aspect of polling. I hope that they continue this trend of creating very specific videos covering specific topics. Looking for more videos? Check out Pew's YouTube Channel. Also, I have a video tag for this blog. 3/25/2018: They have posted their second video, this one on proper wording for research questions so as to avoid jargon and bias.
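For a quick in-class feel for what weighting does, here is a small sketch of one simple approach, post-stratification by a single variable. The age groups, answers, and population shares are all hypothetical, and this is not a claim about how Pew actually weights its surveys.

```python
def weighted_mean(responses, population_shares):
    """Weight each group's mean by that group's share of the population.

    responses: dict mapping group name -> list of 0/1 answers
    population_shares: dict mapping group name -> share of the population
    """
    total = 0.0
    for group, answers in responses.items():
        group_mean = sum(answers) / len(answers)
        total += population_shares[group] * group_mean
    return total

# Hypothetical sample that over-represents older respondents.
responses = {
    "under_50": [1] * 12 + [0] * 8,   # 20 respondents, 60% say yes
    "50_plus": [1] * 24 + [0] * 56,   # 80 respondents, 30% say yes
}
# Assumed population: the two groups are actually the same size.
population_shares = {"under_50": 0.5, "50_plus": 0.5}

n_respondents = sum(len(a) for a in responses.values())
unweighted = sum(sum(a) for a in responses.values()) / n_respondents
weighted = weighted_mean(responses, population_shares)
print(f"Unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The unweighted number leans toward whatever the over-sampled group said. Weighting pulls the estimate back toward what a sample with the right mix would have shown.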

Shaver's Female dummy makes her mark on male-dominated crash tests

Here is another example of why representative sampling MUST include women. For years and years, car crash test dummies for adults were all based upon the 50th percentile male. As such, even in vehicles with high safety ratings, women still have higher rates of certain injuries (head, neck, pelvis) than men. In fact, the article cites research that found that belted female car occupants in accidents have a 47% higher chance of suffering a serious injury and a 71% higher chance of suffering a moderate injury compared to men in a car. http://leevinsel.com/blog/2013/12/30/why-carmakers-always-insisted-on-male-crash-test-dummies I wrote a previous blog post about this video that outlines how using only male rats for pharmaceutical research led to problems with disproportionately high numbers of side effects in female humans. And this NPR story details changes to federal rules in order to correct this issue with animal testing. How to use in class: -Inappropriate sampling i...

Chokshi's "How Much Weed Is in a Joint? Pot Experts Have a New Estimate"

Alright, stick with me. This article is about marijuana dosage, and it provides good examples of how researchers go about quantifying their variables in order to properly study them. The article also highlights the importance of Subject Matter Experts in the process and how one research question can have many stakeholders. As the title states, the main question raised by this article is "How much weed is in a joint?" Why is this so important? Researchers in medicine, addictions, developmental psychology, criminal justice, etc. are trying to determine how much pot a person is probably smoking when most drug use surveys measure marijuana use by the joint. How to use in a statistics class:

Anya Kamenetz's "The Past, Present, And Future of High-Stakes Testing"

Kamenetz (reporting for NPR) talks about her book, Test, which is about the extensive use of standardized testing in our schools. Largely, this is a story about the impact these tests have had on how teachers instruct K-12 education in the US. However, a portion of the story discusses alternatives to annual testing of every student. Alternatives include using sampling to assess a school as well as numerous alternate testing methods (stealth testing, assessing child emotional well-being, portfolios, etc.). Additionally, this story touches on some of the implications of living in a Big Data society and what it is doing to our schools. I think this would be a great conversation starter for a research methods or psychometrics course (especially if you are teaching such a class for a School of Education). What are we trying to assess: Individual students or teachers or schools? What are the benefits and shortcomings of these different kinds of assessments? Can your students come up with...

Mara Liasson's "The challenges behind accurate opinion polls"

This radio story by Mara Liasson (reporting for NPR) discusses the surprising primary loss of former Republican House Majority Leader Eric Cantor. It was surprising because internal polling conducted by Cantor's team gave him an easy win, but he lost out to a Tea Party favorite, David Brat. The story goes on to describe why it is becoming increasingly difficult to conduct accurate voter polling via telephone and the internet. Some specific points from this story that teach students about sampling techniques: 1) Sample versus population: One limitation of polling data is the fact that many telephone call-based sampling techniques include landlines and ignore the growing population of people who only have cell phones. 2) Response rates for political polling are declining, which undermines the validity of the samples pollsters can obtain. 3) Robocalls, while less expensive, have no way of validating that an actual registered voter is responding to the questions. Additionally, restrictio...

John Oliver and global climate change data

John Oliver demonstrates representative sampling by inviting three climate change deniers to debate 97 scientists who believe that global climate change is happening. Also, Bill Nye.

Public Religion Research Institute's "'I Know What You Did Last Sunday' Finds Americans Significantly Inflate Religious Participation"

A study performed by the Public Religion Research Institute used either a) a telephone survey or b) an anonymous web survey to question people about their religious beliefs and religious service habits. The researchers found that the telephone participants reported higher rates of religious behaviors and greater theistic beliefs. The figure below, from a New York Times summary of the study, visualizes the main findings. The NYT summary also provides figures illustrating the data broken down by religious denomination. (Figure: property of the New York Times.) Participants also vary in their reported religious beliefs based on how they are surveyed (below, the secular are more likely to report that they don't believe in God when completing an anonymous online survey). (Figure: property of Public Religion Research Institute.) This report could be used in class to discuss psychometrics, sampling, motivation to lie on surveys, social desirability, etc. Additionally, the sour...

Jess Hartnett's presentation at the 2014 APS Teaching Institute

Hi! Here is my presentation from APS. I am posting it so that attendees and everyone else can have access to the links and examples I used. If you weren't there for the presentation, a warning: It is text-light, so there isn't much of a narrative to follow, but there are plenty of links and ideas and some soon-to-be-published research ideas to explore. Shoot me an email (hartnett004@gannon.edu) if you have any questions. ALSO: In the talk I reference the U.S. Supreme Court case Hall v. Florida (I also did a blog entry about this case). Update: The court decided in favor of Hall/seemed to understand standard error/made it a bit harder to carry out the death penalty (as discussed here by Slate). Woot woot!

Kevin Wu's Graph TV

UPDATE! This website is not currently available. Kevin Wu's Graph TV uses individual episode ratings (archival data via IMDB) of TV shows, graphs each episode over the course of a series via scatter plot, and generates a regression line. This demonstrates fun with archival data as well as regression lines and scatter plots. You could also discuss sampling, in that these ratings were provided by IMDB users and, presumably, big fans of the shows (and whether or not this constitutes representative sampling). The saddest little purple dot is the episode Black Market. Truth!
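Since the site is down, here is a rough sketch of the kind of regression line it drew, using ordinary least squares on episode number versus rating. The five ratings are invented for illustration (they are not IMDB data), and `fit_line` is my own helper, not anything from Graph TV.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical ratings for the first five episodes of a show.
episodes = [1, 2, 3, 4, 5]
ratings = [8.0, 8.2, 7.9, 8.4, 8.5]

slope, intercept = fit_line(episodes, ratings)
print(f"rating = {intercept:.2f} + {slope:.2f} * episode")
```

A positive slope means the show's ratings trend upward over the run, a negative slope means they trend downward.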

UPDATE: The Knot's Infographic: The National Average Cost of a Wedding is $28,427

UPDATE: The average cost of a wedding is now $33,391, as of 2017. Here is the most up-to-date infographic: Otherwise, my main points from the original version of this survey are still the same: 1) To-be-weds surveyed for this data were users of a website used to plan/discuss/squee about pending nuptials. So, this isn't a random survey. 2) If you look at the fine print for the survey, the average cost points quoted come from people who paid for a given service. So, if you didn't have a reception band ($0 spent) your data wasn't used to create the average. Which probably leads to inflation of all of these numbers. _________________________________________ Original Post: This infographic describes the costs associated with an "average" wedding. It is a good example of non-representative sampling and bending the truth via lies of omission. For the social psychologists in the crowd, this may also provide a good example of persuasion by establishing ...
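To show students why dropping the $0 couples inflates an average, here is a tiny sketch of the fine-print issue described above. The ten spending values are made up for illustration and are not from The Knot's survey.

```python
# Hypothetical reception band spending for ten couples.
# A 0 means the couple did not hire a band at all.
spending = [0, 0, 0, 4000, 5000, 0, 3000, 0, 0, 0]

# Average among only the couples who paid for the service.
payers = [amount for amount in spending if amount > 0]
mean_payers = sum(payers) / len(payers)

# Average across every couple, counting non-buyers as $0.
mean_all = sum(spending) / len(spending)

print(f"Mean among payers: ${mean_payers:,.0f}")
print(f"Mean among all couples: ${mean_all:,.0f}")
```

Both numbers are "true," but they answer different questions: what a band costs if you hire one versus what a typical couple spends on a band.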

NPR's "Will Afghan polling data help alleviate election fraud?"

This story details the application of American election polling techniques to Afghanistan's fledgling democracy. Essentially, international groups are attempting to poll Afghans prior to their April 2014 presidential elections in order to combat voter fraud and raise awareness about the election. However, how do researchers go about collecting data in a country where few people have telephones, many people are illiterate, and just about everyone is wary of strangers approaching them and asking them sensitive questions about their political opinions? The story also touches on issues of social desirability as well as the decisions a researcher makes regarding the kinds of response options to use in survey research. I think that this would be a good story to share with a cranky undergraduate research methods class that thinks that collecting data from the undergraduate convenience sample is really, really hard. Less snarkily, this may be useful when teaching multiculturalism or ...

Anecdote is not the plural of data: Using humor and climate change to make a statistical point

Variations upon a theme...good for spicing up a powerpoint...inspired by living in the #1 snowiest city (population > 100K, 2014) in the United States. property of xkcd.com https://thenib.com/can-t-stand-the-heat-4d5650fd671b

The Economist's "Unlikely Results"

A great, foreboding video (here is a link to the same video at YouTube in case you hit the paywall) about the actual size and implication of Type I and Type II errors in scientific research. This video does a great job of illustrating what p < .05 means in the context of thousands of experiments. Here is an article from The Economist on the same topic. (Video from The Economist.)
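Here is a back-of-the-envelope sketch of the "thousands of experiments" idea. The inputs (1,000 hypotheses, 10% of them true, 80% power, alpha of .05) are my own illustrative assumptions and are not taken from the video or the article.

```python
def false_discovery_share(n_hypotheses, share_true, power, alpha):
    """Share of 'significant' results that are actually false positives."""
    n_true = n_hypotheses * share_true
    n_false = n_hypotheses - n_true
    true_positives = n_true * power     # real effects that get detected
    false_positives = n_false * alpha   # null effects that hit p < alpha anyway
    return false_positives / (true_positives + false_positives)

# Assumed scenario: 1,000 tested hypotheses, 10% true, 80% power, alpha = .05.
share = false_discovery_share(1000, share_true=0.10, power=0.80, alpha=0.05)
print(f"Share of significant findings that are false: {share:.2f}")
```

Have students lower the power or the share of true hypotheses and watch the false share climb. It makes the point that p < .05 for one study is not the same as a 5% error rate across a literature.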

Burr Settles's "On “Geek” Versus “Nerd”"

Settles decided to investigate the difference between being a nerd and being a geek via a pointwise mutual information (PMI) analysis (using archival data from Twitter). Specifically, he measured the association/closeness between various hashtag descriptors (see below) and the words nerd and geek. Settles provides a nice description of his data collection and analysis on his blog. A good example of archival data use as well as PMI.
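For anyone who wants to show the mechanics, here is a minimal sketch of pointwise mutual information, the usual name for this kind of word-association score: the log of how often two words co-occur relative to how often they would co-occur if they were independent. The counts below are hypothetical, not Settles's Twitter data, and the `pmi` helper is my own.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits: log2( p(x,y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical corpus of 10,000 tweets:
# 200 mention "geek", 500 use some hashtag, and 50 contain both.
score = pmi(count_xy=50, count_x=200, count_y=500, total=10_000)
print(f"PMI: {score:.2f} bits")
```

A positive score means the two terms show up together more often than chance would predict, a score near zero means they look unrelated, and a negative score means they tend to avoid each other.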