
Cheng's "Okcupid Scraper – Who is pickier? Who is lying? Men or Women?"

People don't always tell the whole truth on dating websites; they embellish to make themselves seem more desirable. The way OK Cupid users lie about their heights makes a good example for conceptually explaining null hypothesis testing, t-tests, and normal distributions.

So, Cheng, article author and data enthusiast, looked through OK Cupid data. In this article, she describes a few different findings, but I'm going to focus on just one of them: She looked at users' reported heights. And she found a funny trend. Both men and women seem to report that they are taller than they actually are. How do we know this? Well, the CDC collects information on human heights so we have a pretty good idea of what average heights are for men and women in the US. And then the author compared the normal curve representing human height to the reported height data from OK Cupid Users. See below...

From http://nycdatascience.com/okcupid-scraper/, by Fangzhou Cheng 

I can think of a number of ways to use this example:

-Null hypothesis testing/effect sizes, in general: Do your control and experimental groups overlap? By how much? Essentially, the less the groups overlap, the more likely we are to find significance/large effects. These two figures demonstrate this idea pretty nicely.

-A conceptual example of a one-sample t-test. The CDC can provide us with a given number representing average male or female height, which is our known mean/mu. We could then test that number against all of the male or female heights reported by OK Cupid users. Well, not really test, as we don't have the raw data, but it conveys the idea conceptually.

-This might even make a good example for Social or Evolutionary Psychology.

-Higher-level statistics classes could also learn from the code the author generously shared.

-I remember learning in graduate school that men typically round up when researchers ask them their number of sexual partners, and women typically round down. We can add height to the list of things that people fib about, especially within the context of seeking out a dating partner.
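
For anyone who wants to turn the one-sample t-test idea above into something students can run, here is a minimal sketch in Python. Big caveats: we don't have the raw OK Cupid data, so the "reported" heights below are simulated, and the population mean of 69 inches is a placeholder I picked for illustration, not a quoted CDC figure. The function name one_sample_t is my own. The sketch computes the t statistic and Cohen's d by hand, and it uses a normal approximation for the p-value, which is only reasonable because the simulated sample is large.

```python
import math
import random
import statistics


def one_sample_t(sample, mu):
    """Return (t statistic, Cohen's d) for a one-sample test against mu."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)          # sample SD (n - 1 in the denominator)
    d = (mean - mu) / sd                   # Cohen's d: mean difference in SD units
    t = (mean - mu) / (sd / math.sqrt(n))  # equivalent to d * sqrt(n)
    return t, d


# ASSUMPTION: placeholder population mean in inches, not a quoted CDC value.
MU_POPULATION = 69.0

# ASSUMPTION: simulated "reported" heights, NOT the actual OK Cupid data.
rng = random.Random(42)
reported = [rng.gauss(70.0, 3.0) for _ in range(500)]

t_stat, cohens_d = one_sample_t(reported, MU_POPULATION)

# With n = 500 the t distribution is very close to the standard normal,
# so this two-sided p-value is an approximation, not an exact t-based value.
p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, d = {cohens_d:.2f}, approximate p = {p_approx:.4f}")
```

Notice that t is just Cohen's d multiplied by the square root of n. That makes a nice talking point: the overlap between the two curves (the effect size) stays the same no matter how many profiles you scrape, but the t statistic keeps growing with sample size.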

More of Cheng's work can be viewed here.

Comments

  1. Link to Cheng is down!

    1. Hmmm...can't find it at the original website, will edit this post with the original author's findings. Thanks for the heads up, Anonymous Friend!
