
Transforming your data: A historical example

TL;DR: Global water temperature data from before 1940 was collected by sailors who hauled buckets of water out of the ocean and recorded the temperature of the bucket water. But some of the recorded data was rounded (thanks, Air Force!), so researchers had to transform their data.

^Go to the 3-minute mark to see the bucket-boat-water-temperature technique in action.

Here is the original research, published in Nature. NPR covered the research article. Reporter Rebecca Hersher didn't discuss the entire research paper. Instead, she told the story of how the researchers discovered and corrected for their flawed ocean water temperature data.

This story might be a little beyond Intro Stats, but it a) tells the story of messy, real archival data used to inform global climate change research and b) introduces the idea of data transformations. Below, I will highlight some of the teaching items.

Systematic bias: The data were all flawed in the same way because they were transcribed without any data to the right of the decimal point. That is not ideal, but you can correct for it.
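If you want to show students why this kind of flaw is correctable, here is a minimal Python sketch. It is a classroom illustration, not the researchers' actual method. The temperatures are made up, and I am assuming the decimals were simply dropped (truncated) rather than rounded to the nearest whole number.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "true" bucket temperatures (made-up values, not real data).
true_temps = [random.uniform(10.0, 25.0) for _ in range(10_000)]

# Assumption: the transcription kept only the whole number,
# dropping everything to the right of the decimal point.
recorded = [math.floor(t) for t in true_temps]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Every recorded value is too low by between 0 and 1 degree,
# so the average error lands near -0.5: that is the systematic bias.
bias = mean(recorded) - mean(true_temps)

# Because the error has a consistent direction and size,
# adding 0.5 to each truncated reading removes most of it.
corrected = [r + 0.5 for r in recorded]
remaining_bias = mean(corrected) - mean(true_temps)

print(f"bias before correction: {bias:.2f}")
print(f"bias after correction:  {remaining_bias:.2f}")
```

The takeaway for students: random error averages out, but systematic error pushes every value in the same direction, which is exactly why a single consistent adjustment can fix it.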

Archival data: Sometimes, the data you need already exists somewhere. We have only been tracking weather for 100 years. Archival records like these give us older data, so we can better understand the larger cycle of global water temperatures.

I also like how they could tell that something was happening in their data but didn't know WHY something was happening in their data.

Well-annotated codebooks are good. That Air Force PDF uncovered by the researchers' friends is also a good example of why you must maintain a codebook and extensive notes on how you treat your data.

Research is so much more than numbers. This is the story of graduate student Duo Chan and his efforts to make his archival data as accurate as possible. It sounds like it was a pain in the ass. Such is science.

Measurement techniques: We can accurately measure water temperature now. I assume that it involves sharks with lasers on their heads. I don't know, but I can assume that we've moved on from The Bucket Method, which was their best at the time.

Data transformations can be tricky to explain to the novice, like transformations to make data less skewed. Yes, I know that an actual transformation is much more involved than this, but this story could be a simple way to introduce the topic.
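For students who are ready for the skew version, here is a minimal Python sketch of a log transformation. It is an illustration only and has nothing to do with the ocean temperature data. The values are simulated, and the `skewness` helper is my own simple implementation, not a library function.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def skewness(values):
    """Simple (population) skewness: the average cubed z-score."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum(((v - m) / sd) ** 3 for v in values) / n


# Made-up, right-skewed data: most values are small, a few are very large.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5_000)]

# The transformation: replace every value with its natural log.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

The same rule is applied to every data point, just like the bucket correction. The difference is that here the goal is to change the shape of the distribution rather than to undo a recording error.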
