
Data can be equity: Merging of Major League Baseball and Negro League Baseball data.

I know it is January 2025, but I want to write about something that happened during the Spring of 2024.

I think it is a story about how it is never too late to do the right thing, which makes it a great thing to think about here at the New Year. Data can't undo the past, but the way we manage it moving forward can provide the opportunity for some measure of equity.

Back in May, professional baseball decided to include Negro League (NL) baseball stats, covering the leagues that existed from 1920 to 1948, as part of Major League Baseball (MLB) stats. This was done to allow for proper recognition of talented NL players. It also changed some of the league's storied records:

https://www.mlb.com/news/stats-leaderboard-changes-negro-leagues-mlb

This was a lot more than merging a couple of spreadsheets. As such, this story also serves as a lesson in data management and in making disparate datasets comparable, one that is a lot more moving than your typical story of data-cleaning. The following screenshots are from: https://www.mlb.com/news/mlb-negro-league-stats-added-after-statistical-review-committee-announces-findings

First, they had to find the data. Since the Negro League scores weren't authenticated at the time, data detectives have participated in "box-score archeology". 



Then they needed to decide how to make different leagues, with different numbers of games, comparable. As such, data expressed as a proportion, not a count, are where you see NL players shine.
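
If you want to show students why the count-versus-proportion distinction matters, here is a minimal Python sketch. The two players and all of their numbers are made up for illustration; they are not real MLB or Negro League statistics. The sketch only shows that the leader can flip when you switch from a count (hits) to a proportion (batting average, hits divided by at-bats).

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One player's season line. All values used below are hypothetical."""
    player: str
    games_scheduled: int
    hits: int
    at_bats: int

    @property
    def batting_average(self) -> float:
        # A proportion: hits per at-bat, so schedule length cancels out.
        return self.hits / self.at_bats


# Hypothetical players: A plays a long schedule, B plays a short one.
seasons = [
    Season("Player A", games_scheduled=154, hits=190, at_bats=600),
    Season("Player B", games_scheduled=60, hits=85, at_bats=230),
]

# Leader by count (raw hits) favors the player with more opportunities.
count_leader = max(seasons, key=lambda s: s.hits).player

# Leader by proportion (batting average) compares players on equal footing.
rate_leader = max(seasons, key=lambda s: s.batting_average).player

for s in seasons:
    print(f"{s.player}: {s.hits} hits, average {s.batting_average:.3f}")
print("Leader by count:", count_leader)
print("Leader by proportion:", rate_leader)
```

With these made-up numbers, Player A leads in raw hits simply because of the longer schedule, while Player B leads in batting average. That is the same logic behind why rate statistics are where shorter-season players can appear on a leaderboard.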



Thanks, COVID? It appears that part of the journey of merging the data occurred during COVID, as the league coped with how to integrate statistics collected during the irregular 2020 season.







For more in-depth reading, see:
