I know it is January 2025, but I want to write about something that happened during the Spring of 2024.
I think it is a story about how it is never too late to do the right thing, which makes it a great thing to think about here at the New Year. Data can't undo the past, but the way we manage it moving forward can provide the opportunity for some measure of equity.
Back in May, professional baseball decided to include baseball stats from the Negro Leagues (NL), which existed from 1920 to 1948, as part of Major League Baseball (MLB) stats. This was done to allow for proper recognition of talented NL players. This changed some storied records for the league:
https://www.mlb.com/news/stats-leaderboard-changes-negro-leagues-mlb
This was a lot more than merging a couple of spreadsheets. As such, this story also serves as a lesson in data management and in making disparate datasets comparable, one that is a lot more moving than your typical story of data cleaning. The following screenshots are from: https://www.mlb.com/news/mlb-negro-league-stats-added-after-statistical-review-committee-announces-findings
First, they had to find the data. Since the Negro League scores weren't authenticated at the time, data detectives have had to engage in "box-score archeology".
Then they needed to decide how to make different leagues, with different numbers of games, comparable. As such, statistics expressed as a proportion (like batting average), rather than as a count (like total hits), are where you see NL players shine.
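To make the proportion-versus-count idea concrete, here is a minimal Python sketch. The two players and all of their numbers are hypothetical, made up purely for illustration, and this is not how MLB's committee actually processed the data. It only shows why a rate such as batting average (hits divided by at-bats) can rank players differently than a raw count when schedules differ in length.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str   # hypothetical label, not a real player
    at_bats: int  # hypothetical number of at-bats
    hits: int     # hypothetical number of hits

    @property
    def batting_average(self) -> float:
        # Proportion: hits per at-bat, so schedule length cancels out.
        return self.hits / self.at_bats if self.at_bats else 0.0


# Hypothetical seasons: one short schedule, one long schedule.
seasons = [
    Season("Short-schedule player", at_bats=200, hits=80),
    Season("Long-schedule player", at_bats=600, hits=180),
]

# Ranking by count favors whoever played more games.
by_count = max(seasons, key=lambda s: s.hits)

# Ranking by proportion compares performance per opportunity.
by_rate = max(seasons, key=lambda s: s.batting_average)

print("Leader by hits (count):", by_count.player)
print("Leader by batting average (proportion):", by_rate.player)
```

In this made-up example the long-schedule player leads in total hits, while the short-schedule player leads in batting average, which is the pattern described above.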
For more in-depth reading, see:
NPR Coverage: https://www.npr.org/2024/05/29/g-s1-1525/mlb-negro-leagues-stats-josh-gibson
In-depth coverage of the decision-making process: https://www.mlb.com/news/mlb-negro-league-stats-added-after-statistical-review-committee-announces-findings