
Posts

Showing posts matching the search for correlation

Sonnad and Collins's "10,000 words ranked according to their Trumpiness"

I finally have an example of Spearman's rank correlation to share. This is a political example, looking at how Twitter language usage differs in US counties based upon the proportion of votes that Trump received. This example was created by Jack Grieve, a linguist who uses archival Twitter data to study how we speak. Previously, I blogged about his work that analyzed what kind of obscenities are used in different zip codes in the US. He created maps of his findings, and the maps are color-coded by the z-score for the frequency of each word. So, z-score example. Southerners really like to say "damn". On Twitter, at least. But on to the Spearman's example. More recently, he conducted a similar analysis, this time looking for trends in word usage based on the proportion of votes Trump received in each county in the US. NOTE: The screenshots below don't do justice to the interactive graph. You can hover over any dot to view the word as well as the cor...
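If you want students to see what Spearman's rho actually does, here is a minimal Python sketch: rank each variable (tied values get the average of their ranks), then compute Pearson's r on the ranks. The vote-share and word-frequency numbers are hypothetical, not Grieve's data. With these values rho comes out to .90; notice that the extreme 9.0 frequency counts only as "the highest rank," so it has far less pull than it does on Pearson's r.

```python
# Spearman's rank correlation: rank each variable (ties get the average of
# their ranks), then compute Pearson's r on those ranks.
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Walk j forward across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counties: Trump vote share vs. one word's relative frequency.
vote_share = [0.25, 0.40, 0.55, 0.70, 0.85]
word_freq = [1.0, 1.5, 1.4, 3.0, 9.0]
print(f"Spearman's rho = {spearman(vote_share, word_freq):.2f}")
print(f"Pearson's r    = {pearson(vote_share, word_freq):.2f}")
```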

In-house restaurant dining is related to increases in COVID-19 cases: Illustrates correlation, regression, and good science reporting

Niv Elis, writing for The Hill, summarized a report created by J.P. Morgan analyst Jesse Edgerton. The report found a link between in-restaurant spending three weeks earlier and current increases in new COVID-19 cases across states. Data for the analysis came from 1) J.P. Morgan/Chase in-restaurant (not online/takeout) credit card purchases and 2) infection data from Johns Hopkins. How to use in class: 1. Correlation/regression: This graph, which summarizes the main findings from the report, may not include my beloved APA axis labels, but it does include an R² and is a good example of a scatterplot. ALSO: The author of The Hill piece was careful to include this information from the study's author, which clarifies that correlation doesn't necessarily equal causation. 2. Creativity in data analysis: Often, in intro psych stats, we use examples rooted in traditional social science research. We should use such examples. But we MUST also use examples that demonstrate how d...
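For students who wonder where an R² on a scatterplot comes from, here is a minimal least-squares sketch. The spending and case-growth numbers are hypothetical, not from the J.P. Morgan report. With a single predictor, R² equals the square of Pearson's r, so it can be read as the share of variance in case growth that the line accounts for.

```python
# Simple linear regression by ordinary least squares, plus R^2.
from statistics import mean


def fit_line(x, y):
    """Return (slope, intercept, r_squared) for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - SS_residual / SS_total
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical states: % change in in-restaurant spending three weeks earlier
# vs. % growth in new cases now.
spending_change = [-60, -45, -30, -20, -10]
case_growth = [-5, 0, 4, 10, 12]
slope, intercept, r2 = fit_line(spending_change, case_growth)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```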

"Correlation is not causation", Parts 1 and 2

Jethro Waters, Dan Peterson, Ph.D., Laurie McCollough, and Luke Norton made a pair of animated videos (1, 2) that explain why correlation does not equal causation and how we can perform lab research in order to determine if causal relationships exist. I like them a bunch. Specific points worth liking: -Illustrations of scatter plots for significant and non-significant relationships. Data does not support the old wives' tale that everyone goes a little crazy during full moons. -Explains the Third Variable problem. Simple, pretty illustration of the perennial correlation example of the ice cream sales (X):death by drowning (Y) relationship, and the third variable, hot weather (Z), that drives the relationship. -In addition to discussing correlation =/= causation, the video makes suggestions for studying a correlational relationship via more rigorous research methods (here violent video games:violent behavior). Video games (X) influence aggression (Y) via the moderato...
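The ice cream/drowning example is easy to simulate. This sketch assumes hot weather is the only link between the two outcomes: both are built from the same simulated heat variable plus independent noise. Their raw correlation lands around .5, but the partial correlation controlling for heat is near zero. All values are simulated; the seed and sample size are arbitrary choices.

```python
# Third-variable simulation: heat (Z) drives both ice cream sales (X) and
# drownings (Y); X and Y have no direct link to each other.
import random
from statistics import mean


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(42)  # fixed seed so the demo is repeatable
n = 5000
heat = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [z + rng.gauss(0, 1) for z in heat]
drownings = [z + rng.gauss(0, 1) for z in heat]

print(f"r(ice cream, drownings)     = {pearson(ice_cream, drownings):.2f}")
print(f"partial r, controlling heat = {partial_r(ice_cream, drownings, heat):.2f}")
```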

Jon Mueller's Correlation or Causation website

If you teach social psychology, you are probably familiar with Dr. Jon Mueller's Resources for the Teaching of Social Psychology website. You may not be as familiar with Mueller's Correlation or Causation website, which keeps a running list of news stories that summarize research findings and either treat correlation appropriately or suggest/imply/state a causal relationship between correlational variables. The news stories run the gamut from research about human development to political psychology to research on cognitive ability. When I've used this website in the past, I have allowed my students to pick a story of interest and discuss whether or not the journalist in question implied correlation or causation. Mueller also provides several ideas (both from him and from other professors) on how to use his list of news stories in the classroom.

Correlation example using research study about reusable shopping bags/shopping habits

A few weeks ago, I used an NPR story to create an ANOVA example for use in class. This week, I'm giving the same treatment to a different research study discussed on NPR and turning it into a correlation example. A recent research study found that individuals who use reusable grocery store bags tend to spend more on both organic food AND junk food. Here is NPR's treatment of the research. Here is a more detailed account of the research via an interview with one of the study's authors. Here is the working paper that the PIs have released for even more detail. The researchers frame their findings (folks who are "good" by using reusable bags and purchasing organic food then feel entitled to indulge in some chips and cookies) via "licensing", but I think this could also be explained by ego depletion (opening up a discussion about that topic). So, I created a little faux data set that replicates the main finding: Folks who use reusable ...
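If you want to build your own faux data set with a chosen correlation, one common approach mixes two independent standard normal variables. This is a sketch under my own assumptions: the variable names, units, sample size, and target r = .40 are hypothetical and do not come from the reusable-bag study, and it is not necessarily how the faux data set above was made.

```python
# Faux data with (approximately) a chosen Pearson correlation.
import random
from statistics import mean


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def faux_pairs(n, rho, seed=0):
    """Return two lists of standard scores whose population correlation is rho."""
    rng = random.Random(seed)
    x_vals, y_vals = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x_vals.append(z1)
        # Weights rho and sqrt(1 - rho^2) keep var(y) = 1 and corr(x, y) = rho.
        y_vals.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    return x_vals, y_vals


# Hypothetical: reusable-bag trips per month vs. monthly junk-food spending ($).
z_bags, z_junk = faux_pairs(n=2000, rho=0.4, seed=1)
bag_trips = [max(0.0, round(8 + 3 * z, 1)) for z in z_bags]     # no negative trips
junk_spend = [max(0.0, round(40 + 12 * z, 2)) for z in z_junk]  # no negative dollars
print(f"sample r = {pearson(bag_trips, junk_spend):.2f}")
```

The sample r will wobble around the target; larger n keeps it closer.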

Correlation =/= causation, featuring positive psychology, hygge, and no math.

I have shared AMPLE examples for teaching correlations. Because I've got you, boo. Like, I have shared days' worth of lecture material with you, my people. I am adding one more example. I have used this example in my positive psychology course for years, and it really illustrates what can happen en masse when marketing departments and less-savory pop-psych elements try to establish causal relationships between features (stereotypes?) of happy countries and individuals' subjective well-being. I like this one because it is math-free, UG-accessible, and not terribly long. Joe Pinsker, writing for the Atlantic, argues that... https://www.theatlantic.com/family/archive/2021/06/worlds-happiest-countries-denmark-finland-norway/619299/ TL;DR: Just because Northern European nations consistently score highest in global happiness data doesn't mean that haphazardly adopting practices from those countries will make you happy. Correlation doesn't equal causation. Here is the ...

FlowingData's Car Costs vs. Emissions story

FlowingData shared some interesting info on how much cars cost versus their environmental footprint. TL;DR: Low-emission cars tend to be cheaper in the long run. Hooray for the free market! The data is also available via the New York Times, along with a much more in-depth conversation about the actual cost of high/low-emission cars, but it is behind a paywall. The original data, presented in a fun (nerd fun) interactive website, is available here. How to use it in class: 1) It's a correlation! Each car model is a dot with two related data points: average cost per month and average carbon dioxide emissions per mile. 2) It's Simpson's Paradox! Note how electric cars (yellow cloud) all have similar emissions, but the average cost per month varies. Same for diesel cars. Overall, you still see the positive correlation in the data, but if you break it down by class of car, the correlation isn't present at every level.
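The pooled-versus-within-class pattern can be shown with a tiny, made-up data set. In this sketch (hypothetical numbers, not the FlowingData/NYT data), emissions and cost are uncorrelated inside each class of car by construction, yet the pooled correlation is strongly positive because the classes differ in both averages.

```python
# Pooled vs. within-group correlation with hypothetical car data.
from statistics import mean


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# (CO2 grams per mile, monthly cost in $). Within each class the
# low/high emission and low/high cost values are crossed, so r = 0.
cars = {
    "electric": [(90, 350), (130, 350), (90, 450), (130, 450)],
    "hybrid": [(230, 500), (270, 500), (230, 600), (270, 600)],
    "gas": [(380, 650), (420, 650), (380, 750), (420, 750)],
}

for car_class, rows in cars.items():
    emissions, cost = zip(*rows)
    print(f"{car_class:>8}: within-class r = {pearson(emissions, cost):.2f}")

all_rows = [row for rows in cars.values() for row in rows]
emissions, cost = zip(*all_rows)
print(f"  pooled: r = {pearson(emissions, cost):.2f}")
```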

Yau's "Divorce and Occupation"

Nathan Yau, writing for FlowingData, provides a good example of correlation, median, and correlation not equaling causation in his story, "Divorce and Occupation". Yau looked at the relationship between occupation and divorce in a few ways. He used a variation on the violin plot to illustrate how each occupation's divorce rate falls around the median divorce rate. Who has the lowest rate? Actuaries. They really do know how to mitigate risk. You could also discuss why the median divorce rate is provided instead of the mean divorce rate. Again, the actuaries deserve attention, as they probably would throw off the mean. https://flowingdata.com/2017/07/25/divorce-and-occupation/ He also looked at how salary was related to divorce, and this can be used as a good example of a linear relationship: The more money you make, the lower your chances of divorce. And an intuitive exception to that trend? Clergy members. https://flowingdata.com/2017/07/25/divorce...
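The mean-versus-median point takes only a few lines to demonstrate. The five divorce rates below are hypothetical (not Yau's numbers), with one unusually low value standing in for the actuaries: it drags the mean down while the median stays put.

```python
# Mean vs. median when one value is an outlier.
from statistics import mean, median

# Hypothetical divorce rates for five occupations; the first is a low outlier.
rates = [0.17, 0.30, 0.32, 0.33, 0.35]
print(f"with outlier:    mean = {mean(rates):.3f}, median = {median(rates):.3f}")
print(f"without outlier: mean = {mean(rates[1:]):.3f}, median = {median(rates[1:]):.3f}")
```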

Johnson & Wilson's "The 13 High-Paying Jobs You Don’t Want to Have"

This is a lot of I/O and personality and a little bit of stats. But it does demonstrate correlation and percentiles, and it is interactive. For this article from Time, Johnson and Wilson used participant scores on a very popular vocational selection tool, the Holland Inventory (sometimes called the RIASEC), and participant salary information to see if there is a strong relationship between salary and personality-job fit. There is not. How to use in class: -Show your students what a weak correlation looks like when expressed via scatter plot. Seriously. I spend a lot of time looking for examples for teaching statistics. And there are all sorts of significant positive and negative correlation examples out there. But good examples of non-relationships are a lot rarer. -If you teach I/O, this fits nicely into a personality-job fit lecture. If you don't teach I/O but are a psychologist, this still applies to your field and may introduce your students to the field of I/O. ...
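To show students how little a weak correlation explains, and why sample size matters for significance, you can convert r to a t statistic with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The r = .10 below is hypothetical, not the value from the Time article. For comparison, the two-tailed critical t at alpha = .05 is about 2.05 for df = 28 and about 1.96 for df = 998, so the same weak r is non-significant with 30 people but significant with 1,000, while still sharing only 1% of variance.

```python
# Testing Pearson's r against zero: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
def r_to_t(r, n):
    return r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5


r = 0.10  # hypothetical weak correlation
for n in (30, 1000):
    print(f"r = {r:.2f}, n = {n}: r^2 = {r ** 2:.2f}, t({n - 2}) = {r_to_t(r, n):.2f}")
```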

YEET!, or why you should always check your scatter plot

I sneak-attack my students with this correlation example. I ask them to analyze this data as a correlation and create a report describing their data. This is what the data looks like: I'll be honest, I mostly do this for my own amusement. HOWEVER: It does demonstrate why you should look at a scatter plot before trusting a correlation analysis, because the data may contain a non-linear relationship (see: Datasaurus). If you want to make your own silly scatter plot for data analysis, I recommend Robert Grant's DrawMyData website for doing so.
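Here is the underlying lesson in miniature: a perfect but U-shaped relationship can produce r = 0, so the number alone says "no relationship" while the plot says otherwise. A minimal sketch with made-up points; the text plot is only a crude stand-in for a real scatter plot.

```python
# A perfect curved relationship with a Pearson correlation of exactly zero.
from statistics import mean


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y is completely determined by x, but not linearly
print(f"r = {pearson(x, y):.2f}")

# Crude text scatter plot: one column per x value, one row per y level.
for level in range(max(y), -1, -1):
    row = "".join("*" if yv == level else " " for yv in y)
    print(f"{level} |{row}")
```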

A bunch of pediatricians swallowed Lego heads. You can use their research to teach the basics of research methods and stats.

As a research-parent-nerd joke before Christmas, six doctors swallowed Lego heads and recorded how long it took to pass the Lego heads. Why? To inform parents about the lack of danger associated with your kid swallowing a tiny toy. I encourage you to use it as a class example because it is short, it describes its research methodology very clearly, it uses a within-subject design, and it has a couple of means, standard deviations, and even a correlation. TL;DR: https://dontforgetthebubbles.com/dont-forget-the-lego/ In greater detail: Note the use of a within-subject design. They also operationalized their DV via the SHAT (Stool Hardness and Transit) scale. *Yeah. So here is the Bristol Stool Chart mentioned in the above excerpt. Please don't click on the link if you are eating or have a sensitive stomach. Research outcomes, including means and standard deviations: An example of a non-significant correlation, with the SHAT score on the y-axi...

Franz H. Messerli's "Chocolate consumption, cognitive function, and Nobel Laureates"

A chocolate study seems very appropriate for the day after Easter. Messerli's study found a strong and positive correlation between a nation's per capita chocolate consumption and the number of Nobel prizes won by that nation (see graph below). The research article is pretty straightforward: The only statistical analysis conducted was a correlation, the journal article is very short, and it used archival data. As such, you can use this example to illustrate correlation and archival data as well as the dreaded "third variable" problem (by asking students to generate variables that may increase chocolate consumption as well as top-notch research/writing/peace/etc.). Property of Messerli/New England Journal of Medicine

Correlation example: Taco Bell and mortality by state...don't run for the border!

Many thanks to my colleague, Andrew Caswell, for sharing this Reddit post with me: https://www.reddit.com/r/dataisbeautiful/comments/s75sm7/oc_us_life_expectancy_vs_of_taco_bell_locations/ So, this alone is an excellent example of correlation and the third variable problem. But...more delightfully, the Redditor who created this graph also shared where he found this data (https://www.nicerx.com/fast-food-capitals/, https://worldpopulationreview.com/state-rankings/life-expectancy-by-state). BETTER STILL: I downloaded and organized all of the fast-food data and mortality data and put it in one spreadsheet for you all. Do All The Correlations! Teach your students about Bonferroni corrections! Figure out the fast-food restaurant that correlates the most strongly with mortality! PS: Did you know that there is an option to download data from a website in Excel? The fast-food data was presented in an embedded, scrolly table, and that Excel option made it easy-peasy to do...
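If students run a correlation for every chain, a Bonferroni correction keeps the familywise error rate at alpha by testing each of the m p-values against alpha / m (equivalently, multiplying each p-value by m, capped at 1). A minimal sketch; the chain names and p-values are hypothetical placeholders, not results from the spreadsheet. Hunting for the single strongest correlation among many is exactly the situation where this correction matters.

```python
# Bonferroni correction for m simultaneous tests.
def bonferroni(p_values, alpha=0.05):
    """Return {name: (adjusted_p, significant)} using the Bonferroni rule."""
    m = len(p_values)
    return {
        name: (min(1.0, p * m), p <= alpha / m)
        for name, p in p_values.items()
    }


# Hypothetical p-values from correlating each chain's density with life expectancy.
p_values = {
    "chain_a": 0.001,
    "chain_b": 0.02,
    "chain_c": 0.04,
    "chain_d": 0.30,
    "chain_e": 0.008,
}
for name, (adj_p, sig) in bonferroni(p_values).items():
    raw_sig = p_values[name] <= 0.05
    print(f"{name}: raw p = {p_values[name]:.3f} ({'sig' if raw_sig else 'ns'}), "
          f"adjusted p = {adj_p:.3f} ({'sig' if sig else 'ns'})")
```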

Shapiro's "New Study Links Widening Income Gap With Life Expectancy"

This story is pretty easy to follow. Life expectancy varies by income level. The story becomes a good example for a statistics class because, in the interview, the researcher describes a multivariate model, one in which multiple independent variables (drug use, medical insurance, smoking, income, etc.) could be used to explain the disparity that exists in lifespan between people with different incomes. As such, this story could be used as an example of multiple regression. And the Third Variable Problem. And why correlation isn't enough. In particular, this part of the interview (between interviewer Ari Shapiro and researcher Gary Burtless) refers to the underlying data, the Third Variable Problem, and the amount of variability that can be assigned to the independent variables he lists. SHAPIRO: Why is this gap growing so quickly between life expectancy of rich and poor people? BURTLESS: We don't know. More affluent Americans tend to engage...
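To make "several predictors at once" concrete, here is a minimal multiple regression sketch that solves the normal equations (X'X)b = X'y with Gaussian elimination. It is not Burtless's model: the two predictors (income in $10k units and a 0/1 smoker flag) and the outcome values are hypothetical, generated without noise from life expectancy = 70 + 0.5 * income - 6 * smoker so the recovered coefficients can be checked. A real analysis would use a statistics package.

```python
# Multiple regression via the normal equations (X'X) b = X'y.
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(predictors, y):
    """predictors: one row per person; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical people: (income in $10k units, smoker 1/0).
people = [(2, 1), (3, 0), (5, 1), (8, 0), (10, 0), (4, 1), (6, 1)]
life_exp = [70 + 0.5 * inc - 6 * smoker for inc, smoker in people]
coef = multiple_regression(people, life_exp)
print("intercept, income, smoker:", [round(c, 3) for c in coef])
```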

Scott Janish's "Relationship of ABV to Beer Scores"

Scott Janish loves beer, statistics, and blogging (a man after my own heart). His blog discusses home brewing as well as data related to beer. One of his statsy blog posts looked at the relationship between average alcohol by volume for a beer style (below, on the x-axis) and the average rating (from beeradvocate.com, y-axis). He found, perhaps intuitively, a positive correlation between the average review for a beer style and the average alcohol content for that style. Scott was kind enough to provide us with his data set, turning this into a most teachable moment. http://scottjanish.com/relationship-of-abv-to-beer-scores/ How to use it in class: 1) Scott provides his data. The r is .418, which isn't mighty impressive. However, you could teach your students about influential observations/outliers in regression/correlation by asking them to return to the original data, eliminate the 9 data points inconsistent with the larger pattern, and reanalyze th...
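The outlier exercise can be previewed with a toy version before students open the real file. The ABV and rating values below are hypothetical (not Janish's data), and only two points break the pattern rather than nine; dropping them moves r from near zero to nearly 1. It is also a good moment to discuss when removing points is justified.

```python
# How a few off-pattern points can drag down a correlation.
from statistics import mean


def pearson(x, y):
    """Pearson's r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical beer styles: average ABV (%) and average rating.
abv = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 4.0, 10.0]
rating = [3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 3.6]
off_pattern = {8, 9}  # indices of the two points that break the trend

keep = [i for i in range(len(abv)) if i not in off_pattern]
r_all = pearson(abv, rating)
r_trimmed = pearson([abv[i] for i in keep], [rating[i] for i in keep])
print(f"all points: r = {r_all:.2f}; without off-pattern points: r = {r_trimmed:.2f}")
```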

Quealy & Sanger-Katz's "Is Sushi ‘Healthy’? What About Granola? Where Americans and Nutritionists Disagree"

UPDATE, 9/22/22: Here is a non-paywalled link to this information: https://www.nytimes.com/2017/10/09/learning/whats-going-on-in-this-graph-oct-10-2017.html This article from the NYT is based on a survey. That survey asked a bunch of nutritionists if they considered certain foods healthy. Then they asked a bunch of everyday folks if they considered the same foods to be healthy. Then they generated the percentage of each group that considered the food healthy. And the NYT put the nutritionist responses on the y-axis and commoners on the x-axis, and made a lovely scatterplot... Nutritionists and non-nutritionists agree that chocolate chip cookies are not healthy. However, nutritionists are far more critical of American cheese than are non-nutritionists. ...and provided us with the raw data as well.

Pew Research Center's "The strong relationship between per capita income and internet access, smartphone ownership"

This finding is super-duper intuitive: A positive, strong correlation exists between national per capita income and rates of internet access and smartphone ownership within that nation. Because it is intuitive, it makes a good example for your class when you teach correlation to your baby statisticians. This graph is more engaging than your average graph because the good people at Pew made it interactive. You can see which country is represented by which dot. You can also see regional trends, as the countries are color-coded by continent/region. For more context and information on this survey, see this more extensive report on the relationship between smartphone/internet access and economic advancement. This report further breaks down technology usage by education level, age, individual income, etc. This data is also useful for demonstrating the distribution of wealth in the world and the variability that exists among countries in the same region/on the same continent.

Multiverse = multiple correlation and regression examples!

I love InformationIsBeautiful. They created my favorite data visualization of all time. They also created an interactive scatterplot with all sorts of information about Marvel Cinematic Universe films. How to use in class: 1. Experiment with the outcome variables you can add to the X and Y axes: Critical response, budget, box office receipts, year of release, etc. There are more than that; you can add them to either the X or Y axes. So, it is one website, but there are many ways to assess the various films. 2. Because of the interactive axes, there are various correlation and regression examples. And these visualizations aren't just available as a quick visual example of linear relationships...see item 3... 3. You can ask your students to conduct the actual data analyses you can visualize because the hecking data is available. 4. The website offers exciting analyses, encouraging your students to think critically about what the data tells them. 5. You could also squeeze Simp...

"Guess the Correlation" game

Found this gem, "Guess the Correlation", via the subreddit r/statistics. The redditor who posted this resource (ow241) appears to be the creator of the website. Essentially, you view different scatter plots and try to guess r. Points are awarded or taken away based on how close you are to the true r. The game tallies your average amount of error as well. It is way more addictive than it sounds. I think that accuracy increases with time and experience. True r for this one was .49. I guessed .43, which isn't so bad. I think this is a good way for statistics instructors to procrastinate. I think it is also a good way to help your students build a more intuitive ability to read scatter plots and predict the strength of linear relationships.
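If you run a low-tech version of the game in class, you need some way to tally "average amount of error." This sketch assumes mean absolute error, which is my assumption, not necessarily the website's scoring rule. Only the first round (true r = .49, guess = .43) comes from the post; the other rounds are hypothetical.

```python
# Scoring correlation guesses by mean absolute error (an assumed rule).
def score_guesses(rounds):
    """rounds: list of (true_r, guessed_r) pairs; returns the mean absolute error."""
    errors = [abs(true_r - guess) for true_r, guess in rounds]
    return sum(errors) / len(errors)


rounds = [(0.49, 0.43), (0.10, 0.30), (-0.65, -0.55)]  # first pair from the post
print(f"average error = {score_guesses(rounds):.2f}")
```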

Suicide hotline efficacy data: Assessment, descriptive data, t-tests, correlation, regression examples abound

ASIDE: THIS IS MY 500th POST. PLEASE CLAP. Efficacy data about a mental health intervention? Yes, please. The example has so much potential in a psych stats classroom. Or an abnormal/clinical classroom, or research methods. Maybe even human factors, because three digits are easier to remember than 10? This post was inspired by an NPR story by Rhitu Chatterjee. It is all about the switch of America's mental health emergency hotline from a 10-digit phone number to the much easier-to-remember three digits (988), and the various ways that the government has measured the success of this change. How to use this (and related material) in class: 1) Assessment. In the NPR interview, they describe how several markers have improved: Wait times, dropped calls, etc. Okay, so the NPR story sent me down a rabbit hole of looking for this data so we can use it in class. Here is the federal government's website about 988 and a link to their specific 988 performance data,...
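For the t-test angle, a before/after comparison of wait times is a natural fit. This sketch uses Welch's independent-samples t (it does not assume equal variances), treating calls before and after the switch as separate groups. The wait times are hypothetical seconds, not the federal 988 performance data. The code reports t and Welch-Satterthwaite degrees of freedom; students can look up the critical value in a t table.

```python
# Welch's t-test for two independent groups (unequal variances allowed).
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) using the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical wait times in seconds for calls before and after the switch.
before = [150, 170, 160, 180, 140]
after = [40, 50, 45, 35, 30]
t, df = welch_t(before, after)
print(f"t({df:.1f}) = {t:.2f}")
```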