Posts

Showing posts with the label regression

xkcd comics and statistical thinking.

Xkcd is a gift to statistics instructors. Author Randall Munroe shares his humor and statistics knowledge. I think that many of his comics can be used for extra credit points, in that you don't get the joke unless you get the conceptual statistical knowledge behind the joke. NOTE: I have included images here, but you really, really should go to the original comics and hover your cursor over them to view the alternative text. NOTE TWO: This is not a comprehensive list, but I will try to update it as Munroe shares more comics. To teach APA formatting: https://xkcd.com/833/ To explain sufficient sample size in research: https://xkcd.com/507/ To explain good statistics manners/how to appropriately ask for stats help: https://m.xkcd.com/2116/ To explain error bars: https://xkcd.com/2110/ T-test and the t-curve: https://xkcd.com/2110/ Linear relationships: https://xkcd.com/605/ The Normal Curve: https://xkcd.com/2118/ Cherry picking, p-...

BBC News' "Who is your Olympic Body Match?"

This interactive website from the BBC matches your students to their Rio Olympic body match. You enter your height, weight, and age, and select your gender. It matches you with the athlete who is the most like you. It also provides good examples of distributions, and of where you fall on a distribution, for Olympic athletes. I think it also gets students thinking about regression models. After you enter your data, the page returns information about where you fall on the distribution histogram for Olympic athletes by height, weight, and age for your gender. Then, the website returns your top matches: How to use in class: 1) What other IVs could you collect to determine best sport match (DV)? Family income (I had access to soccer growing up, but not dressage horses)? Average temperature of hometown (My high school had a skiing club but not a beach volleyball club)? This gets your students thinking about multiple regression ...

Two great websites that generate data sets for teaching.

You could also use these websites to generate totally unethical data for publication. Don't do it, buddy. Sometimes, when you are teaching your stats class, it is lovely to have data generated for a particular statistical test, so that you know both the data and the results. Here are two websites that do just that. One tried and trustworthy resource was created by I/O psychologist Richard Landers. I blogged about this one in 2013, and I've used his data generator for years. My new resource is from social psychologist Andrew Luttrell. Nice things about both: -Data! -Both are easy to use. -Specific data for everything you teach in Intro Stats, like t-tests, ANOVA, correlation, and regression. -They are both free and help you do your job. Thanks, Richard and Andrew! The nice thing about Richard's is that it gives you options of several different units (days, money, etc.) AND vignettes that explain why this data was collected. You can generate data ...

Wilke's regression line CIs via GIFs

A tweet straight up solved a problem I encountered while teaching. The problem: How can I explain why the confidence interval area for a regression line is curved when the regression line is straight? This comes up when I use my favorite regression example. It explains regression AND the power that government funding has over academic research. TL;DR: Relative to the number of Americans who die by gun violence, there is a disproportionately low amount of a) federal funding and b) research publications to better understand gun violence deaths, compared to funding and publishing about other common causes of death in America. Why? The Dickey Amendment to a 1996 federal spending bill. See graph below: https://jamanetwork.com/journals/jama/article-abstract/2595514 The gray area here is the confidence interval region for the regression line. And I had a hard time explaining to my students why the regression line, which is straight, doesn't have a perfectly rectangula...
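
One way to answer the curved-band question above: the standard error of the fitted line at a point x0 is s * sqrt(1/n + (x0 - mean of X)^2 / Sxx), so the band is narrowest at the mean of X and widens as x0 moves away from it. Below is a minimal sketch of that idea in Python. The data are made up (they are not the JAMA numbers), the function names are my own, and the sketch stops at the standard error rather than multiplying by a t critical value to get the full interval.

```python
import math

# Hypothetical data for illustration only (not the JAMA funding data).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def se_of_fitted_line(xs, ys, x0):
    """Standard error of the fitted (mean) value of Y at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope, intercept = fit_line(xs, ys)
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    # The (x0 - mean_x)^2 term is what makes the band curve.
    return s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)


for x0 in xs:
    print(x0, round(se_of_fitted_line(xs, ys, x0), 3))
```

The printed standard errors are smallest at x = 4 (the mean of X here) and grow symmetrically toward both ends, which is the hourglass shape of the gray region.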

MathsIsFun.com's Least Squared Error calculator

MathsIsFun.com bills this as a Least Squared Error calculator, but I don't think it is a calculator. I think it is a nice visual aid that demonstrates how the regression line/equation changes as your data change. The static photo below doesn't do this interactive website justice. You can drag and drop any of the dots on the scatter plot and watch as the regression line and regression line equation are recalculated to best predict Y based on X. It doesn't explicitly show the math going on behind the scenes, but it is a nice complement to your LSE lecture. https://www.mathsisfun.com/data/least-squares-calculator.html
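
If you want to show students the math that the demo hides, here is a minimal sketch of the same drag-a-dot idea in Python. The dots are made up, the function name is my own, and this is the textbook least-squares formula rather than whatever the website actually runs behind the scenes.

```python
def least_squares(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical dots that sit exactly on y = 2x + 1.
before = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
# "Drag" the last dot upward, from (5, 11) to (5, 16).
after = before[:-1] + [(5, 16)]

slope_before, intercept_before = least_squares(before)
slope_after, intercept_after = least_squares(after)
print("before:", slope_before, intercept_before)
print("after: ", slope_after, intercept_after)
```

Moving a single dot at the edge of the plot changes both the slope and the intercept, which is a handy lead-in to talking about influential observations.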

Dozens of interactive stats demos from @artofstat

This website is associated with Agresti, Franklin, and Klingenberg's text Statistics, The Art and Science of Learning from Data (@artofstat), and there are dozens of great interactives to share with your statistics students. Similar and useful interactives exist elsewhere, but it is nice to have such a thorough, one-stop shop of great visuals. Below, I have included screengrabs of two of their interactive tools. They also explain chi-square distributions, central limit theorem, exploratory data analysis, multivariate relationships, etc. This interactive about linear regression lets you put your own dots in the scatter plot, and returns descriptive data and the regression line: https://istats.shinyapps.io/ExploreLinReg/. Show the difference between two populations (of your own creation): https://istats.shinyapps.io/2sample_mean/

Climate Central's The First Frost is Coming Later

So, this checks off a couple of my favorite requisites for a good teaching example: You can personalize it, it is contemporary and applicable, and it illustrates a few different sorts of statistics. Climate Central wrote this article about first frost dates, and how those dates, and an increasing number of frost-free days, create longer growing seasons. The overall article is about how much less frosty the US is becoming as the Earth warms. They provide data about the first frost in a number of US cities. It even lists my childhood hometown of Altoona, PA, so I think there is a pretty large selection of cities to choose from. Below, I've included the screen grab for my current home, and the home of Gannon University, Erie, PA. The first frost date is illustrated with a line chart, but the chart also includes the regression line. Data for frosty, chilly Erie, PA. The article also presents a chart that shows how frost is related to the length of the growing season in t...

Johnson's "The reasons we don’t study gun violence the same way we study infections"

This article from The Washington Post summarizes research published in the Journal of the American Medical Association. Both are simple, short articles that show how you can use regression to make an argument. Here, the authors use regression to demonstrate the paucity of funding and publications for research studying gun-related deaths. A regression line was generated to predict how much money was spent studying common causes of death in the US. Visually, we can see that deaths by firearms aren't receiving funding proportional to the number of deaths they cause. See the graph below. How to use in class: 1) How is funding meted out by our government to better understand the problems that plague our country? Well, it isn't being given to researchers studying gun violence because of the Dickey Amendment. I grew up in a very hunting-friendly/gun-friendly part of Pennsylvania. I've been to the shooting range. And it upsets me that we can't better understand and stu...

Annenberg Learner's "Against All Odds"

Holy smokes. How am I just learning about this amazing resource (thanks, Amy Hogan, for the lead) now? The folks over at Annenberg, famous for Zimbardo's Discovering Psychology series, also have an amazing video collection about statistics, called "Against All Odds". Each video couches a statistical lesson in a story. 1) In addition to the videos, there are student and faculty guides to go along with every video/chapter. I think that, using these guides, an instructor could go textbook free. 2) The topics listed approximate an Introduction to Statistics course. https://www.learner.org/courses/againstallodds/guides/faculty.html

xkcd's Linear Regression

http://xkcd.com/1725/ This comic is another great example of allowing your students to demonstrate statistical comprehension by explaining why a comic is funny. What does the r^2 indicate? When would it be easy to guess the direction of the correlation? More on that via this previous blog post.

Anscombe's Quartet

No, Wikipedia isn't a proper resource for our students to cite. But it is not without merit. For example, I think the information it provides on Anscombe's quartet is very useful. This example provides four data distributions. For each, the means and variances for both the X and Y variables are identical. The correlations between X and Y, and the regression lines, are also identical. This is the descriptive/inferential data that applies to each of the four graphs. I have seen variations upon this in textbooks over the years, but typically they just show how different distributions can have the same mean and standard deviation. I think this example goes the extra mile by including r and the regression line. How to use in class: -Graphs aren't for babies. They can be an essential part of understanding your data. -Outliers are bad! -The original data is also included at the Wikipedia entry if you would like your students to create these graphs in class.
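
If you would rather have students check the numbers than take them on faith, here is a minimal sketch in Python that computes the means, regression line, and r for each of the four data sets. I typed in the values as I remember them from Anscombe's 1973 data, so please verify them against the Wikipedia entry before handing this out. The function and variable names are my own.

```python
import math

# Anscombe's quartet, transcribed from memory of the published values.
# Check against the Wikipedia entry before using in class.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean_x, mean_y, slope, intercept, r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return mean_x, mean_y, slope, intercept, r


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, [round(value, 2) for value in stats])
```

The four printed rows should agree to two decimal places, which makes the point that you can't tell these data sets apart without graphing them.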

Shapiro's "New Study Links Widening Income Gap With Life Expectancy"

This story is pretty easy to follow. Life expectancy varies by income level. The story becomes a good example for a statistics class because in the interview, the researcher describes a multivariate model, one in which multiple independent variables (drug use, medical insurance, smoking, income, etc.) could be used to explain the disparity that exists in lifespan between people with different incomes. As such, this story could be used as an example of multivariate regression. And The Third Variable Problem. And why correlation isn't enough. In particular, this part of the interview (between interviewer Ari Shapiro and researcher Gary Burtless) refers to the underlying data, the Third Variable Problem, and the amount of variability that can be assigned to the independent variables he lists. SHAPIRO: Why is this gap growing so quickly between life expectancy of rich and poor people? BURTLESS: We don't know. More affluent Americans tend to engage...

Scott Janish's "Relationship of ABV to Beer Scores"

Scott Janish loves beer, statistics, and blogging (a man after my own heart). His blog discusses home brewing as well as data related to beer. One of his statsy blog posts looked at the relationship between average alcohol by volume for a beer style (below, on the x-axis) and the average rating (from beeradvocate.com, y-axis). He found, perhaps intuitively, a positive correlation between the average review for a beer style and the average alcohol content for that style. Scott was kind enough to provide us with his data set, turning this into a most teachable moment. http://scottjanish.com/relationship-of-abv-to-beer-scores/ How to use it in class: 1) Scott provides his data. The r is .418, which isn't mighty impressive. However, you could teach your students about influential observations/outliers in regression/correlation by asking them to return to the original data, eliminate the 9 data points inconsistent with the larger pattern, and reanalyze th...
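
To preview what dropping points that break the pattern can do to r, here is a minimal sketch in Python. The numbers are invented for illustration and are NOT Scott's data (his set removes nine points; this toy set removes one). The function name is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (ABV, average rating) pairs, not Scott Janish's data.
abv = [4, 5, 6, 7, 8, 9, 10]
rating = [3.2, 3.4, 3.5, 3.8, 3.9, 4.1, 2.5]  # the last pair breaks the pattern

r_all = pearson_r(abv, rating)
r_trimmed = pearson_r(abv[:-1], rating[:-1])  # same data without the last pair
print("with the odd point:   ", round(r_all, 3))
print("without the odd point:", round(r_trimmed, 3))
```

In this toy example one point is enough to wipe out a strong positive correlation, which sets up a discussion about when removing observations is (and isn't) defensible.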

UCLA's "What statistical analysis should I use?"

This resource from UCLA is, essentially, a decision-making tree for determining what kind of statistical analysis is appropriate based upon your data (see below). Screen shot from "What statistical analysis should I use?" Now, such decision-making trees are available in many statistics textbooks. However, what makes this one special is that each test comes with code/syntax as well as output for SAS, Stata, SPSS, and R, which is helpful to our students (and, let's be honest, us instructors/researchers as well).

MathsIsFun.com's linear equation Flash applet

When I teach regression, I usually introduce the regression line by reminding my students of the long-ago days of algebra class and graph paper and rulers. MathsIsFun.com has created an interactive applet that mimics the graph paper and allows users to adjust the y-intercept and the slope. This is a slightly fancier, more high-tech way to get your students thinking about the linear equation and then fitting that old knowledge into the new concept of regression. Use the bars to adjust slope and y-intercept as a quick linear equation primer before teaching regression.

Priceonomics' Hipster Music Index

This tongue-in-cheek regression analysis found a way to predict the "Hipster Music Index" of a given artist by plotting # of Facebook shares of said artist's Pitchfork magazine review on the y-axis and Pitchfork magazine review score on the x-axis. If an artist falls above the linear regression line, they aren't "hipster". If they fall below the line, they are. For example, Kanye West is a Pitchfork darling but also widely shared on FB, thus demonstrating too much popular appeal to be a hipster darling (as opposed to Sun Kil Moon (?), who is beloved by Pitchfork but not overly shared on FB). As instructors, we typically talk about the regression line as an equation for prediction, but Priceonomics uses the line in a slightly different way in order to make predictions. Also, if you go to the source article, there are tables displaying the difference between the predicted Y-value (FB Likes) for a given artist versus the actual Y-value, which coul...
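
The above-the-line/below-the-line rule is really a statement about residuals (actual Y minus predicted Y). Here is a minimal sketch of that logic in Python. The artists and numbers are invented for illustration and are not from the Priceonomics article, and the variable names are my own.

```python
# Hypothetical artists: (Pitchfork review score, Facebook shares of the review).
# These values are invented, not taken from the Priceonomics article.
artists = {
    "Artist A": (6.0, 100),
    "Artist B": (7.0, 300),
    "Artist C": (8.0, 250),
    "Artist D": (9.0, 500),
}

scores = [score for score, _ in artists.values()]
shares = [share for _, share in artists.values()]

# Fit the least-squares line predicting shares (Y) from review score (X).
n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(shares) / n
sxx = sum((x - mean_x) ** 2 for x in scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, shares))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = actual shares minus predicted shares.
residuals = {
    name: share - (intercept + slope * score)
    for name, (score, share) in artists.items()
}

# Below the line (negative residual) means fewer shares than the score
# predicts, which is "hipster" under the rule described in the post.
labels = {
    name: "hipster" if residual < 0 else "not hipster"
    for name, residual in residuals.items()
}

for name in artists:
    print(name, round(residuals[name], 1), labels[name])
```

This also makes a nice bridge to the idea that residuals always sum to zero around the least-squares line, so some artists have to land on each side.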

Kevin Wu's Graph TV

UPDATE! This website is not currently available. Kevin Wu's Graph TV uses individual episode ratings (archival data via IMDB) of TV shows, graphs each episode over the course of a series via scatter plot, and generates a regression line. This demonstrates fun with archival data as well as regression lines and scatter plots. You could also discuss sampling, in that these ratings were provided by IMDB users and, presumably, big fans of the shows (and whether or not this constitutes representative sampling). The saddest little purple dot is the episode Black Market. Truth!

Northwestern Mutual's "The Longevity Game"

I guess "The Longevity Game" sounds better than The Death Calculator. Which is what Northwestern Mutual has created and shared with us. Essentially, you answer questions about yourself (weight, exercise, stress management, driving habits, drug and alcohol habits, etc.) and the Game will give you an estimation for how long you should live based on the data you provide. The Longevity Game, from Northwestern Mutual. I use this in class to demonstrate how data and statistics influence certain aspects of our lives (like whether or not an insurer is willing to provide us with insurance coverage). This can also be used to introduce multiple regression, since multiple factors are taken into account when predicting the outcome measure of life expectancy. I also make sure to emphasize to my students that this calculator was created by an insurance company that was founded in 1857 and that this calculator isn't just some random interwebz quiz. Warning: I wouldn't ask...

Andy Field's Statistics Hell

Andy Field is a psychologist, statistician, and author. He created a funny, Dante's Inferno-themed  web site that contains everything you ever wanted to know about statistics. I know, I know, you're thinking, "Not another Dante's Inferno themed statistics web site!". But give this one a try. Property of Andy Field. I certainly can't take credit for this. Some highlights: 1) The aesthetic is priceless. For example, his intermediate statistics page begins with the introduction, "You will experience the bowel-evacuating effect of multiple regression, the bone-splintering power of ANOVA and the nose-hair pulling torment of factor analysis. Can you cope: I think not, mortal filth. Be warned, your brain will be placed in a jar of cerebral fluid and I will toy with it at my leisure." 2) It is all free. Including worksheets, data, etc. How amazing and generous. And, if you are feeling generous and feel the need to compensate him for the website, ...

Shameless self-promotion

Here is a publication from Teaching of Psychology in which I outline not one, not two, not three, but FOUR free/cheap internet-based activities to be used in statistics/research methods classes. (If you have access to ToP publications, you can also get it here.)