So, a linguist, Dr. Jack Grieve, decided to use Twitter data to map the use of different obscenities across U.S. counties. Gawker picked up on this research and wrote a story about it. How can this be used in a statistics class? To quantify greater or lesser use of each obscenity, he computed a z-score for each county and illustrated the differences with a color-coding system: the more orange a county, the higher its z-score (and thus the greater its usage), while blue indicates lesser usage. There are three such maps (damn, darn, and gosh) that are safe for use in class:
|Fine Southern tradition of, frankly, giving a damn.|
|Northern Midwest prefers "Darn"...|
|...while Tornado Alley likes "Gosh". And New Jersey/NYC/Long Island/Boston doesn't like any of this half-assed swearing.|
How to use in class? Z-scores, and the use of archival Twitter data. You can also discuss how the mode of data collection affects outcomes. The data were collected via Twitter. Is this a representative sample? Nope! And does the data reflect the way people actually speak, or the way they present themselves online?
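If you want students to try the z-score idea themselves, here is a minimal sketch using made-up numbers. It assumes each county is summarized by two hypothetical counts (tweets containing the word, total tweets). It converts those counts to a rate per 10,000 tweets and then standardizes the rates across counties. This is a plain z-score and a simplification: it is not necessarily how the published maps were built, which may have used a spatially smoothed statistic. The county names, counts, and the `z_scores` function are all invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical data: (tweets containing the word, total tweets) per county.
# These numbers are invented for classroom illustration only.
counties = {
    "County A": (30, 10_000),
    "County B": (12, 8_000),
    "County C": (5, 9_000),
    "County D": (22, 11_000),
}

def z_scores(counts):
    """Return a z-score per county for the word's relative frequency."""
    # Use a rate (per 10,000 tweets) rather than raw counts, so counties
    # that simply tweet more are not automatically ranked higher.
    rates = {name: 10_000 * hits / total for name, (hits, total) in counts.items()}
    mu = mean(rates.values())
    sigma = pstdev(rates.values())  # population SD across the counties
    return {name: (r - mu) / sigma for name, r in rates.items()}

# Print counties from highest to lowest z-score.
for name, z in sorted(z_scores(counties).items(), key=lambda kv: -kv[1]):
    # Positive z = above-average use (orange); negative = below-average (blue).
    shade = "orange" if z > 0 else "blue"
    print(f"{name}: z = {z:+.2f} ({shade})")
```

A good discussion question: why compute rates before standardizing? A county with many more tweets overall would otherwise look "sweary" just because it is bigger.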