Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. He described the article as being intended to attack the impression among statisticians that "numerical calculations are exact, but graphs are rough."[1]
Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
(You can easily check this in R by loading the data with data(anscombe).) But what you might not realize is that it's possible to generate bivariate data with a given mean, standard deviation, and correlation in any shape you like, even a dinosaur:
Source: The Datasaurus Dozen
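The check mentioned above can be sketched in a few lines of base R. This is a minimal illustration rather than code from the original post: the names stats, mean_x, mean_y, var_x, var_y and cor_xy are made up here, and the choice of mean, variance and correlation as the "simple descriptive statistics" is an assumption.

```r
# Load the built-in Anscombe data set (columns x1..x4 and y1..y4).
data(anscombe)

# For each of the four x/y pairs, compute a few summary statistics.
# Which statistics to compare is an assumption made for illustration.
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste("set", 1:4)

# One column per dataset; the rows should agree closely across columns.
print(round(stats, 2))
```

Plotting each pair (for example with plot(anscombe$x1, anscombe$y1)) then shows how different the four datasets look despite the matching numbers.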
Posted: 02 May 2017 08:16 AM PDT
(This article was first published on Revolutions, and kindly contributed to R-bloggers)