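A dataset demonstrating the utility of visualization. The 13 datasets stored here (the dinosaur plus the twelve derived from it) are nearly identical in standard summary measures (mean, standard deviation, and Pearson's correlation), yet look very different when plotted.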
Format
A data frame with 142 rows and 26 variables:
away_x: x-values for the away dataset
away_y: y-values for the away dataset
bullseye_x: x-values for the bullseye dataset
bullseye_y: y-values for the bullseye dataset
circle_x: x-values for the circle dataset
circle_y: y-values for the circle dataset
dino_x: x-values for the dinosaur dataset!
dino_y: y-values for the dinosaur dataset!
dots_x: x-values for the dots dataset
dots_y: y-values for the dots dataset
h_lines_x: x-values for the h_lines dataset
h_lines_y: y-values for the h_lines dataset
high_lines_x: x-values for the high_lines dataset
high_lines_y: y-values for the high_lines dataset
slant_down_x: x-values for the slant_down dataset
slant_down_y: y-values for the slant_down dataset
slant_up_x: x-values for the slant_up dataset
slant_up_y: y-values for the slant_up dataset
star_x: x-values for the star dataset
star_y: y-values for the star dataset
v_lines_x: x-values for the v_lines dataset
v_lines_y: y-values for the v_lines dataset
wide_lines_x: x-values for the wide_lines dataset
wide_lines_y: y-values for the wide_lines dataset
x_shape_x: x-values for the x_shape dataset
x_shape_y: y-values for the x_shape dataset
References
Matejka, J., & Fitzmaurice, G. (2017). Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. CHI 2017 Conference proceedings: ACM SIGCHI Conference on Human Factors in Computing Systems. Retrieved from https://www.autodeskresearch.com/publications/samestats.
Examples
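# Save the current graphics settings so they can be restored afterwards
state <- par("mar", "mfrow")

# Base R plots: one panel per dataset on a 5 x 3 grid
par(mfrow = c(5, 3), mar = c(1, 3, 3, 1))
nms <- names(datasaurus_dozen_wide)
# Columns come in adjacent x/y pairs, so step through the x columns
for (i in seq(1, 25, by = 2)) {
  # Drop the trailing "_x" to get the dataset name for the panel title
  nm <- substr(nms[i], 1, nchar(nms[i]) - 2)
  plot(datasaurus_dozen_wide[[nms[i]]],
       datasaurus_dozen_wide[[nms[i + 1]]],
       xlab = "", ylab = "", main = nm)
}

# Restore the saved settings
par(state)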
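
The description states that the datasets agree on mean, standard deviation, and Pearson's correlation. One way to check this is sketched below. `summarise_pairs` is a hypothetical helper written for this illustration, not a function from any package. It assumes only the `<name>_x` / `<name>_y` column naming listed under Format.

```r
# Hypothetical helper: summarise every <name>_x / <name>_y column pair
# of a wide data frame. Not part of any package.
summarise_pairs <- function(wide) {
  # Find the x columns and strip the "_x" suffix to get the dataset names
  x_cols <- grep("_x$", names(wide), value = TRUE)
  stems <- sub("_x$", "", x_cols)

  rows <- lapply(stems, function(stem) {
    x <- wide[[paste0(stem, "_x")]]
    y <- wide[[paste0(stem, "_y")]]
    data.frame(
      dataset = stem,
      mean_x = mean(x), mean_y = mean(y),
      sd_x = sd(x), sd_y = sd(y),
      cor_xy = cor(x, y)  # cor() computes Pearson's correlation by default
    )
  })

  # Stack the one-row summaries into a single data frame
  do.call(rbind, rows)
}

# Usage, assuming datasaurus_dozen_wide is available in the session
if (exists("datasaurus_dozen_wide")) {
  stats <- summarise_pairs(datasaurus_dozen_wide)
  stats[-1] <- round(stats[-1], 2)  # round the numeric columns only
  print(stats)
}
```

Each row of the result describes one dataset, so the values can be compared by reading down each column.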