Compiled Sep 06, 2020

 
 
The current draft aims to introduce researchers to visualizing both data and statistics in R with the ggplot2 package.
The target audience is primarily the research community at VUB / UZ Brussel, in particular those who have some basic experience in R and want to know more.
 
We invite you to help improve this document by sending feedback to
wilfried.cools@vub.be or anonymously at icds.be/consulting (right side, bottom).
 

Key message on data visualization

 
Data visualization is inherent to data analysis, not just a way of communicating the results.

 
Data visualization is best done with coding (as opposed to manual changes).

 
Data visualization is easier and more intuitive when maintaining tidy data.

 
 

workflow: tidyverse lingo


R’s tidyverse package: ggplot2

 
The focus in the current draft is on R.

 

The ggplot2 package is part of the tidyverse (Hadley Wickham et al.).

 
Install (at least once) and load (once per R session) the tidyverse package, or the ggplot2 package.

install.packages('tidyverse')
library(tidyverse)

 
Find a convenient cheat sheet on data visualization at https://rstudio.com/resources/cheatsheets/

 
 

Examples to get started

Both base R and ggplot2 are shown here to give a first impression.

To use the built-in iris data, load it with data( ).

data(iris)

Have a peek at its contents with str( ) and head( ).
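
As a minimal sketch of this inspection step, the calls below print the structure and the first rows of iris. Nothing beyond base R is assumed.

```r
data(iris)

# str() summarizes the type and the first values of each column
str(iris)

# head() shows the first six rows by default
head(iris)
```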

 

base R

Various visualizations of the data are possible, also in base R.

For example, consider the scatterplot, boxplot, histogram and dotplot.

par(mfrow=c(2,2))
plot(iris$Petal.Length,iris$Petal.Width,col=iris$Species,main='scatterplot')
boxplot(iris$Sepal.Length~iris$Species,main='boxplot')
hist(iris$Sepal.Width,main='histogram')
dotchart(iris$Sepal.Width,main='dotplot')

par(mfrow=c(1,1))
  • plots generated in a single function call, with various parameters
  • further fine-tuning, with legends, annotations, and much more, is possible; see the help pages ?par and ?options
boxplot(iris$Sepal.Length~iris$Species,main='boxplot',
    horizontal=TRUE, las=2, cex.axis=.75, ylab='',xlab='length',col=c(4,2,3))

 

ggplot2

Visualization, especially when slightly more complex, is often more intuitive with ggplot2.

For example, consider the scatterplot, boxplot, and histogram.

# grid.arrange() comes from the gridExtra package; install it once with install.packages('gridExtra')
library(gridExtra)
p1 <- ggplot(data=iris,aes(y=Petal.Width,x=Petal.Length,col=Species)) + geom_point()
p2 <- ggplot(data=iris,aes(y=Sepal.Length,col=Species)) + geom_boxplot()
p3 <- ggplot(data=iris,aes(x=Sepal.Width)) + geom_histogram()
grid.arrange(p1, p2, p3, ncol=3)

  • plots generated by creating an R object, and adding layers to it for visualization
  • further fine-tuning, with legends, annotations, and much more, done with layers too
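
The layered approach can be sketched as follows. The title, the axis labels and the choice of theme_minimal( ) are illustrative choices, not part of the original example.

```r
library(ggplot2)

# start from a basic scatterplot object
p <- ggplot(data=iris, aes(y=Petal.Width, x=Petal.Length, col=Species)) + geom_point()

# fine-tune by adding further components to the existing object
p_tuned <- p +
    labs(title='iris petals', x='petal length', y='petal width') +  # illustrative labels
    theme_minimal()                                                 # illustrative theme

p_tuned
```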

 

To save the last generated plot, the ggsave( ) function is available.

ggsave('plotname.png',width=12,height=6)
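
To save a specific plot rather than the last one, the object can be passed through the plot argument. This is a sketch only: the file name, the temporary directory and the number of bins are arbitrary choices. The file extension determines the output format, here PDF.

```r
library(ggplot2)

p <- ggplot(data=iris, aes(x=Sepal.Width)) + geom_histogram(bins=20)

# write to a temporary directory so the sketch does not clutter the working directory
outfile <- file.path(tempdir(), 'histogram.pdf')

# plot= names the object to save; width and height are in inches by default
ggsave(outfile, plot=p, width=12, height=6)
```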

conclusion

Both approaches, as well as other visualization systems, do what they are supposed to do.
This draft presents ggplot2 because it is becoming the dominant choice and because it is so much fun to work with.

 
 

-gg- : grammar of graphics

ggplot2 philosophy: Grammar of Graphics (Leland Wilkinson)

General structure includes data, functions and arguments.
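
As a rough illustration of that structure, a ggplot2 call can be read part by part. The variables and the size value are arbitrary choices for this sketch.

```r
library(ggplot2)

p <- ggplot(data=iris,                              # data: a data frame
            aes(x=Sepal.Length, y=Sepal.Width)) +   # arguments: map variables to aesthetics
    geom_point(size=2)                              # function: a geom that draws the layer

p
```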

Grammar of Graphics sparked further developments

 
 

Visualization essentials

The first part addresses how to make a visualization;
afterwards, further refinements are considered in less detail.

 
step by step example

The examples use the built-in mtcars dataset and the built-in iris dataset that was loaded earlier.

data(mtcars)
mtcars %>% head()
|                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
| Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
| Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
| Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
| Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
| Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
| Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The ggplot object is constructed.

p1 <- ggplot(data=mtcars, aes(y=mpg,x=disp))

No visualization is made yet; only the object is created.

A layer is added with the + sign, adding a geometric function.

p2 <- ggplot(data=mtcars, aes(y=mpg,x=disp)) + geom_point()
grid.arrange(p1, p2, ncol=2)
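
One way to see the difference between the two objects is to count their layers. The names p_empty and p_points are hypothetical, and the sketch assumes the plot object exposes its layers as a list in $layers, which may depend on the ggplot2 version.

```r
library(ggplot2)

p_empty  <- ggplot(data=mtcars, aes(y=mpg, x=disp))
p_points <- p_empty + geom_point()   # an existing object can be extended with +

length(p_empty$layers)    # no layers yet, so only the axes would be drawn
length(p_points$layers)   # one layer: the points
```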

Non-essential aesthetics can be included, for example color.

p1 <- ggplot(data=mtcars, aes(y=mpg,x=disp,color=gear)) + geom_point()
p2 <- ggplot(data=mtcars, aes(y=mpg,x=disp,color=factor(gear))) + geom_point()
grid.arrange(p1, p2, ncol=2)
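
The two panels differ because gear is stored as a number: mapped directly, it is treated as continuous and gets a color gradient, while factor(gear) is treated as discrete and gets one color per level. A quick check of the variable, as a sketch:

```r
data(mtcars)

is.numeric(mtcars$gear)        # gear is numeric, hence the continuous color scale
levels(factor(mtcars$gear))    # as a factor, its distinct values become discrete levels
```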

ggplot(data=mtcars, aes(y=mpg,x=disp,color=gear)) + geom_point(aes(color=factor(gear)))
ggplot(data=mtcars, aes(y=mpg,x=disp,color=gear)) + geom_point(color='#FF6600')
p1 <- ggplot(data=mtcars, aes(y=mpg,x=disp)) + geom_point(aes(color=factor(gear)),alpha=.3)
p2 <- ggplot(data=mtcars, aes(y=mpg,x=disp)) + geom_point(aes(color=factor(gear),alpha=.3))
p3 <- ggplot(data=mtcars, aes(y=mpg,x=disp)) + 
    geom_point(aes(color=factor(gear),alpha=qsec/max(qsec)))
grid.arrange(p1, p2, p3, ncol=3)
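
The panels above differ in where alpha is specified. Outside aes( ) it is set to a constant; inside aes( ) it is mapped as if it were data, which also produces a legend entry. The sketch below inspects both cases. The names p_set and p_map are hypothetical, and aes_params and mapping are fields of the layer object that may differ between ggplot2 versions.

```r
library(ggplot2)

p_set <- ggplot(data=mtcars, aes(y=mpg, x=disp)) +
    geom_point(aes(color=factor(gear)), alpha=.3)     # alpha set as a constant
p_map <- ggplot(data=mtcars, aes(y=mpg, x=disp)) +
    geom_point(aes(color=factor(gear), alpha=.3))     # alpha mapped like a variable

p_set$layers[[1]]$aes_params          # constants set outside aes( )
names(p_map$layers[[1]]$mapping)      # aesthetics mapped inside aes( )
```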