R: add calibrated axes to PCA biplot in ggplot2
Asked Answered
A

2

19

I am working on an ordination package using ggplot2. Right now I am constructing biplots in the traditional way, with loadings being represented with arrows. I would also be interested though to use calibrated axes and represent the loading axes as lines through the origin, and with loading labels being shown outside the plot region. In base R this is implemented in

library(OpenRepGrid)
biplot2d(boeker)

enter image description here

but I am looking for a ggplot2 solution. Would anybody have any thoughts how to achieve something like this in ggplot2? Adding the variable names outside the plot region could be done like here I suppose, but how could the line segments outside the plot region be plotted?

Currently what I have is

install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(ggord)
data(iris)
ord <- prcomp(iris[,1:4],scale=TRUE)
ggord(ord, iris$Species)

enter image description here

The loadings are in ord$rotation

                    PC1         PC2        PC3        PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

How could I add the lines through the origin, the outside ticks and the labels outside the axis region (plossibly including the cool jittering that is applied above for overlapping labels)?

NB I do not want to turn off clipping, since some of my plot elements could sometimes go outside the bounding box

EDIT: Someone else apparently asked a similar question before, though the question is still without an answer. It points out that to do something like this in base R (though in an ugly way) one can do e.g.

plot(-1:1, -1:1, asp = 1, type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
abline(a = 0, b = -0.75)
abline(a = 0, b = 0.25)
abline(a = 0, b = 2)
mtext("V1", side = 4, at = -0.75*par("usr")[2])
mtext("V2", side = 2, at = 0.25*par("usr")[1])
mtext("V3", side = 3, at = par("usr")[4]/2)

enter image description here

Minimal workable example in ggplot2 would be

library(ggplot2)
df <- data.frame(x = -1:1, y = -1:1)
dfLabs <- data.frame(x = c(1, -1, 1/2), y = c(-0.75, -0.25, 1), labels = paste0("V", 1:3))
p <- ggplot(data = df, aes(x = x, y = y)) +  geom_blank() +
  geom_abline(intercept = rep(0, 3), slope = c(-0.75, 0.25, 2)) +
  theme_bw() + coord_cartesian(xlim = c(-1, 1), ylim = c(-1, 1)) +
  theme(axis.title = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(),
        panel.grid = element_blank())
p + geom_text(data = dfLabs, mapping = aes(label = labels))

enter image description here

but as you can see no luck with the labels, and I am looking for a solution that does not require one to turn off clipping.

EDIT2: bit of a related question is how I could add custom breaks/tick marks and labels, say in red, at the top of the X axis and right of the Y axis, to show the coordinate system of the factor loadings? (in case I would scale it relative to the factor scores to make the arrows clearer, typically combined with a unit circle)

Anglonorman answered 12/8, 2015 at 14:3 Comment(9)
Maybe use geom_path() to get your lines? and then also use custom axes? See here for tips on that. However, Hadley doesn't like multiple plots on top of each other so you might not able to make what you want easily (see Hadley's answer here).Pythoness
Yes for the lines I think I can manage (although I would run into problems I think if I would let the lines run outside the plot region, as I don't want to disable clipping in my case). For the labels would you suggest to place them at specific breaks then, but simply use the standard tick marks for them? (not entirely what I want, as they should ideally be oblique) And how can I get different breaks and labels on all 4 sides of the graph?Anglonorman
And yes re. Hadleys insistance of not wanting to support different scales: that gets me into trouble for biplots, as I still don't manage to add specific breaks and labels in a different colour at the top of the Y and the right of the X axis - if you would happen to know how to do that let me know; bit of a related problem....Anglonorman
I'm not sure how to do the labels. As a hack, you might put the names inside next to the lines with geom_text, but that doesn't seem to do what you want. The question is if it is good enough?Pythoness
Yeah on the inside is not good enough for me I'm afraid - think I'll have to wait for a reply of one of the gurus like baptiste...Anglonorman
this might helpArchimandrite
Thanks for the link, hadn't seen that! You wouldn's have an example by any chance where you add labels on all 4 sides of the graph that you could post as an answer?Anglonorman
not really, that link would be my starting point. I think for the problem at hand, it might be easier to remove the plot panel background and axes, and add a smaller rectangle within the plot area below the data layers to mimic the appearance of a plot panel.Archimandrite
For base R the calibrate package is very useful. It might not take that much work to get the data from the calibrate function and use it in ggplot2. cran.r-project.org/web/packages/calibrate/vignettes/…Kofu
A
6

Maybe as an alternative, you could remove the default panel box and axes altogether, and draw a smaller rectangle in the plot region instead. Clipping the lines not to clash with the text labels is a bit tricky, but this might work.

enter image description here

df <- data.frame(x = -1:1, y = -1:1)
dfLabs <- data.frame(x = c(1, -1, 1/2), y = c(-0.75, -0.25, 1), 
                     labels = paste0("V", 1:3))
p <- ggplot(data = df, aes(x = x, y = y)) +  
  geom_blank() +
  geom_blank(data=dfLabs, aes(x = x, y = y)) +
  geom_text(data = dfLabs, mapping = aes(label = labels)) +
  geom_abline(intercept = rep(0, 3), slope = c(-0.75, 0.25, 2)) +
  theme_grey() +
  theme(axis.title = element_blank(), 
        axis.text = element_blank(), 
        axis.ticks = element_blank(),
        panel.grid = element_blank()) + 
  theme()

library(grid)
element_grob.element_custom <- function(element, ...)  {
  rectGrob(0.5,0.5, 0.8, 0.8, gp=gpar(fill="grey95"))
}

panel_custom <- function(...){ # dummy wrapper
  structure(
    list(...), 
    class = c("element_custom","element_blank", "element") 
  ) 

}

p <- p + theme(panel.background=panel_custom())


clip_layer <- function(g, layer="segment", width=1, height=1){
  id <- grep(layer, names(g$grobs[[4]][["children"]]))
  newvp <- viewport(width=unit(width, "npc"), 
                    height=unit(height, "npc"), clip=TRUE)
  g$grobs[[4]][["children"]][[id]][["vp"]] <- newvp

  g
}

g <- ggplotGrob(p)
g <- clip_layer(g, "segment", 0.85, 0.85)
grid.newpage()
grid.draw(g)
Archimandrite answered 16/8, 2015 at 9:49 Comment(2)
Many thanks for that! Will definitely be useful, so +1! Still a bit undecided though which of the two options here to go for, so I'll still wait a bit to see what people come up with! Dealing with overlapping labels might also still be a little tricky...Anglonorman
there are a few R functions out there to try and avoid clashes (force-directed layout, directlabels package, etc.), and it's probably easier to use with the labels inside the (invisible here) plot panel, because the available space is more practical than in the margins.Archimandrite
I
1

What about this:

enter image description here

use the following code. If you want the labels also on top and on the right have a look at: http://rpubs.com/kohske/dual_axis_in_ggplot2

require(ggplot2)

data(iris)
ord <- prcomp(iris[,1:4],scale=TRUE)

slope <- ord$rotation[,2]/ord$rotation[,1]

p <- ggplot() + 
  geom_point(data = as.data.frame(ord$x), aes(x = PC1, y = PC2)) +
  geom_abline(data = as.data.frame(slope), aes(slope=slope))

info <- ggplot_build(p)

x <- info$panel$ranges[[1]]$x.range[1]
y <- info$panel$ranges[[1]]$y.range[1]

p + 
  scale_x_continuous(breaks=y/slope, labels=names(slope)) +
  scale_y_continuous(breaks=x*slope, labels=names(slope)) +
  theme(axis.text.x  = element_text(angle=90, vjust=0.5),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank()) 
Iow answered 16/8, 2015 at 12:21 Comment(3)
Many thanks for that - definitely a very elegant solution - +1! And feel free to also add an example with labels on all 4 side of the graph! (Note that labels would have to be placed on the side in which the arrow is pointing - I think right now that's not what's done here, think X and Y axis is also flipped a round - but these are just details that should be easy to fix)Anglonorman
Just Fliped the coordinates. Ofc you are right, that x and y axis were wrong. You are also right that (in this example) all 3 y axis labels have to go on the right side. But i leave the ggplot axis tweaking to those who like to hack ggplotIow
I ran the above code getting this error "info <- ggplot_build(p) Error: geom_abline requires the following missing aesthetics: intercept"Ancipital

© 2022 - 2024 — McMap. All rights reserved.