How to use ggplot to plot t-SNE clustering

Here is my t-SNE code using the iris data:

library(Rtsne)
iris_unique <- unique(iris) # Remove duplicates
iris_matrix <- as.matrix(iris_unique[,1:4])
set.seed(42) # Set a seed if you want reproducible results
tsne_out <- Rtsne(iris_matrix) # Run TSNE


# Show the objects in the 2D tsne representation
plot(tsne_out$Y,col=iris_unique$Species)

Which produces this plot:

enter image description here

How can I make that figure with ggplot2?

Breckenridge asked 30/6, 2017 at 2:5

I think the easiest/cleanest ggplot way would be to store all the info you need in a data.frame and then plot it. From your code pasted above, this should work:

library(ggplot2)
tsne_plot <- data.frame(x = tsne_out$Y[,1], y = tsne_out$Y[,2], col = iris_unique$Species)
ggplot(tsne_plot) + geom_point(aes(x=x, y=y, color=col))

enter image description here

My plot using the regular plot function is:

plot(tsne_out$Y,col=iris_unique$Species)

enter image description here
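For reference, here is a fuller sketch that runs end to end and adds axis labels and a legend title. The names `tsne_df`, `tSNE1` and `tSNE2`, and the use of `theme_minimal()`, are illustrative choices of mine and not anything required by Rtsne or ggplot2.

```r
library(Rtsne)
library(ggplot2)

# Rtsne() checks for duplicate rows by default, so drop them first
iris_unique <- unique(iris)
iris_matrix <- as.matrix(iris_unique[, 1:4])

set.seed(42)  # t-SNE is stochastic; fix the seed for a repeatable layout
tsne_out <- Rtsne(iris_matrix)

# One row per observation: the two embedding coordinates plus the label
# (column names here are illustrative)
tsne_df <- data.frame(
  tSNE1   = tsne_out$Y[, 1],
  tSNE2   = tsne_out$Y[, 2],
  Species = iris_unique$Species
)

# Map the columns once in ggplot() so every layer inherits them
p <- ggplot(tsne_df, aes(x = tSNE1, y = tSNE2, color = Species)) +
  geom_point() +
  labs(x = "t-SNE 1", y = "t-SNE 2", color = "Species") +
  theme_minimal()
print(p)
```

Naming the column `Species` rather than `col` gives the legend a readable title without extra work, and keeping the plot in `p` makes it easy to pass to `ggsave()` later.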

Skeens answered 30/6, 2017 at 2:23
@Mike.H thanks. But it's a bit strange that the layout of your plot differs from the one in my question, given set.seed(42). For example, the y-axis in yours goes up to ~5, whereas mine goes up to ~10. - Breckenridge
t-SNE is a stochastic algorithm, so each run gives different axis values and cluster shapes unless the seed is fixed. I was trying to reproduce a plot for a poster with a narrow aspect ratio, so I found it useful to call set.seed(...) before each run to make sure it was repeatable. - Acoustician
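A minimal sketch of that point, assuming both runs happen on the same machine with the same Rtsne version: resetting the seed immediately before each call should reproduce the embedding, while a different seed generally gives a different layout. The variable names (`run_a`, `run_b`, `run_c`) and the second seed value are arbitrary.

```r
library(Rtsne)

iris_matrix <- as.matrix(unique(iris)[, 1:4])

# Same seed set right before each call: the embeddings should coincide
set.seed(42)
run_a <- Rtsne(iris_matrix)
set.seed(42)
run_b <- Rtsne(iris_matrix)
same_seed_match <- isTRUE(all.equal(run_a$Y, run_b$Y))

# A different seed (arbitrary value): the layout is generally different
set.seed(7)
run_c <- Rtsne(iris_matrix)
diff_seed_match <- isTRUE(all.equal(run_a$Y, run_c$Y))

print(c(same_seed = same_seed_match, diff_seed = diff_seed_match))
```

Even with a fixed seed, embeddings may not match across package versions or platforms, so a plot made elsewhere can have different axis ranges.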
