Manually set shape by factor
An example dataset:

A <- c('a','b', 'c','d','e')
types <- factor(A)
B <- c(1,2,3,4,5)
C <- c(6,7,8,9,10)
D <- c(1,2,1,2,3)
ABC <- data.frame(B,C,D,types)

library(ggplot2)

ggplot(ABC, aes(x=B ,y=C ,size=D, colour=as.factor(types),label=types, shape=as.factor(types))) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values=c(15,18,16,17,19))

Plotting this, you will see that the factors a-e have colours and shapes assigned to them.

In my code I use scale_shape_manual to set the shapes, and they are assigned by position: the factor levels are in the order a,b,c,d,e and my values are 15,18,16,17,19, so a=15 (a square), b=18, and so on.

I would like to set these shapes by factor. My data will be changing each day and the factors will be in different orders but I always want the same factors to have the same shapes.

So obviously this code doesn't work but something like:

scale_shape_manual(values=('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))

Would be helpful if I could do the same for colour too.

Savage answered 6/10, 2014 at 14:3 Comment(0)

If I'm understanding you correctly, there will always be (at most) the five categories "a" - "e", and you want the shapes and colors for these to be consistent across datasets. Here is one way (note: gg_color_hue(...) is from here):

# set up shapes
shapes <- c(15,18,16,17,19)
names(shapes) <- letters[1:5]

# set up colors
gg_color_hue <- function(n) { # ggplot default colors
  hues = seq(15, 375, length=n+1)
  hcl(h=hues, l=65, c=100)[1:n]
}
colors <- gg_color_hue(5)
names(colors) <- names(shapes)

# original data
ggplot(ABC, aes(x=B ,y=C ,size=D, colour=types,label=types, shape=types)) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ 
  scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values=shapes) + scale_color_manual(values=colors)

#new data
DEF <- data.frame(B,C,D,types=factor(c("a","a","a","d","e")))
ggplot(DEF, aes(x=B ,y=C ,size=D, colour=types,label=types, shape=types)) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ 
  scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values=shapes) + scale_color_manual(values=colors)
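
As a rough illustration of why the second plot keeps its shapes and colours, the lookup by name can be mimicked in base R. This is only a sketch of matching by name, not ggplot2's internal code; ABC_types and DEF_types are stand-ins (not in the original) for the types columns of the two data frames above.

```r
# Named shape and colour vectors, built as in the answer above
shapes <- setNames(c(15, 18, 16, 17, 19), letters[1:5])
gg_color_hue <- function(n) {            # ggplot-like default colours
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
colors <- setNames(gg_color_hue(5), letters[1:5])

# Stand-ins for ABC$types and DEF$types
ABC_types <- factor(c("a", "b", "c", "d", "e"))
DEF_types <- factor(c("a", "a", "a", "d", "e"))

# Looking values up by level name ignores which levels are missing
def_shapes <- shapes[levels(DEF_types)]  # entries for "a", "d", "e"
def_colors <- colors[levels(DEF_types)]

# "d" gets the same shape and colour whether or not "b" and "c" are present
identical(def_shapes[["d"]], shapes[["d"]])
identical(def_colors[["d"]], colors[["d"]])
```

Because the values are found by name rather than by position, dropping "b" and "c" does not move "d" or "e" onto a different shape or colour.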

Chiclayo answered 6/10, 2014 at 15:20 Comment(14)
Thanks, although no, the number of factors will fluctuate between 17 and 19. This is why assigning by sequence wasn't going to work: when one factor isn't included, the whole sequence is thrown off. - Savage
No it is not. In the second example, factors "b" and "c" are missing but the sequence is not thrown off; "a" is still square, "d" is still triangle, and "e" is still circle. Are you seriously going to use 19 shapes?? - Chiclayo
^I meant in my original attempt the seq was thrown off. No, I plan on using only the shape values 15, 16 and 18. I want to set two of my factors as 16 specifically; the others can be 15, 16 or 18. Using a few shapes as well as different colours should be enough variation for each factor. - Savage
So does this answer your question?? - Chiclayo
I think so, let me quickly see if it works. I'm not using a data frame; I'm reading from a csv and the factors are values in a column. I think it might work but let me check first. - Savage
You use names(shapes) <- letters[1:5]. Instead of letters my actual data have names, but when I tried to use the names I get Error: unexpected ',' in "names(shapes) <- name1," - Savage
It needs to be a character vector, something like c("name1","name2",...) - Chiclayo
shapes <- c(15,18,16,16,15,18,15,16,18,16,15,16,18,15,16,18,15,18,16); names(shapes) <- c(data$name1, data$name2, ....) There were no shapes when I did this, blank plot. - Savage
Are there 19 columns, as in df$name1 through df$name19? If so, then names(shapes) <- paste0("name",1:19) - Chiclayo
No, there's a column called names with 1236 rows. I tried length(data$name) but obviously that just gave me the length and the rest of the names were called NA. Of those 1236 rows there are only 19 unique names. Can I use as.factor(data$name)? - Savage
names(shapes) <- test$name Error in names(shapes) <- data$name : 'names' attribute [1249] must be the same length as the vector [19] - Savage
names(shapes) <- sort(unique(data$name)) worked. Thanks a bunch :) - Savage
On days when one of the names doesn't occur, the names list changes and all the shapes shift by 1. Is there a way to set the shapes at the start (and, on a day the name isn't there, just ignore that and keep the same order)?? - Savage
nvm, I used data.frame and then write.table to add the name (with 0 for each other value) to the end of each csv, thus adding it to the lists where it isn't and making my script work :D - Savage
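
The shift described in the last few comments comes from deriving the names from each day's data with sort(unique(...)). One possible alternative to padding the csv, sketched here under assumptions, is to write the full list of names out once and build the named vector from that fixed list. The names all_names, shape_codes and today_names are hypothetical and only stand in for the real 19 names and the csv column.

```r
# Hypothetical master list of every name that can occur (assumption)
all_names   <- c("alpha", "bravo", "charlie", "delta")
shape_codes <- c(16, 15, 18, 16)
shapes <- setNames(shape_codes, all_names)  # built once, independent of the data

# A day on which "bravo" is absent from the name column
today_names <- c("alpha", "delta", "charlie", "alpha")

# Deriving the names from the data reassigns the codes by position...
present <- sort(unique(today_names))
shifted <- setNames(shape_codes[seq_along(present)], present)
shifted[["charlie"]]  # 15: "charlie" has slid into the slot "bravo" used to hold

# ...while the fixed vector still gives "charlie" its intended code
shapes[["charlie"]]   # 18
```

The fixed vector can then be passed as scale_shape_manual(values = shapes), as in the answer above, regardless of which names appear on a given day.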

I'm certain this is no longer relevant for the OP but the best solution I found to this problem is simpler than what is currently posted and is almost written into the question itself.

The OP's wish of assigning a manually defined shape or colour using something like
scale_shape_manual(values=('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))
only requires the assignments to be passed as a vector, as in
scale_shape_manual(values = c('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))
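
To make the difference concrete: without c(), R reads ('a'=15, ...) as a single parenthesised expression and stops with a parse error at the first comma, whereas c() with name = value pairs builds a named numeric vector. A minimal base R sketch of what that vector looks like (the variable names shapes and shuffled are only for illustration):

```r
# c() with name = value pairs builds a named numeric vector
shapes <- c('a' = 15, 'b' = 18, 'c' = 16, 'd' = 17, 'e' = 19)

names(shapes)   # the factor levels the values belong to
shapes[["d"]]   # 17, found by name rather than by position

# Writing the pairs in a different order does not change lookup by name
shuffled <- c('e' = 19, 'c' = 16, 'a' = 15, 'd' = 17, 'b' = 18)
identical(shuffled[["d"]], shapes[["d"]])  # TRUE
```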

jlhoward's answer is better if you want autogenerated colours, whereas the script I offer below requires fewer lines of code. User's choice.

A <- c('a','b', 'c','d','e')
types <- factor(A)
B <- c(1,2,3,4,5)
C <- c(6,7,8,9,10)
D <- c(1,2,1,2,3)
ABC <- data.frame(B,C,D,types)

library(ggplot2)

ggplot(ABC, aes(x=B ,y=C ,size=D, colour=as.factor(types),label=types, shape=as.factor(types))) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+
  scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values = c('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19)) +
  scale_colour_manual(values = c('a'="tomato", 'b'="yellow4", 'c'="palegreen2", 'd'="deepskyblue1", 'e'="orchid3"))
Madelle answered 15/9, 2020 at 18:48 Comment(0)
