Stacked barplot in UpSetR
I have been looking for a way to get a stacked bar plot in an UpSetR plot. I downloaded the movies data set (from here) and added a column with only two values, "M" and "C". Below is how I loaded the data and added the "x" column.

Edit:

m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
                           header = T, sep = ";")
nrow(m)
[1] 3883
x<-c(rep("M", 3000), rep("C", 883))
m<-cbind(m, x)  
unique(m$x)
[1] M C

This is the structure of the data frame:

str(m)
'data.frame':   3883 obs. of  22 variables:
 $ Name       : Factor w/ 3883 levels "$1,000,000 Duck (1971)",..: 3577 1858 1483 3718 1175 1559 3010 3548 3363 1420 ...
 $ ReleaseDate: int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
 $ Action     : int  0 0 0 0 0 1 0 0 1 1 ...
 $ Adventure  : int  0 1 0 0 0 0 0 1 0 1 ...
 $ Children   : int  1 1 0 0 0 0 0 1 0 0 ...
 $ Comedy     : int  1 0 1 1 1 0 1 0 0 0 ...
 $ Crime      : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Documentary: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Drama      : int  0 0 0 1 0 0 0 0 0 0 ...
 $ Fantasy    : int  0 1 0 0 0 0 0 0 0 0 ...
 $ Noir       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Horror     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Musical    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Mystery    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Romance    : int  0 0 1 0 0 0 1 0 0 0 ...
 $ SciFi      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Thriller   : int  0 0 0 0 0 1 0 0 0 1 ...
 $ War        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Western    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AvgRating  : num  4.15 3.2 3.02 2.73 3.01 3.88 3.41 3.01 2.66 3.54 ...
 $ Watches    : int  2077 701 478 170 296 940 458 68 102 888 ...
 $ x          : Factor w/ 2 levels "M","C": 1 1 1 1 1 1 1 1 1 1 ...

I then tried to implement the stacked bar plot as follows:

upset(m,
  queries = list(
    list(query = elements, 
         params = list("x", "M"), color = "#e69f00", active = T),
    list(query = elements, 
         params = list("x", "C"), color = "#cc79a7", active = T)))

The result looks like this:

[image: resulting UpSetR plot]

As you can see, the proportions are wrong: each bar should contain only two colors, one for each level of the factor ("M" or "C"). This does not seem to be a trivial issue, as also pointed out here. Does anyone have an idea how to implement this in UpSetR? Thanks a lot.

Towrope answered 19/2, 2019 at 16:24 Comment(3)
@zx8754 thanks for your answer. I am not sure which function is implemented in UpSetR. The original data are a data frame, and the figure is a matrix-like visualization. This is the repository: github.com/hms-dbmi/UpSetR - Towrope
Can you edit your question to show a reproducible example? I don't see one. The output of str(m) cannot be used to make an example. - Semantic
I provided the link from which I downloaded the data. I hope this helps. - Towrope

Here is a way to create an upset plot with stacked barplot, but using my ComplexUpset rather than UpSetR:

[image: stacked bars complex upset]

library(ggplot2)  # provides aes(), used below
library(ComplexUpset)
movies = as.data.frame(ggplot2movies::movies)
genres = colnames(movies)[18:24]

# for simplicity of examples, only use the complete data points
movies[movies$mpaa == '', 'mpaa'] = NA
movies = na.omit(movies)


upset(
    movies,
    genres,
    base_annotations=list(
        'Intersection size'=intersection_size(
            counts=FALSE,
            mapping=aes(fill=mpaa)
        )
    ),
    width_ratio=0.1
)

Please see more examples in the documentation. Installation instructions are available on GitHub: krassowski/complex-upset (there is also a comparison to UpSetR and other packages).
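For the data in the question, the same approach might look roughly like the sketch below. It assumes that UpSetR is installed (only to locate its bundled movies.csv), that the genre columns are columns 3 to 19 as in the str(m) output above, and that min_size = 20 is an acceptable cut-off; that value is an arbitrary choice to keep the plot readable, not something from the original question.

```r
library(ggplot2)       # provides aes()
library(ComplexUpset)

# Load the movies data shipped with UpSetR (UpSetR itself is not attached,
# so ComplexUpset::upset() is the only upset() on the search path)
m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
              header = TRUE, sep = ";")

# Same artificial two-level column as in the question
m$x <- c(rep("M", 3000), rep("C", 883))

# Genre indicator columns: Action ... Western (columns 3 to 19 in str(m))
genres <- colnames(m)[3:19]

p <- upset(
    m,
    genres,
    base_annotations = list(
        'Intersection size' = intersection_size(
            counts = FALSE,
            mapping = aes(fill = x)   # stack each bar by the "x" column
        )
    ),
    min_size = 20,      # assumption: hide very small intersections
    width_ratio = 0.1
)
p
```

Each intersection bar is then split into an "M" and a "C" segment, with a fill legend added automatically.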

Kilah answered 28/5, 2020 at 14:44 Comment(0)

I had a similar problem and found this workaround:

library("UpSetR")
m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
              header = T, sep = ";")
x<-c(rep("M", 2000), rep("Q", 1000), rep("C", 883))
m<-cbind(m, x)  

upset(m,
      queries = list(
        list(query = elements, 
             params = list("x", c("M","Q", "C")), color = "#e69f00", active = T),
        list(query = elements, 
             params = list("x", c("Q","C")), color = "#cc79a7", active = T),
        list(query = elements, 
             params = list("x", "C"), color = grey(0.7), active = T)))

The problem in the original example is that each query is drawn over the total bar separately, starting at y = 0, rather than being stacked on top of the previous one. Thus, the remaining black part of the bar always has the exact same height as the purple part at the bottom. The workaround is to systematically add queries for nested combinations of the values the variable can take:

  1. Start with a query and a respective color for the combination of all possible values (here c("M","Q","C") as the second parameter to params = list()).
  2. Successively leave out one of the possible values (e.g. c("Q","C") in the first step here). The value left out will be shown in the color of the last query that still included it ("M" in this example).
  3. Continue adding queries until you have only one value left for the second parameter to params = list().
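To see why the nested value sets end up looking stacked, here is a small base-R sketch of the arithmetic for the whole data set (the same logic applies within each intersection). The variable names are illustrative only; nothing here calls UpSetR.

```r
# Same artificial column as in the example above
x <- c(rep("M", 2000), rep("Q", 1000), rep("C", 883))
values <- unique(x)   # "M" "Q" "C"

# Nested value sets, one per query: {M,Q,C}, {Q,C}, {C}
nested <- lapply(seq_along(values), function(i) values[i:length(values)])

# Height of the overlay drawn for each query (every overlay starts at y = 0)
heights <- vapply(nested, function(v) sum(x %in% v), integer(1))

# Visible band of each color = its overlay minus the next, smaller overlay
bands <- heights - c(heights[-1], 0L)
names(bands) <- values

heights   # 3883 1883 883
bands     # M = 2000, Q = 1000, C = 883
```

Because each later overlay covers the lower part of the earlier one, the visible bands match the true counts per value.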

It should be possible to do this programmatically for larger numbers of possible values, given some color palette. But this remains a workaround, and a native implementation of stacking the queries would be nice to have. So if you would like to see this functionality, you might consider bumping the respective issue over at the GitHub repo.

[image: plot resulting from the above example code]

Frisbie answered 21/6, 2019 at 13:17 Comment(2)
Use the option query.legend to include a legend for the bar colors, e.g. query.legend = "bottom". - Heptamerous
Nice addition, many thanks! In general, though, I'd recommend the current accepted answer by @Kilah as the cleanest solution: https://mcmap.net/q/1285712/-stacked-barplot-in-upsetr This already includes the color guide. - Frisbie

Below is the nice answer by @dlaehnemann, slightly modified to build the list of lists in a loop and to assign the desired colors to it.

m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = T, sep = ";")
x<-c(rep("M", 2000), rep("Q", 1000), rep("C", 883))
m<-cbind(m, x)

i<-0
mylist<-list()
vectorUniqueValue <- unique(m$x)
colors = colorRampPalette(c("#332288",'#fdff00','#FF0000',"#CC6677","#88CCEE",'#36870c','#b786d2','#7c3c06',"#DDCC77",'#192194','#52cff4','#4f9c8b',"#4477AA",'#808080'))(length(vectorUniqueValue))
while ( length(vectorUniqueValue)>0 ){
  i<-i+1
  mylist[[i]]<-list(query = elements, params = list("x",as.character(vectorUniqueValue)), color = colors[i], active = T)
  vectorUniqueValue<-vectorUniqueValue[-1]
}
upset(m, queries = mylist)

Hope it helps a bit until maybe one day someone works on the issue on GitHub!
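The loop can also be wrapped in a small function, sketched below. Note that make_stacked_queries is a hypothetical helper written for this answer, not part of UpSetR; it only builds the list and takes the query function (e.g. UpSetR's elements) as an argument.

```r
# Hypothetical helper (not part of UpSetR): builds one query per nested
# value set, i.e. values[1:n], values[2:n], ..., values[n]
make_stacked_queries <- function(values, colors, query_fn, column = "x") {
  values <- as.character(values)
  stopifnot(length(colors) >= length(values))
  lapply(seq_along(values), function(i) {
    list(
      query  = query_fn,
      params = list(column, values[i:length(values)]),
      color  = colors[i],
      active = TRUE
    )
  })
}

# Usage, assuming UpSetR is attached and `m` has the column "x" as above:
# vals <- unique(m$x)
# upset(m,
#       queries = make_stacked_queries(vals, hcl.colors(length(vals)), elements),
#       query.legend = "bottom")
```

Passing the query function in keeps the helper independent of UpSetR, and query.legend = "bottom" (mentioned in a comment on another answer) adds a legend for the bar colors.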

Epiphytotic answered 17/7, 2019 at 10:58 Comment(0)
