Order categorical data in a stacked bar plot with ggplot2
B

3

13

I have a data frame with the following entries:

MilDis <- data.frame(
  hhDomMil = c(
    "HED", "ETB", "HED", "ETB", "PER", "BUM", "EXP", "TRA", "TRA", "PMA", "MAT",
    "MAT", "KON", "ETB", "PMA", "PMA", "HED", "BUM", "BUM", "HED", "PMA", "PMA",
    "HED", "TRA", "BUM", "EXP", "BUM", "PMA", "ETB", "MAT", "ETB", "ETB", "KON",
    "MAT", "TRA", "BUM", "BUM", "TRA", "TRA", "PMA", "PMA", "PMA", "MAT", "ETB",
    "TRA", "BUM", "TRA", "MAT", "BUM", "ETB", "TRA", "TRA", "BUM", "KON", "ETB",
    "ETB", "ETB", "BUM", "KON", "ETB", "ETB", "PMA", "TRA", "PER", "PER", "MAT",
    "HED", "KON", "TRA", "TRA", "TRA", "EXP", "TRA", "BUM", "MAT", "MAT", "TRA",
    "PMA", "HED", "PER", "TRA", "PER", "EXP", "PER", "BUM", "KON", "BUM", "ETB",
    "ETB", "TRA", "PER", "ETB", "KON", "KON", "BUM", "ETB", "BUM", "MAT", "BUM",
    "KON", "KON", "ETB", "MAT", "KON", "PER", "ETB", "ETB", "KON", "PMA", "PER",
    "HED", "HED", "PMA", "MAT", "PMA", "PER", "PMA", "TRA", "TRA", "MAT", "BUM",
    "BUM", "KON", "ETB", "ETB", "ETB", "PMA", "TRA", "TRA", "PMA", "PER", "KON",
    "PER", "BUM", "KON", "ETB", "ETB", "BUM", "TRA", "ETB", "PMA", "HED", "MAT",
    "TRA", "BUM", "PMA", "BUM", "ETB", "TRA", "TRA", "TRA", "PER", "EXP", "HED",
    "BUM", "EXP", "HED", "BUM", "MAT", "DDR", "BUM", "MAT", "KON", "HED", "HED",
    "TRA", "BUM", "PMA", "PMA", "PMA", "KON", "KON", "MAT", "ETB", "MAT", "TRA",
    "MAT", "ETB", "ETB", "TRA", "MAT", "ETB", "TRA", "HED", "BUM", "MAT", "TRA",
    "PMA", "BUM", "BUM", "EXP", "ETB", "EXP", "EXP", "MAT", "TRA", "KON", "BUM",
    "BUM", "HED"
  ),
  kclust = c(
    1L, 2L, 15L, 4L, 5L, 6L, 5L, 7L, 8L, 5L, 6L, 5L, 11L, 6L, 5L,
    1L, 9L, 10L, 2L, 1L, 9L, 8L, 4L, 11L, 14L, 5L, 8L, 11L, 12L,
    5L, 5L, 14L, 15L, 2L, 10L, 6L, 8L, 4L, 6L, 8L, 14L, 14L, 16L,
    10L, 5L, 1L, 12L, 17L, 12L, 16L, 16L, 5L, 10L, 14L, 8L, 19L,
    5L, 4L, 4L, 14L, 2L, 14L, 9L, 7L, 1L, 14L, 4L, 15L, 18L, 16L,
    9L, 14L, 6L, 14L, 12L, 11L, 4L, 7L, 8L, 12L, 9L, 16L, 2L, 6L,
    15L, 1L, 1L, 3L, 14L, 5L, 5L, 9L, 14L, 6L, 5L, 14L, 15L, 2L,
    14L, 2L, 1L, 8L, 5L, 10L, 1L, 1L, 16L, 5L, 2L, 9L, 9L, 1L, 12L,
    10L, 1L, 4L, 1L, 9L, 8L, 8L, 5L, 10L, 1L, 10L, 2L, 6L, 15L, 2L,
    2L, 10L, 5L, 6L, 10L, 19L, 19L, 6L, 5L, 6L, 7L, 7L, 8L, 5L, 16L,
    5L, 6L, 6L, 1L, 10L, 12L, 4L, 7L, 19L, 7L, 8L, 16L, 10L, 5L,
    16L, 12L, 7L, 7L, 19L, 4L, 6L, 1L, 15L, 7L, 8L, 16L, 4L, 10L,
    15L, 11L, 10L, 1L, 10L, 17L, 1L, 2L, 1L, 14L, 8L, 8L, 14L, 10L,
    8L, 6L, 6L, 8L, 5L, 7L, 5L, 1L, 5L, 7L, 9L, 2L, 1L, 9L, 14L
  ),
  order = c(
    9, 1, 9, 1, 3, 7, 10, 5, 5, 2, 8, 8, 4, 1, 2, 2, 9, 7, 7, 9, 2, 2, 9, 5, 7,
    10, 7, 2, 1, 8, 1, 1, 4, 8, 5, 7, 7, 5, 5, 2, 2, 2, 8, 1, 5, 7, 5, 8, 7, 1, 5,
    5, 7, 4, 1, 1, 1, 7, 4, 1, 1, 2, 5, 3, 3, 8, 9, 4, 5, 5, 5, 10, 5, 7, 8, 8, 5,
    2, 9, 3, 5, 3, 10, 3, 7, 4, 7, 1, 1, 5, 3, 1, 4, 4, 7, 1, 7, 8, 7, 4, 4, 1, 8,
    4, 3, 1, 1, 4, 2, 3, 9, 9, 2, 8, 2, 3, 2, 5, 5, 8, 7, 7, 4, 1, 1, 1, 2, 5, 5,
    2, 3, 4, 3, 7, 4, 1, 1, 7, 5, 1, 2, 9, 8, 5, 7, 2, 7, 1, 5, 5, 5, 3, 10, 9, 7,
    10, 9, 7, 8, 6, 7, 8, 4, 9, 9, 5, 7, 2, 2, 2, 4, 4, 8, 1, 8, 5, 8, 1, 1, 5, 8,
    1, 5, 9, 7, 8, 5, 2, 7, 7, 10, 1, 10, 10, 8, 5, 4, 7, 7, 9
  )
)

I want to create a stacked bar plot like this one (Barplot).

The only problem is that I would like the stacks ordered as ETB, PMA, PER, KON, TRA, DDR, BUM, MAT, HED, EXP, which matches the order numbers in the data. I also have some aesthetic problems. I searched for a solution here, but none of the ordering suggestions worked for me... :-\

  1. How do I plot such an ordered plot?
  2. How do I set up x so that each bar is "on" one number?
  3. How do I separate the bars? Here I tried that with a white border...
  4. How do I print all kclust numbers on x?

Thanks a lot for your help! Dominik


UPDATE

Here is the code I used to draw my plot:

mycols <- c('#FFFD00', '#97CB00', '#3168FF', '#FF0200', '#FB02FE',
            '#CCFCCC', '#FE9900', '#98CBF8', '#00CCFF', '#00FD03') # Set milieu colors


ggplot(MilDis) +
  geom_bar(aes(kclust, fill = factor(hhDomMil), colour = mycols),
           position = 'fill', binwidth = 1, colour = 'white') +
  scale_fill_manual(values = mycols)

UPDATE 2:

That's how I did it now:

    mycols <- c('#3168FF', '#00CCFF', '#98CBF8', '#CCFCCC', '#00FD03',
                '#97CB00', '#FFFD00', '#FE9900', '#FB02FE', '#FF0200') # Set milieu colors

    ggplot(MilDis) +
      geom_bar(aes(factor(kclust), fill = reorder(hhDomMil, order)),
               position = 'fill') +
      scale_fill_manual(values = mycols)

With this result:

Image

Thank you all for your help!

Bahuvrihi answered 22/8, 2011 at 16:21 Comment(8)
Can you post the ggplot code you used to get the plot shown here? It would save a little bit of time in getting up to speed to make the modifications (other than ordering, which @Gavin Simpson has dealt with below) that you are requesting ...Schargel
You should ask 1 question per Question - it makes it easier to search and find Answers.Periodical
@Ben: I just updated my post.Bahuvrihi
@Gavin You're right, but splitting it up would also have made it more complicated...Bahuvrihi
@Bahuvrihi ??? Why? I've Answered 1 and didn't even need the plotting code. 2,3, & 4 Just need kclust coercing to a factor - at the moment you are using a continuous variable and hence continuous scale for the x-axis.Periodical
@Gavin: Right, factor() is the solution, but I didn't know that at the time, and I thought splitting up the question might be too confusing. But maybe I'm wrong.Bahuvrihi
You can link between Questions, to show their relationship. There was nothing wrong in general with what you wrote I just wanted to point out for future reference that it is preferred to post only a single question per post. As I said, this helps SO be more than just a helpful Q&A for the person asking the Question. If Questions are focused and specific it helps users searching SO drill down to the Q&As that get to the root of their problem.Periodical
Ah I see. I'll consider that next timeBahuvrihi
E
12

I see that you have an order column in your data frame, which I gather holds the ordering you want. Hence you can simply do:

# df1 is the data frame from the question (MilDis there)
p0 = qplot(factor(kclust), fill = reorder(hhDomMil, order), position = 'fill',
           data = df1)

Here are the elements of this code that take care of your questions:

  1. How do I plot such an ordered plot? reorder()
  2. How do I set up x so that each bar is "on" one number? factor(kclust)
  3. How do I separate the bars?
  4. How do I print all kclust numbers on x? factor(kclust)
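
For anyone unsure what reorder() is doing in that call, here is a minimal sketch on made-up toy data (the values are hypothetical, not taken from the question). It suggests that the factor levels end up sorted by the order column instead of alphabetically.

```r
# Toy data (hypothetical), mirroring the hhDomMil / order columns
toy <- data.frame(
  hhDomMil = c("HED", "ETB", "PER", "ETB", "PMA", "HED"),
  order    = c(9, 1, 3, 1, 2, 9)
)

# reorder() sorts the levels by a per-level summary of `order`
# (the mean by default). Each code carries a single order number here,
# so the levels simply follow that number.
f <- reorder(toy$hhDomMil, toy$order)
levels(f)
```

Since ggplot2 stacks and lists fill groups in level order, this is what drives both the stacking and the legend.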

I remember from a previous question of yours that the hhDomMil corresponded to different groups, and I suspect your ordering follows the grouping. In that case, you might want to use that information to choose a color palette that makes it simpler to follow the graph. Here is one way to do it.

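library(RColorBrewer)

# brewer.pal() needs n >= 3, so take three colours and keep the first two
mycols = c(brewer.pal(3, 'Oranges'), brewer.pal(3, 'Greens'),
           brewer.pal(3, 'Blues')[1:2], brewer.pal(3, 'PuRd')[1:2])

p0 + scale_fill_manual(values = mycols)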
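
A possible refinement, sketched under the assumption that the level order is the one given in the question: naming the colour vector ties each colour to a milieu code, so the mapping no longer depends on position. The hex values are the ones from the question's second update, and named_cols is just an illustrative name.

```r
# Level order as requested in the question
lev <- c("ETB", "PMA", "PER", "KON", "TRA", "DDR", "BUM", "MAT", "HED", "EXP")

# Colours from the question's UPDATE 2, in the same order
cols <- c('#3168FF', '#00CCFF', '#98CBF8', '#CCFCCC', '#00FD03',
          '#97CB00', '#FFFD00', '#FE9900', '#FB02FE', '#FF0200')

# Attach the codes as names; scale_fill_manual() matches named values by name
named_cols <- setNames(cols, lev)
named_cols[c("ETB", "EXP")]

# With ggplot2 loaded and p0 defined as above, it could be used as:
#   p0 + scale_fill_manual(values = named_cols)
```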


Evacuee answered 22/8, 2011 at 17:41 Comment(3)
Thanks a lot for your solution! This is exactly what I was looking for. And your assumption was absolutely right, they correspond in the described way.Bahuvrihi
If the cluster numbers did not mean anything, then I would reorder this plot so that clusters are arranged according to the number of elements they contain. Or you could also arrange the clusters based on the number of groups they contain.Evacuee
The clusters correspond to a neighbourhood setting. But all milieus are in all clusters. The sample shown does not contain all data points, because it would be too large to post them here... But your idea is good, do you have an idea for the plot in update 2?Bahuvrihi
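
One way the "arrange clusters by size" idea from the comments might look, as a rough sketch on hypothetical cluster ids (not the question's data): count the rows per cluster and use the sorted counts to set the factor levels.

```r
# Hypothetical cluster ids: cluster 5 has 3 rows, cluster 2 has 2, cluster 1 has 1
kclust <- c(1L, 2L, 2L, 5L, 5L, 5L)

# Rows per cluster, largest first
sizes <- sort(table(kclust), decreasing = TRUE)

# Use that order for the factor levels, so the biggest cluster is plotted first
kclust_by_size <- factor(kclust, levels = names(sizes))
levels(kclust_by_size)
```

Using kclust_by_size in place of factor(kclust) on the x axis would then put the bars in that order.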
P
12

This is easily solved by getting your data formatted correctly before passing it to ggplot(). The key is to explicitly set the levels of the hhDomMil factor. Assuming your data are in dat:

dat <- transform(dat, hhDomMil = factor(hhDomMil,
                                        levels = c("ETB", "PMA", "PER", "KON",
                                                   "TRA", "DDR", "BUM", "MAT",
                                                   "HED", "EXP")))

That fixes hhDomMil as a factor in place inside dat, and sets the levels to be in the order you wanted:

> head(dat$hhDomMil)
[1] HED ETB HED ETB PER BUM
Levels: ETB PMA PER KON TRA DDR BUM MAT HED EXP

Notice what happens when R coerces hhDomMil to a factor without explicit levels:

> head(factor(as.character(dat$hhDomMil)))
[1] HED ETB HED ETB PER BUM
Levels: BUM DDR ETB EXP HED KON MAT PER PMA TRA

The default is to sort the levels alphabetically, which is why the plot is coming out as you show.

The best advice I can give is to get your data correctly formatted first and only then try to plot it. Don't rely on automatic or on-the-fly conversion to get this right for you; inevitably it won't be what you want.
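
As a rough sketch of how the pieces might fit together, assuming ggplot2 is installed and using a small made-up stand-in for dat (the rows below are hypothetical): set the levels first, turn kclust into a factor so the x axis is discrete, then plot.

```r
library(ggplot2)

# Small stand-in for the question's data (hypothetical values)
dat <- data.frame(
  hhDomMil = c("HED", "ETB", "PER", "ETB", "PMA", "HED", "PMA", "ETB"),
  kclust   = c(1L, 1L, 2L, 2L, 4L, 4L, 4L, 1L)
)

# Fix the stacking order explicitly instead of relying on alphabetical levels
milieu_levels <- c("ETB", "PMA", "PER", "HED")
dat$hhDomMil <- factor(dat$hhDomMil, levels = milieu_levels)

# A factor on x gives a discrete axis: one bar and one label per cluster
dat$kclust <- factor(dat$kclust)

p <- ggplot(dat, aes(x = kclust, fill = hhDomMil)) +
  geom_bar(position = "fill", colour = "white")  # white outline separates segments
p
```

With the real data, the full ten-code level vector from above would replace milieu_levels.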

Periodical answered 22/8, 2011 at 16:44 Comment(0)
N
7

If you relevel your hhDomMil as a factor like this:

o <- c("ETB", "PMA", "PER", "KON", "TRA", "DDR", "BUM", "MAT", "HED", "EXP")
d$hh <- factor(d$hhDomMil, levels = o)

then your plot will be in the order you like:

ggplot(d, aes(x = kclust, fill = hh)) + geom_bar(position = "fill")
Neogothic answered 22/8, 2011 at 17:5 Comment(2)
I like this solution because a) it is terse, and b) it generalizes to non-ggplot questions as well.Protractor
Like Gavin's solution, this is a nice generalized solution.Bahuvrihi
