ggplot2 geom_bar - how to keep order of data.frame [duplicate]

I have a question concerning the order of data in my geom_bar.

This is my dataset:

  SM_P,Spotted melanosis on palm,16.2
  DM_P,Diffuse melanosis on palm,78.6
  SM_T,Spotted melanosis on trunk,57.3
  DM_T,Diffuse melanosis on trunk,20.6
  LEU_M,Leuco melanosis,17
  WB_M,Whole body melanosis,8.4
  SK_P,Spotted keratosis on palm,35.4
  DK_P,Diffuse keratosis on palm,23.5
  SK_S,Spotted keratosis on sole,66
  DK_S,Diffuse keratosis on sole,52.8
  CH_BRON,Dorsal keratosis,39
  LIV_EN,Chronic bronchities,6
  DOR,Liver enlargement,2.4
  CARCI,Carcinoma,1

I assign the following colnames:

  colnames(df) <- c("abbr", "derma", "prevalence") # Assign column names

Then I plot:

  ggplot(data=df, aes(x=derma, y=prevalence)) + geom_bar(stat="identity") + coord_flip()

Plot

Why does ggplot2 seemingly change the order of my data at random? I would like the bars to appear in the same order as the rows of my data.frame.

Any help is much appreciated!

Deem answered 30/6, 2016 at 19:20 Comment(7)
It's not random, it's alphabetical. See here for a solution: #3254141 - Kingsize
First of all, thanks for your response. If I apply derma_table <- table(df$derma); derma_levels <- names(derma_table)[order(df$prevalence)]; df$derma2 <- factor(df$derma, levels = derma_levels) and then plot with ggplot(data=df, aes(x=derma, y=prevalence)) + geom_bar(stat="identity") + coord_flip(), the plot is exactly the same as in my question. In fact, the commands only put the data.frame into alphabetical order, which is exactly what I would like to avoid. - Burgage
You are re-leveling the derma2 factor, but then using x=derma. - Kingsize
Hey arvi, first of all, thanks for your patience. I don't fully understand, because if I open my df, df$derma and df$derma2 have exactly the same order anyway. So whichever of the two I plot, it doesn't make a difference. - Burgage
See below for plot ordering either by 'native order' or sorted by prevalence. - Kingsize
The order of rows in your data frame doesn't matter at all. The order that matters is the order of the levels of the factor: levels(df$derma). You put those in whatever order you want to plot. - Psaltery
@jaap I think this question (i.e. how to stop reordering of cols) is slightly (but importantly) different from the question of how to reorder cols. I think it would be very useful to reopen on that basis. The reason is that, if an order has been determined earlier in a workflow, the current best answers make this happen: i) order determined, ii) ggplot reorders, iii) (best answers) reorder again. That doesn't make much sense if the data were originally in the correct order and some (any) way exists to stop geom_bar from reordering. - Stoic

Posting as an answer because the comment thread is getting long. You have to specify the order through the factor levels of the variable you map with aes(x=...):

# lock in factor level order
df$derma <- factor(df$derma, levels = df$derma)

# plot
ggplot(data=df, aes(x=derma, y=prevalence)) + 
    geom_bar(stat="identity") + coord_flip()

Result: same order as in df.
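
To see why this works, it can help to compare the factor levels before and after the re-leveling. The following is only a minimal sketch in base R: the three-row data frame is a shortened stand-in for the data in the question, and the names default_levels and locked_levels are made up for illustration.

```r
# Shortened stand-in for the data in the question (assumption: three rows only)
df <- data.frame(
  derma = c("Spotted melanosis on palm", "Diffuse melanosis on palm", "Carcinoma"),
  prevalence = c(16.2, 78.6, 1),
  stringsAsFactors = FALSE
)

# Default behaviour: factor() sorts the levels alphabetically
default_levels <- levels(factor(df$derma))

# Locked in: the levels follow the row order of df
df$derma <- factor(df$derma, levels = df$derma)
locked_levels <- levels(df$derma)

print(default_levels)
print(locked_levels)
```

ggplot2 places a discrete x variable according to its factor levels, so the second ordering is the one the plot picks up.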

# or, order by prevalence:
df$derma <- factor(df$derma, levels = df$derma[order(df$prevalence)])

Same plot command gives the bars ordered by prevalence.
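
As an alternative that is not part of the answer above, base R's reorder() can build the prevalence-sorted factor directly, and it can be called inside aes() so that df itself stays unchanged. This is a hedged sketch: the small data frame is again a stand-in, by_prev and p are illustrative names, and the plotting part only runs if ggplot2 is installed.

```r
# Shortened stand-in for the data in the question (assumption: three rows only)
df <- data.frame(
  derma = c("Spotted melanosis on palm", "Diffuse melanosis on palm", "Carcinoma"),
  prevalence = c(16.2, 78.6, 1),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument
by_prev <- reorder(df$derma, df$prevalence)
print(levels(by_prev))

# Same idea inside the plot call; df$derma is left untouched
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = reorder(derma, prevalence), y = prevalence)) +
    geom_bar(stat = "identity") +
    coord_flip()
  # print(p) draws the plot
}
```

Whether to re-level the column or reorder inside aes() is mostly a matter of taste; re-leveling once is handy when several plots should share the same order.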


I read in the data like this:

df <- read.table(text=
"SM_P,Spotted melanosis on palm,16.2
DM_P,Diffuse melanosis on palm,78.6
SM_T,Spotted melanosis on trunk,57.3
DM_T,Diffuse melanosis on trunk,20.6
LEU_M,Leuco melanosis,17
WB_M,Whole body melanosis,8.4
SK_P,Spotted keratosis on palm,35.4
DK_P,Diffuse keratosis on palm,23.5
SK_S,Spotted keratosis on sole,66
DK_S,Diffuse keratosis on sole,52.8
CH_BRON,Dorsal keratosis,39
LIV_EN,Chronic bronchities,6
DOR,Liver enlargement,2.4
CARCI,Carcinoma,1", header=FALSE, sep=',')
colnames(df) <- c("abbr", "derma", "prevalence") # Assign column names
Kingsize answered 30/6, 2016 at 20:30 Comment(5)
Thanks for your effort! I really appreciate your help! Did you remove some lines from the code which you posted? I don't get the same tick marks when I try the code. - Burgage
The only thing I didn't post was the code I used to read in your data. Now added. - Kingsize
Strange ... I get a different axis. Anyway, thank you very much for your effort! Much appreciated! :) - Burgage
If anyone is wondering how to do this with data where levels are present more than once in the variable (i.e., when not using stat = "identity" but rather the default count stat), you can add the unique() function in the first step. For example: df$var <- factor(df$var, levels = unique(df$var)) - Corpuz
@Corpuz if you are not using unique() in the newer versions of R, you may encounter problems. Thanks for that tip. - Eldrida
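
The unique() tip from the comments can be sketched as follows. The vector x is invented for illustration and stands for a column in which each category appears several times, as it would with the default count stat.

```r
# Hypothetical column with repeated categories (not from the question's data)
x <- c("b", "a", "b", "c", "a")

# unique() keeps the order of first appearance and drops the repeats,
# so it yields a valid set of levels
f <- factor(x, levels = unique(x))

print(levels(f))
print(table(f))
```

The levels follow the order of first appearance, and the counts per level are what geom_bar() would draw with its default stat.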
