Reordering factor gives different results, depending on which packages are loaded

I wanted to create a barplot in which the bars were ordered by height rather than alphabetically by category. This worked fine when the only package I loaded was ggplot2. However, when I loaded a few more packages and ran the same code that created, sorted, and plotted my data frame, the bars had reverted to being sorted alphabetically again.

I checked the data frame each time using str() and it turned out that the attributes of the data frame were now different, even though I'd run the same code each time.

My code and output are listed below. Can anyone explain the differing behavior? Why does loading a few apparently unrelated packages (unrelated in the sense that none of the functions I'm using seem to be masked by the newly loaded packages) change the result of running the transform() function?

Case 1: Just ggplot2 loaded

library(ggplot2)

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)
data1 = transform(data, group=reorder(group,-num))

> str(data1)
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "C","F","B","D",..: 1 2 4 3 5 6
  ..- attr(*, "scores")= num [1:6(1d)] -2 -7 -12 -7 -1 -11
  .. ..- attr(*, "dimnames")=List of 1
  .. .. ..$ : chr  "A" "B" "C" "D" ...
 $ num  : num  12 11 7 7 2 1

Case 2: Load several more packages, then run the same code again

library(plyr)
library(xtable)
library(Hmisc)
library(gmodels)
library(reshape2)
library(vcd)
library(lattice)

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)
data1 = transform(data, group=reorder(group,-num))

> str(data1)
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "A","B","C","D",..: 3 6 4 2 1 5
 $ num  : num  12 11 7 7 2 1

UPDATE: sessionInfo()

Case 1: Ran sessionInfo() after loading ggplot2

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
  [1] C/en_US.UTF-8/C/C/C/C

attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] ggplot2_0.9.1

loaded via a namespace (and not attached):
 [1] MASS_7.3-18        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       grid_2.15.0
 [7] labeling_0.1       memoise_0.1        munsell_0.3        plyr_1.7.1         proto_0.3-9.2      reshape2_1.2.1
[13] scales_0.2.1       stringr_0.6        tools_2.15.0

Case 2: Ran sessionInfo() after loading the additional packages

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
  [1] C/en_US.UTF-8/C/C/C/C

attached base packages:
  [1] grid      splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lattice_0.20-6   vcd_1.2-13       colorspace_1.1-1 MASS_7.3-18      reshape2_1.2.1   gmodels_2.15.2
[7] Hmisc_3.9-3      survival_2.36-14 xtable_1.7-0     plyr_1.7.1       ggplot2_0.9.1

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.0-5 cluster_1.14.2     dichromat_1.2-4    digest_0.5.2       gdata_2.8.2        gtools_2.6.2
 [7] labeling_0.1       memoise_0.1        munsell_0.3        proto_0.3-9.2      scales_0.2.1       stringr_0.6
[13] tools_2.15.0
Clementinaclementine answered 7/6, 2012 at 20:39 Comment(6)
Could you provide the output of sessionInfo()? If anyone can help, they may have to match your R and package versions to replicate this.Jandel
I can replicate this on R 2.15.0 with the up to date CRAN packages (on Ubuntu)Mcgannon
Very interesting. Looks like the change in the results of transform() only appears after loading gmodels (and it's not fixed by subsequently detaching gmodels). I'm intrigued... (FWIW, I'm on Windows XP, running R-devel, so it looks like this is not an OS or version specific problem.)Urmia
@Jandel I've added output of sessionInfo() as an edit to my question.Clementinaclementine
You can get similar behavior with most ggplot2 objects by running str with and without loading library(proto). Using proto greatly expands the display of proto objects.Habited
I believe this is also marginally relevant to the problem: https://mcmap.net/q/375546/-function-reorder-in-r-and-ordering-values-duplicate . In short, reorder needs the second parameter to be as.factor(.) to order properly.Officious

This happens because:

  1. gmodels imports gdata
  2. gdata defines a new reorder method for factors, reorder.factor
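
The mechanism behind these two points can be reproduced without installing anything. The following is a minimal sketch: describe is a made-up generic standing in for reorder, and describe.factor plays the role of the method that gdata adds. Neither name belongs to any package.

```r
# Hypothetical generic with only a default method, standing in for reorder().
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

f <- factor(c("a", "b"))

# No factor-specific method exists yet, so dispatch falls through to the default.
before <- describe(f)

# Later, a factor-specific method appears (as if a package had been loaded).
# Nothing is masked: describe() and describe.default() are untouched.
describe.factor <- function(x, ...) "factor method"

# The same call now dispatches to the new, more specific method.
after <- describe(f)

print(c(before = before, after = after))
```

The call describe(f) is identical both times; only the set of available methods changed, which is why the same reorder() call can give different results.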

Start a clean session. Then:

methods("reorder")
[1] reorder.default*    reorder.dendrogram*

Now load gdata (or load gmodels, which has the same effect):

library(gdata)
methods("reorder")
[1] reorder.default*    reorder.dendrogram* reorder.factor 

Notice that nothing is masked: base R (the stats package) has no reorder.factor method, so gdata adds a new method rather than replacing an existing one.
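
Because nothing is masked, R prints no warning when the method appears. One way to check from code whether a factor-specific reorder method is currently available is sketched below. It uses utils::getS3method with optional = TRUE, and the variable names m and status are only illustrative.

```r
# Look up a reorder method for class "factor".
# optional = TRUE returns NULL instead of raising an error when none is found.
m <- utils::getS3method("reorder", "factor", optional = TRUE)

if (is.null(m)) {
  status <- "no reorder.factor found; reorder.default will be used for factors"
} else {
  # Report the environment (usually a package namespace) that defines the method.
  status <- paste("reorder.factor found in:", environmentName(environment(m)))
}
print(status)
```

The result depends on which packages are loaded in the session, so it can be run before and after library(gdata) to see the change.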

Recreate the problem, but this time explicitly call the different packages:

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)

The base R version (using reorder.default):

str(transform(data, group=stats:::reorder.default(group,-num)))
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "C","F","B","D",..: 1 2 4 3 5 6
  ..- attr(*, "scores")= num [1:6(1d)] -2 -7 -12 -7 -1 -11
  .. ..- attr(*, "dimnames")=List of 1
  .. .. ..$ : chr  "A" "B" "C" "D" ...
 $ num  : num  12 11 7 7 2 1

The gdata version (using reorder.factor):

str(transform(data, group=gdata:::reorder.factor(group,-num)))
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "A","B","C","D",..: 3 6 4 2 1 5
 $ num  : num  12 11 7 7 2 1
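
If the goal is only to order the levels by num, one way to avoid depending on which reorder method is dispatched is to build the factor levels directly. This is a sketch, not part of the original answer, and it assumes each group appears exactly once, as in the example data. Ties in num are broken alphabetically by group so that the result matches the reorder.default output above.

```r
group <- c("C", "F", "D", "B", "A", "E")
num   <- c(12, 11, 7, 7, 2, 1)
data  <- data.frame(group, num, stringsAsFactors = FALSE)

# Row order by decreasing num, ties broken alphabetically by group.
ord <- order(-data$num, data$group)

# Set the factor levels explicitly; no reorder() method is involved.
data1 <- data
data1$group <- factor(data$group, levels = data$group[ord])

str(data1)
```

Unlike reorder.default, this does not attach a "scores" attribute to the factor.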
Indecency answered 7/6, 2012 at 21:36 Comment(5)
You can get the "expected" order using the gdata::reorder.factor version by adding a FUN=identity argument: data1 = transform(data, group=reorder(group,-num,FUN=identity)).Simulated
Just to make sure I understand the lesson here: When you load a package, you can get different behavior with the exact same code, even in the absence of masking, if the new package has a method specific to your object (in this case reorder.factor), that "overrides" the behavior of the "top"-level method (in this case, generic reorder) that would otherwise apply to your object. Is that correct?Clementinaclementine
@Clementinaclementine Yes, your example clearly illustrates that. Pedantry about terminology: reorder.factor gets dispatched rather than reorder.default (thus in a sense overriding the previous behaviour). This is a very interesting problem. Thank you.Indecency
Andrie, thanks for the clear and detailed answer. @BrianDiggs Thanks for showing how to recover the desired behavior.Clementinaclementine
Many thanks everyone. This problem was driving me berserk. Is there any way to add some more metadata to this question to make it more readily discoverable? There's at least one other poor soul out there in a state of confusion:Lever
