reordering rows in a dataframe according to the order of rows in another dataframe
Asked Answered
S

1

6

I am a new R user and new to StackOverflow. I will do my best to ask my question concisely and explicitly and my apologies if it is not communicated in the best way.

I am working with two dataframes. I want to reorder the rows of one dataframe so that it is identical to the order of the rows in the second dataframe so I can add data from one to the other with their formats being the same. The column I want to reorder the rows according to is a column with character string identifiers of different observation regions.

The first dataframe "dfverif" looks (in summary) like

Variable Value  
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
BAARG   9       
BAARG   9       
BAARG   9   
BAARG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   

The second dataframe "dfmax" looks like

variable value
DALZZ   2.14
DALZZ   2.02
DALZZ   2.04
CBUCG   1.83
CBUCG   2.09
CBUCG   1.96
CBUCG   1.98
DAHEI   2.25
DAHEI   2.05
DAHEI   2.08
DAFQX   2.12
DAFQX   2.12
DAFQX   2.04
BAARG   2.12
BAARG   2.56
BAARG   2.56

I want to reorder the rows of the second dataframe in terms of the order of the rows of the character vector in the first dataframe. But, there are many duplicate strings because this is time-series data so I can't use match, and I can't delete the duplicates because they hold necessary data. Also, the second dataframe is much smaller than the first (it is maximums of the time-series data rather than raw observations). I know that limits cbind and rbind but that rbind.fill and cbindX can be used if needed, although I'm not sure they are here. In actuality these dataframes have more columns but I only included 2 here for conciseness.

Based on the question here Order data frame rows according to a target vector that specifies the desired order

I tried doing that code

target <- dfverif
idx <- sapply(target,function(x){
which(dfmax$variable==x)
})
idx <- unlist(idx) ##I added this because the code gave me errors because idx is classified as a list so R couldn't do the dfmax[idx,] component
dfmax <- dfmax[idx,]
rownames(dfmist) <- NULL

But now when I do head(dfmax) I get

[1] V1 V2
<0 rows> (or 0-length row.names)

Which I can't make sense of, and when I do str(dfmax) I get the same ordering of character variables that it had before, nothing has changed. Am I barking up the wrong tree? Is there another way to approach this that I am not aware of? Or am I trying to execute this function improperly?

Thank you for your time and help.

Saar answered 8/12, 2014 at 16:46 Comment(0)
P
15

I'm not willing to accept that match cannot be used. It does return a possibly non-unique result, but you didn't say anything about needing a secondary sort and if you did it could easily be added as a second argument to order. I tested this on various reduced subsets of the second dataframe including one that only had single instances of each of the variable instances.

The difference in lengths should not be an issue. Here I demonstrate with first the ordering of d2 ('dfmax', shorter) by d1 ('dfverif', longer) and then an ordering of d1 by d2:

d2[ order(match(d2$variable, d1$Variable)), ]
   variable value
11    DAFQX  2.12
12    DAFQX  2.12
13    DAFQX  2.04
8     DAHEI  2.25
9     DAHEI  2.05
10    DAHEI  2.08
14    BAARG  2.12
15    BAARG  2.56
16    BAARG  2.56
4     CBUCG  1.83
5     CBUCG  2.09
6     CBUCG  1.96
7     CBUCG  1.98
1     DALZZ  2.14
2     DALZZ  2.02
3     DALZZ  2.04
d1[ order(match(d1$Variable, d2$variable)), ]

   Variable Value
17    DALZZ     9
18    DALZZ     9
19    DALZZ     9
20    DALZZ     9
13    CBUCG     9
14    CBUCG     9
15    CBUCG     9
16    CBUCG     9
5     DAHEI     9
6     DAHEI     9
7     DAHEI     9
8     DAHEI     9
1     DAFQX     9
2     DAFQX     9
3     DAFQX     9
4     DAFQX     9
9     BAARG     9
10    BAARG     9
11    BAARG     9
12    BAARG     9
Phobia answered 8/12, 2014 at 17:8 Comment(4)
I now see that, but the use of the order(match( , )) should still succeed, shouldn't it?Phobia
My answer was in fact producing a wrong result so I deleted it. Yours is the way to go :)Subcritical
Worked fine on the test cases I made; You should post as an edit in your question the output of dput(head(dfmax)) and dput(head(dverif)).Phobia
I had to save the dataframes and then read them in with headers and give them column names and then it worked. Thanks so much! :)Saar

© 2022 - 2024 — McMap. All rights reserved.