How to obtain all combinations of the columns of a data frame taken two at a time?
Suppose I have this data frame:

X <- matrix(c(2,4,3,1,5,7,1,2,3,5,8,2,4,5,1,1,3,6,1,3,4,5,6,1), nrow = 6, ncol = 4, byrow = TRUE)
X.df <- as.data.frame(X)

  V1 V2 V3 V4
1  2  4  3  1
2  5  7  1  2
3  3  5  8  2
4  4  5  1  1
5  3  6  1  3
6  4  5  6  1

I would like to obtain a list of data frames containing every combination of two columns, without repetition and without pairing any column with itself. That is, a list of data frames with the following headers:

V1,V2
V1,V3
V1,V4
V2,V3
V2,V4
V3,V4

Any idea of how to do this?

Deration answered 18/8, 2013 at 11:59 Comment(0)
combn(X.df, 2, simplify=FALSE)
[[1]]
  V1 V2
1  2  4
2  5  7
3  3  5
4  4  5
5  3  6
6  4  5

[[2]]
  V1 V3
1  2  3
2  5  1
3  3  8
4  4  1
5  3  1
6  4  6

[[3]]
  V1 V4
1  2  1
2  5  2
3  3  2
4  4  1
5  3  3
6  4  1

[[4]]
  V2 V3
1  4  3
2  7  1
3  5  8
4  5  1
5  6  1
6  5  6

[[5]]
  V2 V4
1  4  1
2  7  2
3  5  2
4  5  1
5  6  3
6  5  1

[[6]]
  V3 V4
1  3  1
2  1  2
3  8  2
4  1  1
5  1  3
6  6  1
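
If the call above does not return data frames in your R version (a comment below reports this), one possible workaround is to run combn() on the column names instead and subset the data frame afterwards. This is only a sketch: the names pairs and df_list are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
X <- matrix(c(2,4,3,1,5,7,1,2,3,5,8,2,4,5,1,1,3,6,1,3,4,5,6,1),
            nrow = 6, ncol = 4, byrow = TRUE)
X.df <- as.data.frame(X)

# All pairs of column names, as a list of character vectors of length 2
pairs <- combn(names(X.df), 2, simplify = FALSE)

# Subset the data frame once per pair; drop = FALSE keeps a data frame
df_list <- lapply(pairs, function(cols) X.df[, cols, drop = FALSE])

# Label each element with its header pair, e.g. "V1,V2"
names(df_list) <- vapply(pairs, paste, character(1), collapse = ",")
```

Naming the elements makes it possible to pick a pair directly, for example df_list[["V2,V4"]].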
Salted answered 18/8, 2013 at 12:4 Comment(4)
Sorry, I meant all the combinations of columns, so I want a list of data frames. I am editing my question now to make it clearer. Thank you anyway! Deration
Oh, I see that the answer is: combn(X.df,2,simplify=FALSE). Thank you very much! Deration
@user18441, if this was helpful, do consider voting it up; if it solved your problem, do consider accepting it. Biddy
This does not work; it returns combinations of all the names of the variables of the dataset. Oak
Since Thomas's solution does not work (anymore), here is a base R solution. It returns a list of all two-column combinations, without repetition and without pairing any column with itself. Essentially, combn() generates every pair of column indices of the original data.frame, and lapply() then subsets the data frame once per pair.

Data

> X.df
  V1 V2 V3 V4
1  2  4  3  1
2  5  7  1  2
3  3  5  8  2
4  4  5  1  1
5  3  6  1  3
6  4  5  6  1

Code

# Compute the 2 x 6 matrix of column-index pairs once, then subset per pair
idx <- combn(ncol(X.df), m = 2)
df_list <- lapply(seq_len(ncol(idx)),
                  function(y) X.df[, idx[, y]])

Output

> df_list 
[[1]]
  V1 V2
1  2  4
2  5  7
3  3  5
4  4  5
5  3  6
6  4  5

[[2]]
  V1 V3
1  2  3
2  5  1
3  3  8
4  4  1
5  3  1
6  4  6

[[3]]
  V1 V4
1  2  1
2  5  2
3  3  2
4  4  1
5  3  3
6  4  1

[[4]]
  V2 V3
1  4  3
2  7  1
3  5  8
4  5  1
5  6  1
6  5  6

[[5]]
  V2 V4
1  4  1
2  7  2
3  5  2
4  5  1
5  6  3
6  5  1

[[6]]
  V3 V4
1  3  1
2  1  2
3  8  2
4  1  1
5  1  3
6  6  1
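
A more compact variant of the same idea, offered as a sketch rather than as part of the original answer: combn() takes a FUN argument that is applied to each combination, so the subsetting can happen inside combn() itself. The name df_list2 is illustrative.

```r
# Rebuild the example data from the question
X <- matrix(c(2,4,3,1,5,7,1,2,3,5,8,2,4,5,1,1,3,6,1,3,4,5,6,1),
            nrow = 6, ncol = 4, byrow = TRUE)
X.df <- as.data.frame(X)

# combn() over the column positions 1..ncol(X.df); FUN receives each
# pair of indices and returns the matching two-column data frame
df_list2 <- combn(ncol(X.df), 2,
                  FUN = function(i) X.df[, i, drop = FALSE],
                  simplify = FALSE)

# Sanity checks: one data frame per pair, two columns each
length(df_list2) == choose(ncol(X.df), 2)
all(vapply(df_list2, ncol, integer(1)) == 2L)
```

simplify = FALSE matters here: it asks combn() to return the results as a list instead of trying to collapse them into an array.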
Duro answered 2/2, 2021 at 16:25 Comment(0)
