Suppose I have a data frame like this:
set.seed(1)
q <- 100
df <- data.frame(Var1 = round(runif(q, 1, 50)),
                 Var2 = round(runif(q, 1, 50)),
                 Var3 = round(runif(q, 1, 50)),
                 Var4 = round(runif(q, 1, 50)))
attach(df)
As you may have noticed, q sets the length of each column in the data frame.
I want to filter all possible combinations of the column values. The condition could be anything. Say I want to check, for each combination, whether the sum of the values from the first two columns divided by the sum of the values from the last two columns is greater than 1.
One way to achieve that is to use the expand.grid() function.
a <- Sys.time()
expanded <- expand.grid(Var1, Var2, Var3, Var4)
Sys.time() - a
Time difference of 8.31997 secs
expanded <- expanded[rowSums(expanded[,1:2])/ rowSums(expanded[,3:4])>1,]
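The expansion and the filter can also be timed together with system.time(). The sketch below is only illustrative: it uses a smaller, hypothetical size q_small so that it runs quickly, and the names d, ex and kept are not from the code above.

```r
set.seed(1)
q_small <- 20  # hypothetical smaller size: 20^4 = 160,000 combinations
d <- data.frame(Var1 = round(runif(q_small, 1, 50)),
                Var2 = round(runif(q_small, 1, 50)),
                Var3 = round(runif(q_small, 1, 50)),
                Var4 = round(runif(q_small, 1, 50)))

# Time the expansion and the filter as one step
timing <- system.time({
  ex <- expand.grid(d$Var1, d$Var2, d$Var3, d$Var4)
  kept <- ex[rowSums(ex[, 1:2]) / rowSums(ex[, 3:4]) > 1, ]
})
print(timing[["elapsed"]])  # elapsed seconds; varies by machine
```

Since the number of rows grows as q^4, going from 20 to 100 multiplies the work by 5^4 = 625.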
However, it takes a lot of time! To make it faster, I tried to follow the answer using the rep.int() function in this question and wrote my own function.
myexpand <- function(...) {
sapply(list(...),function(y) rep.int(y, prod(lengths(list(...)))/length(y)))
}
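One thing to check here: repeating every vector with rep.int() alone cycles all columns in step, so the result appears to contain only the aligned rows repeated, not every combination. A minimal sketch of a rep-based expansion that matches the row order of expand.grid() is shown below. The name myexpand2 is hypothetical, and the sketch assumes plain atomic vectors as arguments.

```r
# Hypothetical helper (not from the original post): builds the same rows as
# expand.grid() by combining rep(each = ...) with rep.int().
myexpand2 <- function(...) {
  args <- list(...)
  n <- lengths(args)
  total <- prod(n)
  # Each element is repeated consecutively as many times as there are
  # combinations of all earlier arguments, so the first argument varies fastest.
  each <- cumprod(c(1, n[-length(n)]))
  cols <- Map(function(y, e) {
    rep.int(rep(y, each = e), total / (length(y) * e))
  }, args, each)
  do.call(cbind, cols)
}

# Small check against expand.grid()
small <- myexpand2(1:2, 3:5)
ref <- as.matrix(expand.grid(1:2, 3:5))
stopifnot(all(small == unname(ref)))
```

This still materialises every combination, so with four columns of length 100 it would hold 100^4 = 1e8 rows, and memory remains the limiting factor.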
But it is not promising either. It takes longer than I expected, and longer than expand.grid too. And if I set a greater q, it becomes a nightmare!
Is there a proper way to do this a lot faster (1-2 seconds), maybe with matrix operations before applying expand.grid or myexpand? Also, I wonder whether this is a weakness of using an interpreted language like R... Software suggestions are also welcome.
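As one possible direction for the "matrix operations first" idea, here is a minimal sketch that is specific to the example condition. It assumes all values are positive, so the ratio test reduces to comparing the two pair sums, and it only counts the passing combinations instead of building them. The function name count_passing and the list v are hypothetical.

```r
# Hypothetical helper: counts combinations with (v1 + v2) / (v3 + v4) > 1
# without materialising all length(v1) * ... * length(v4) rows.
count_passing <- function(v1, v2, v3, v4) {
  s12 <- as.vector(outer(v1, v2, "+"))        # every v1[i] + v2[j]
  s34 <- sort(as.vector(outer(v3, v4, "+")))  # every v3[k] + v4[l], sorted
  # Assumes positive values, so s12 / s34 > 1 is equivalent to s12 > s34.
  # With left.open = TRUE, findInterval() returns how many sorted s34 values
  # are strictly below each s12.
  sum(as.numeric(findInterval(s12, s34, left.open = TRUE)))
}

set.seed(1)
q <- 10  # kept small so the brute-force check below stays cheap
v <- replicate(4, round(runif(q, 1, 50)), simplify = FALSE)
fast <- count_passing(v[[1]], v[[2]], v[[3]], v[[4]])

# Brute-force check with expand.grid() on the small example
g <- expand.grid(v[[1]], v[[2]], v[[3]], v[[4]])
slow <- sum(rowSums(g[, 1:2]) / rowSums(g[, 3:4]) > 1)
stopifnot(fast == slow)
```

The point of the design is that the two outer() calls produce q^2 values each rather than q^4 rows. If the matching rows themselves are needed, they would still have to be generated, so this helps most when a count or a summary is enough.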
Rcpp implementation? – Principalities