Alternative to expand.grid for data.frames
Asked Answered
A

6

43

I have a data.frame df and I want that every row in this df is duplicated lengthTime times and that a new column is added that counts from 1 to lengthTime for each row in df.

I know, it sounds pretty complicated, but what I basically want is to apply expand.grid to df. Here is an ugly workaround and I have the feeling that there most be an easier solution (maybe even a base-R function?):

df <- data.frame(ID   = rep(letters[1:3], each=3),
                 CatA = rep(1:3, times = 3),
                 CatB = letters[1:9])
lengthTime <- 3
nrRow <- nrow(df)
intDF <- df
for (i in 1:(lengthTime - 1)) {
  df <- rbind(df, intDF)
}
df$Time <- rep(1:lengthTime, each=nrRow)

I thought that I could just use expand.grid(df, 1:lengthTime), but that does not work. outer did not bring any luck either. So does anyone know a good solution?

Aiaia answered 27/7, 2012 at 18:27 Comment(0)
B
20

Why not just something like df[rep(1:nrow(df),times = 3),] to extend the data frame, and then add the extra column just as you have above, with df$Time <- rep(1:lengthTime, each=nrRow)?

Bombastic answered 27/7, 2012 at 18:39 Comment(0)
J
66

It's been a while since this question was posted, but I recently came across it looking for just the thing in the title, namely, an expand.grid that works for data frames. The posted answers address the OP's more specific question, so in case anyone is looking for a more general solution for data frames, here's a slightly more general approach:

expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))

# For the example in the OP
expand.grid.df(df, data.frame(1:lengthTime))

# More generally
df1 <- data.frame(A=1:3, B=11:13)
df2 <- data.frame(C=51:52, D=c("Y", "N"))
df3 <- data.frame(E=c("+", "-"))
expand.grid.df(df1, df2, df3)
Jacqui answered 20/2, 2014 at 14:48 Comment(1)
Worked really nice ... I used this method for this SO post, in my solution here: SO linkGeoponic
D
27

You can also just do a simple merge by NULL (which will cause merge to do simple combinatorial data replication):

merge(data.frame(time=1:lengthTime), iris, by=NULL)
Dilworth answered 3/11, 2014 at 14:27 Comment(1)
I like this solution the most but for the sake of simplicity (avoiding dependencies) you could have just wrote merge(data.frame(time=1:lengthTime), iris, by=NULL)Sousaphone
B
20

Why not just something like df[rep(1:nrow(df),times = 3),] to extend the data frame, and then add the extra column just as you have above, with df$Time <- rep(1:lengthTime, each=nrRow)?

Bombastic answered 27/7, 2012 at 18:39 Comment(0)
E
13

Quick update

There is now also the crossing() function in package tidyr which can be used instead of merge, is somewhat faster, and returns a tbl_df / tibble.

data.frame(time=1:10) %>% merge(iris, by=NULL) 

data.frame(time=1:10) %>% tidyr::crossing(iris) 
Editorial answered 16/8, 2016 at 12:13 Comment(0)
M
2

This works:

REP <- rep(1:nrow(df), 3)
df2 <- data.frame(df[REP, ], Time = rep(1:3, each = 9))
rownames(df2) <- NULL
df2
Martinemartineau answered 27/7, 2012 at 18:40 Comment(2)
Mine really isn't any different than joran's who beat me by 40 some seconds but I'll leave it as it's slightly more explicit.Martinemartineau
First come, first serve, so I accepted his answer ;-) But +1 for both of you. That's a very neat solution!Aiaia
C
2

A data.table solution:

> library(data.table)
>  ( df <- data.frame(ID   = rep(letters[1:3], each=3),
+                  CatA = rep(1:3, times = 3),
+                  CatB = letters[1:9]) )
  ID CatA CatB
1  a    1    a
2  a    2    b
3  a    3    c
4  b    1    d
5  b    2    e
6  b    3    f
7  c    1    g
8  c    2    h
9  c    3    i
> ( DT <- data.table(df)[, lapply(.SD, function(x) rep(x,3))][, Time:=rep(1:3, each=nrow(df0))] )
    ID CatA CatB Time
 1:  a    1    a    1
 2:  a    2    b    1
 3:  a    3    c    1
 4:  b    1    d    1
 5:  b    2    e    1
 6:  b    3    f    1
 7:  c    1    g    1
 8:  c    2    h    1
 9:  c    3    i    1
10:  a    1    a    2
11:  a    2    b    2
12:  a    3    c    2
13:  b    1    d    2
14:  b    2    e    2
15:  b    3    f    2
16:  c    1    g    2
17:  c    2    h    2
18:  c    3    i    2
19:  a    1    a    3
20:  a    2    b    3
21:  a    3    c    3
22:  b    1    d    3
23:  b    2    e    3
24:  b    3    f    3
25:  c    1    g    3
26:  c    2    h    3
27:  c    3    i    3

Another one :

> library(data.table)
>  ( df <- data.frame(ID   = rep(letters[1:3], each=3),
+                  CatA = rep(1:3, times = 3),
+                  CatB = letters[1:9]) )
> DT <- data.table(df)
> rbindlist(lapply(1:3, function(i) cbind(DT, Time=i)))
    ID CatA CatB Time
 1:  a    1    a    1
 2:  a    2    b    1
 3:  a    3    c    1
 4:  b    1    d    1
 5:  b    2    e    1
 6:  b    3    f    1
 7:  c    1    g    1
 8:  c    2    h    1
 9:  c    3    i    1
10:  a    1    a    2
11:  a    2    b    2
12:  a    3    c    2
13:  b    1    d    2
14:  b    2    e    2
15:  b    3    f    2
16:  c    1    g    2
17:  c    2    h    2
18:  c    3    i    2
19:  a    1    a    3
20:  a    2    b    3
21:  a    3    c    3
22:  b    1    d    3
23:  b    2    e    3
24:  b    3    f    3
25:  c    1    g    3
26:  c    2    h    3
27:  c    3    i    3
Crawler answered 25/8, 2015 at 18:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.