Combine a list of data frames into one preserving row names

Asked 30/6, 2015 at 13:47 Answered 3/10, 2022 at 7:54

I know the basics of combining a list of data frames into one, as has been answered before. However, I am interested in smart ways to preserve row names. Suppose I have a list of similarly structured data frames that I keep in a named list.

library(plyr)
library(dplyr)
library(data.table)

a = data.frame(x=1:3, row.names = letters[1:3])
b = data.frame(x=4:6, row.names = letters[4:6])
c = data.frame(x=7:9, row.names = letters[7:9])

l = list(A=a, B=b, C=c)

When I use do.call, the list names are combined with the row names:

> rownames(do.call("rbind", l))
[1] "A.a" "A.b" "A.c" "B.d" "B.e" "B.f" "C.g" "C.h" "C.i"

When I use any of rbind.fill, bind_rows or rbindlist the row names are replaced by a numeric range:

> rownames(rbind.fill(l))
> rownames(bind_rows(l))
> rownames(rbindlist(l))
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9"

When I remove the names from the list, do.call produces the desired output:

> names(l) = NULL
> rownames(do.call("rbind", l))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i"

So is there a function that I'm missing that provides some finer control over the row names? I do need the names for a different context so removing them is sub-optimal.

Abbe asked 30/6, 2015 at 13:47 Comment(3)

Hadley, and thus the hadleyverse, does not really approve of rownames, so it's unlikely that you'll get any of those packages to preserve rownames. – Raddatz 30/6, 2015 at 13:51

Using data.table you could maybe do rbindlist(lapply(l, setDT, keep.rownames = TRUE)) though not sure regarding efficiency. – Tightfisted 30/6, 2015 at 14:7

Use dplyr::add_rownames() – Warrant 30/6, 2015 at 14:8

To preserve rownames, you can simply do:

do.call(rbind, unname(l))

#  x
#a 1
#b 2
#c 3
#d 4
#e 5
#f 6
#g 7
#h 8
#i 9

Or, as you pointed out, by setting the names of l to NULL, which can also be done without modifying l:

do.call(rbind, setNames(l, NULL))
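Since unname() returns a modified copy, the names on l itself stay available for other uses, such as the subsetting by a computed vector mentioned in the comments on another answer. A minimal sketch, assuming the same example data as in the question; the selection vector `keep` is a hypothetical stand-in for whatever is computed in the real use case:

```r
# Same example data as in the question
l <- list(
  A = data.frame(x = 1:3, row.names = letters[1:3]),
  B = data.frame(x = 4:6, row.names = letters[4:6]),
  C = data.frame(x = 7:9, row.names = letters[7:9])
)

# Hypothetical selection vector; in practice this would be computed
keep <- c("A", "C")

# Subset by name first, then drop the names only for the rbind call
combined <- do.call(rbind, unname(l[keep]))

rownames(combined)  # original row names of A and C, without list-name prefixes
names(l)            # the list itself still carries its names
```

This relies on the row names being unique across the selected data frames, as in the question.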

Towle answered 30/6, 2015 at 13:53 Comment(3)

Sometimes life is so simple. Thank you. – Abbe 30/6, 2015 at 13:54

You were almost there! At least you know setNames and unname ! – Towle 30/6, 2015 at 13:57

genius! great solution – Defilade 23/1, 2020 at 10:41

We can use add_rownames from dplyr package before binding:

rbind_all(lapply(l, add_rownames))

# Source: local data frame [9 x 2]
#
#   rowname x
# 1       a 1
# 2       b 2
# 3       c 3
# 4       d 4
# 5       e 5
# 6       f 6
# 7       g 7
# 8       h 8
# 9       i 9
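Note that rbind_all() has been removed from recent dplyr releases and add_rownames() is deprecated, so the call above may not run on a current installation. A rough equivalent, assuming the tibble and dplyr packages are installed, uses tibble::rownames_to_column() and dplyr::bind_rows(); the column name "source" is an arbitrary choice for this sketch:

```r
library(dplyr)
library(tibble)

# Same example data as in the question
l <- list(
  A = data.frame(x = 1:3, row.names = letters[1:3]),
  B = data.frame(x = 4:6, row.names = letters[4:6]),
  C = data.frame(x = 7:9, row.names = letters[7:9])
)

# Move the row names into a regular column, then bind.
# .id stores the list names in a column ("source" is an arbitrary name).
combined <- bind_rows(lapply(l, rownames_to_column), .id = "source")

combined$rowname
combined$source
```

Keeping the row names in a column also sidesteps the problem of overlapping row names, since a column may contain duplicates while row names may not.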

Uhland answered 30/6, 2015 at 14:1 Comment(3)

It's good to know about add_rownames but doesn't serve me as well in my case. – Abbe 30/6, 2015 at 14:6

@Abbe what if rownames overlap in a, b, and c ? – Uhland 30/6, 2015 at 14:25

I know for certain that they won't in my scenario but it's a valid point, of course. – Abbe 30/6, 2015 at 14:28

Why not just use rbind:

 rbind(l$A, l$B, l$C)

Necroscopy answered 30/6, 2015 at 13:54 Comment(2)

I can't use that approach in my real example because I am sub-setting the full list by a vector, something like l[c("A", "C")], but computed from other values, of course. – Abbe 30/6, 2015 at 13:59

I see. So the other answers will be more helpful. I'll leave this answer here so that other people know what you are looking for. – Necroscopy 30/6, 2015 at 14:2

Here is another solution that I have just found. It works well (and efficiently) when you have a large list and, therefore, big data frames.

df <- data.table::rbindlist(l)
# add a column holding the original row names
df[, Col := unlist(lapply(l, rownames), use.names = FALSE)]
# move the new column to the front
data.table::setcolorder(df, "Col")

> df
   Col x
1:   a 1
2:   b 2
3:   c 3
4:   d 4
5:   e 5
6:   f 6
7:   g 7
8:   h 8
9:   i 9

More information is available in the data.table documentation for rbindlist.
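A possible variant, sketched here under the assumption of a reasonably recent data.table version, converts each data frame with as.data.table() and its keep.rownames argument before binding, so the row names travel with each piece rather than being re-attached afterwards. The column names "Col" and "source" are arbitrary choices for this sketch:

```r
library(data.table)

# Same example data as in the question
l <- list(
  A = data.frame(x = 1:3, row.names = letters[1:3]),
  B = data.frame(x = 4:6, row.names = letters[4:6]),
  C = data.frame(x = 7:9, row.names = letters[7:9])
)

# keep.rownames = "Col" stores the row names in a column named "Col";
# idcol = "source" records which list element each row came from.
dt <- rbindlist(lapply(l, as.data.table, keep.rownames = "Col"),
                idcol = "source")

dt$Col
dt$source
```

Unlike setDT(), as.data.table() returns a copy, so the data frames stored in l are left untouched.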

Asterisk answered 3/10, 2022 at 7:54 Comment(0)
