Combine a list of data frames into one preserving row names

I know the basics of combining a list of data frames into one, which has been answered before. However, I am interested in smart ways to preserve row names. Suppose I have several similarly structured data frames, and I keep them in a named list.

library(plyr)
library(dplyr)
library(data.table)

a = data.frame(x=1:3, row.names = letters[1:3])
b = data.frame(x=4:6, row.names = letters[4:6])
c = data.frame(x=7:9, row.names = letters[7:9])

l = list(A=a, B=b, C=c)

When I use do.call with rbind, the list names are prepended to the row names:

> rownames(do.call("rbind", l))
[1] "A.a" "A.b" "A.c" "B.d" "B.e" "B.f" "C.g" "C.h" "C.i"
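
The prefix appears because do.call passes the list names on as argument names, and rbind for data frames uses argument names as row-name prefixes. A minimal base R sketch of that behavior, redefining two of the example data frames so it runs on its own:

```r
# Two of the example data frames, redefined so the sketch is self-contained
a <- data.frame(x = 1:3, row.names = letters[1:3])
b <- data.frame(x = 4:6, row.names = letters[4:6])

# Named arguments: the argument names are prepended to the row names
named <- rbind(A = a, B = b)
rownames(named)
# [1] "A.a" "A.b" "A.c" "B.d" "B.e" "B.f"

# Unnamed arguments: the original row names are kept
plain <- rbind(a, b)
rownames(plain)
# [1] "a" "b" "c" "d" "e" "f"
```

So do.call("rbind", l) with a named list behaves like the first call, and with an unnamed list like the second.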

When I use any of rbind.fill, bind_rows, or rbindlist, the row names are replaced by a numeric sequence (all three calls return the same result):

> rownames(rbind.fill(l))
> rownames(bind_rows(l))
> rownames(rbindlist(l))
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9"

When I remove the names from the list, do.call produces the desired output:

> names(l) = NULL
> rownames(do.call("rbind", l))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i"

So is there a function I'm missing that provides finer control over the row names? I need the list names in a different context, so removing them permanently is sub-optimal.

Abbe answered 30/6, 2015 at 13:47 Comment(3)
Hadley, and thus the hadleyverse, does not really approve of rownames, so it's unlikely that you'll get any of those packages to preserve rownames. - Raddatz
Using data.table you could maybe do rbindlist(lapply(l, setDT, keep.rownames = TRUE)), though I'm not sure about efficiency. - Tightfisted
Use dplyr::add_rownames() - Warrant
T
15

To preserve rownames, you can simply do:

do.call(rbind, unname(l))

#  x
#a 1
#b 2
#c 3
#d 4
#e 5
#f 6
#g 7
#h 8
#i 9

Or, as you pointed out, you can set the names of l to NULL, which can also be done with:

do.call(rbind, setNames(l, NULL))
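
Since the question mentions that the list names are still needed elsewhere, here is a small sketch showing that unname() returns a copy and leaves l itself untouched. It rebuilds the example list from the question so that it runs on its own:

```r
# Example data from the question
a <- data.frame(x = 1:3, row.names = letters[1:3])
b <- data.frame(x = 4:6, row.names = letters[4:6])
c <- data.frame(x = 7:9, row.names = letters[7:9])
l <- list(A = a, B = b, C = c)

# unname() only strips the names from the copy passed to rbind
combined <- do.call(rbind, unname(l))
rownames(combined)
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i"

# The original list keeps its names
names(l)
# [1] "A" "B" "C"
```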
Towle answered 30/6, 2015 at 13:53 Comment(3)
Sometimes life is so simple. Thank you. - Abbe
You were almost there! At least you know setNames and unname! - Towle
Genius! Great solution. - Defilade
U
3

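We can use add_rownames from the dplyr package before binding (note that in current dplyr, rbind_all has been superseded by bind_rows, and add_rownames is deprecated in favor of tibble::rownames_to_column):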

rbind_all(lapply(l, add_rownames))

# Source: local data frame [9 x 2]
#
#   rowname x
# 1       a 1
# 2       b 2
# 3       c 3
# 4       d 4
# 5       e 5
# 6       f 6
# 7       g 7
# 8       h 8
# 9       i 9
Uhland answered 30/6, 2015 at 14:1 Comment(3)
It's good to know about add_rownames, but it doesn't serve me as well in my case. - Abbe
@Abbe what if the rownames overlap in a, b, and c? - Uhland
I know for certain that they won't in my scenario, but it's a valid point, of course. - Abbe
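
To illustrate the overlap concern raised above, here is a base R sketch that stores the row names in an ordinary column before binding, so that duplicates survive unchanged. The data frames a2 and b2, the helper keep_rownames, and the column name rowname are hypothetical names chosen for this example, not part of the original thread:

```r
# Hypothetical data frames that share the row name "a"
a2 <- data.frame(x = 1:2, row.names = c("a", "b"))
b2 <- data.frame(x = 3:4, row.names = c("a", "c"))
l2 <- list(A = a2, B = b2)

# Hypothetical helper: copy the row names into a regular column and
# reset the row names to the default integer sequence
keep_rownames <- function(d) {
  data.frame(rowname = rownames(d), d, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Bind without the list names; duplicated names are kept in the column
combined <- do.call(rbind, unname(lapply(l2, keep_rownames)))
combined$rowname
# [1] "a" "b" "a" "c"
```

Row names of a data frame have to be unique, which is why a plain column is the safer place for them when overlaps are possible.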
N
1

Why not just use rbind?

 rbind(l$A, l$B, l$C)
Necroscopy answered 30/6, 2015 at 13:54 Comment(2)
I can't use that approach in my real example because I am subsetting the full list by a vector, something like l[c("A", "C")], but computed from other values, of course. - Abbe
I see. So the other answers will be more helpful. I'll leave this answer here so that other people know what you are looking for. - Necroscopy
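
For the subsetting case described in the comment above, one possible sketch combines the name-based subset with unname(). The vector keep is a hypothetical stand-in for the computed selection:

```r
# Example data from the question
a <- data.frame(x = 1:3, row.names = letters[1:3])
b <- data.frame(x = 4:6, row.names = letters[4:6])
c <- data.frame(x = 7:9, row.names = letters[7:9])
l <- list(A = a, B = b, C = c)

# Hypothetical selection; in practice it would be computed from other values
keep <- c("A", "C")

# Subset by name first, then drop the names only for the rbind call
subset_combined <- do.call(rbind, unname(l[keep]))
rownames(subset_combined)
# [1] "a" "b" "c" "g" "h" "i"
```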
A
0

Here is another solution that I have just found. It works well (and efficiently) when you have a large list and, therefore, big data frames.

df <- data.table::rbindlist(l)
# add a column with the rownames
df[,Col := unlist(lapply(l, rownames))]
df <- df %>% dplyr::select(Col, everything())

> df
   Col x
1:   a 1
2:   b 2
3:   c 3
4:   d 4
5:   e 5
6:   f 6
7:   g 7
8:   h 8
9:   i 9

More information is available in the rbindlist documentation (?data.table::rbindlist).

Asterisk answered 3/10, 2022 at 7:54 Comment(0)
