Combining (cbind) vectors of different length

I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and tried to combine them using do.call(cbind, ...):

nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)

#      [,1] [,2] [,3]
# [1,]    1    3    1
# [2,]    2    4    2
# [3,]    3    5    3
# [4,]    4    6    4
# [5,]    5    7    5
# [6,]    6    8    1
# [7,]    7    3    2
# [8,]    8    4    3
# Warning message:
#   In (function (..., deparse.level = 1)  :
#         number of rows of result is not a multiple of vector length (arg 2)

As expected, the number of rows in the resulting matrix is the length of the longest vector, and the shorter vectors are recycled to fill the remaining rows.
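The recycling rule has a trap. R only warns when the longest length is not a multiple of a shorter one, so lengths that divide evenly recycle silently. The following minimal sketch shows both cases:

```r
# 8 is a multiple of 4, so the second column is recycled without any warning
m <- cbind(1:8, 1:4)
m[, 2]
# [1] 1 2 3 4 1 2 3 4

# 8 is not a multiple of 6, so R still recycles but signals a warning;
# tryCatch() captures the warning message as a string
msg <- tryCatch(cbind(1:8, 3:8), warning = function(w) conditionMessage(w))
msg
```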

Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:

#      [,1] [,2] [,3]
# [1,]    1    3    1
# [2,]    2    4    2
# [3,]    3    5    3
# [4,]    4    6    4
# [5,]    5    7    5
# [6,]    6    8    NA
# [7,]    7    NA   NA
# [8,]    8    NA   NA

How can I go about doing this?

Darlleen answered 3/4, 2011 at 18:12 Comment(1)
flash of brilliance: nm <- cbind(z1, c(z2, rep(NA, length(z1) - length(z2)))) - Darlleen

You can use indexing: if you index beyond the length of a vector, R returns NA. This works for any number of rows, set with foo:

nm <- list(1:8,3:8,1:5)

foo <- 8

sapply(nm, '[', 1:foo)
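This minimal sketch shows the mechanism behind this answer. Indexing past the end of a vector yields NA. Note that a foo smaller than the longest vector silently truncates that vector instead of padding it.

```r
x <- 1:5
# positions 6:8 do not exist, so R returns NA for them
x[1:8]
# [1]  1  2  3  4  5 NA NA NA

nm <- list(1:8, 3:8, 1:5)
# with foo = 3, every column is cut down to its first 3 elements
short <- sapply(nm, '[', 1:3)
dim(short)
# [1] 3 3
```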

EDIT:

Or, in one line, using the length of the longest vector as the number of rows:

sapply(nm, '[', seq(max(sapply(nm,length))))

From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):

sapply(nm, '[', seq(max(lengths(nm))))
Decoy answered 4/4, 2011 at 0:59 Comment(3)
'[' is the name of the operator [ that you use in indexing (foo[1:10]). See also ?'[' - Decoy
The one line solution fails if the first column is shorter than the other two. - Selfdenial
The only answer that keeps column names is from @Ronak Shah using the rowr package. Is there an alternative with base R that keeps column names? - Shayne

You should pad the vectors with NA before calling do.call:

nm <- list(1:8,3:8,1:5)

max_length <- max(unlist(lapply(nm, length)))
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)  # start from an all-NA vector
  ans[seq_along(x)] <- x                   # seq_along() also handles zero-length x
  ans
})
do.call(cbind, nm_filled)
Gallager answered 3/4, 2011 at 18:59 Comment(0)

This is a shorter version of Wojciech's solution.

nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
    c(x, rep(NA, max_length - length(x)))
})
Euchologion answered 3/4, 2011 at 19:40 Comment(3)
You are always better off using vapply rather than sapply, because vapply guarantees that you get the output type you expect. - Woo
@Woo Could you elaborate on your comment? I don't understand how the difference between vapply and sapply applies to this problem. - Warrington
sapply is dangerous to program with because it is not type-stable: depending on the length of nm you'll get different types. - Woo
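This is a sketch of the vapply version that the comments describe. It assumes every vector is an integer vector, so the expected type is declared with FUN.VALUE = integer(n). For doubles, numeric(n) would work, and vapply also promotes integer and logical results to double. The sketch uses lengths(), which requires R 3.2.0 or later.

```r
nm <- list(1:8, 3:8, 1:5)
n <- max(lengths(nm))

# FUN.VALUE = integer(n) declares that each column must be an integer
# vector of length n; vapply() stops with an error otherwise
m <- vapply(nm, function(x) x[seq_len(n)], FUN.VALUE = integer(n))

# Type stability on an empty input: sapply() falls back to returning list(),
# while vapply() still returns an integer result
is.list(sapply(list(), function(x) x[seq_len(n)]))
# [1] TRUE
is.integer(vapply(list(), function(x) x[seq_len(n)], FUN.VALUE = integer(n)))
# [1] TRUE
```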

Here is an option using stri_list2matrix from stringi:

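library(stringi)
# stri_list2matrix() pads with NA but returns a character matrix,
# so convert it back to numeric afterwards
out <- stri_list2matrix(nm)
class(out) <- 'numeric'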
out
#      [,1] [,2] [,3]
#[1,]    1    3    1
#[2,]    2    4    2
#[3,]    3    5    3
#[4,]    4    6    4
#[5,]    5    7    5
#[6,]    6    8   NA
#[7,]    7   NA   NA
#[8,]    8   NA   NA
Juju answered 28/12, 2018 at 9:17 Comment(0)

Late to the party, but you could use cbind.fill from the rowr package with fill = NA:

library(rowr)
do.call(cbind.fill, c(nm, fill = NA))

#  object object object
#1      1      3      1
#2      2      4      2
#3      3      5      3
#4      4      6      4
#5      5      7      5
#6      6      8     NA
#7      7     NA     NA
#8      8     NA     NA

If you have a named list and want to keep the names as column headers, you can use setNames:

nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))

#  a  b  c
#1 1  3  1
#2 2  4  2
#3 3  5  3
#4 4  6  4
#5 5  7  5
#6 6  8 NA
#7 7 NA NA
#8 8 NA NA
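Some readers want to keep the names without rowr. This base R sketch shows that the index-based and length<- approaches both carry the list names through to the column names when the list is named. It requires R 3.2.0 or later for lengths().

```r
nm <- list(a = 1:8, b = 3:8, c = 1:5)
n <- max(lengths(nm))

# lapply() keeps the list names, and cbind() turns argument names into column names
m1 <- do.call(cbind, lapply(nm, `length<-`, n))
colnames(m1)
# [1] "a" "b" "c"

# sapply() over a named list also uses the names for the columns
m2 <- sapply(nm, `[`, seq_len(n))
colnames(m2)
# [1] "a" "b" "c"

# convert to a data frame if that is the shape you need
df <- as.data.frame(m1)
```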
About answered 22/12, 2018 at 7:32 Comment(0)

Bring all list elements to the same length using length<- (which pads shorter vectors with NA), then use cbind to get a matrix.

nm <- list(1:8, 3:8, 1:5)

do.call(cbind, lapply(nm, `length<-`, max(lengths(nm))))
#     [,1] [,2] [,3]
#[1,]    1    3    1
#[2,]    2    4    2
#[3,]    3    5    3
#[4,]    4    6    4
#[5,]    5    7    5
#[6,]    6    8   NA
#[7,]    7   NA   NA
#[8,]    8   NA   NA
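The following sketch shows what length<- does on its own and wraps the one-liner in a reusable function. The name cbind_pad is hypothetical and not from any package. The helper assumes a non-empty list, because max() of an empty lengths() vector returns -Inf with a warning.

```r
x <- 1:5
length(x) <- 8   # growing a vector pads it with NA
x
# [1]  1  2  3  4  5 NA NA NA

# hypothetical helper wrapping the answer's one-liner; assumes lst is non-empty
cbind_pad <- function(lst) {
  n <- max(lengths(lst))
  do.call(cbind, lapply(lst, `length<-`, n))
}

padded <- cbind_pad(list(1:8, 3:8, 1:5))
```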

Benchmark

nm <- list(1:8, 3:8, 1:5)

bench::mark(
"[" = sapply(nm, '[', seq(max(lengths(nm)))),
"length<-" = do.call(cbind, lapply(nm, `length<-`, max(lengths(nm)))) )
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
#1 [           36.19µs  40.56µs    24412.        0B     12.2  9995     5    409.4ms <int[…]>
#2 length<-     8.63µs   9.88µs   100367.        0B     20.1  9998     2     99.6ms <int[…]>

In this benchmark, length<- is about 4 times faster than [ (median 9.88µs vs. 40.56µs).
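Timings only matter if both expressions return the same matrix. By default, bench::mark() checks that the results agree (its check argument). Outside bench, a similar sanity check can be sketched in base R:

```r
nm <- list(1:8, 3:8, 1:5)
n <- max(lengths(nm))

a <- sapply(nm, '[', seq_len(n))
b <- do.call(cbind, lapply(nm, `length<-`, n))

# same shape, same NA positions, same non-missing values
identical(dim(a), dim(b))
identical(is.na(a), is.na(b))
all(a == b, na.rm = TRUE)
```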

Pyrenees answered 18/4, 2023 at 8:1 Comment(0)
