cbind: is there a way to have missing values set to NA?

Please forgive me if I missed an answer to such a simple question.

I want to use cbind() to bind two vectors together as columns. One of them is one element shorter than the other.

Can I have R supply an NA for the missing value?

The documentation discusses a deparse.level argument, but that doesn't seem to be my solution (it only controls how column labels are built from the arguments).

Further, if I may be so bold, would there also be a quick way to prepend the shorter column with NAs?
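For context, a minimal illustration of what cbind() does on its own when the lengths differ: it recycles the shorter vector instead of filling with NA, and warns when the longer length is not a multiple of the shorter one. The vectors are example values chosen for this sketch.

```r
# cbind() recycles a shorter vector instead of padding it with NA
x <- 1:5
y <- 4:1
res <- cbind(x, y)   # emits a warning about the mismatched lengths
res[, "y"]           # 4 3 2 1 4: row 5 silently reuses y[1]
```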

Defray answered 29/9, 2013 at 3:41

Try this:

x <- 1:5
y <- 4:1
length(y) <- length(x)   # y is padded with NA at the end
cbind(x, y)
#      x  y
# [1,] 1  4
# [2,] 2  3
# [3,] 3  2
# [4,] 4  1
# [5,] 5 NA

or this:

x <- 4:1
y <- 1:5
length(x) <- length(y)   # x is padded with NA at the end
cbind(x, y)
#       x y
# [1,]  4 1
# [2,]  3 2
# [3,]  2 3
# [4,]  1 4
# [5,] NA 5
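Why this works: assigning to length() pads an atomic vector with NA when the new length is larger, whatever its mode, and silently truncates it when the new length is smaller. A short sketch with example values:

```r
# length<- pads with NA when growing, for any atomic mode...
chars <- letters[1:3]
length(chars) <- 5
chars    # now c("a", "b", "c", NA, NA)

nums <- c(2.5, 7)
length(nums) <- 4
nums     # now c(2.5, 7, NA, NA)

# ...but drops elements when shrinking, so always grow to the longer length
short <- 1:5
length(short) <- 3
short    # now 1:3
```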

This does something similar to what DWin suggested and works regardless of which vector is shorter:

x <- 4:1
y <- 1:5

n <- max(length(x), length(y))   # length of the longer vector (avoids masking base::lengths)
length(x) <- n
length(y) <- n
cbind(x, y)

The code above can also be condensed to:

x <- 4:1
y <- 1:5
length(x) <- length(y) <- max(length(x), length(y))
cbind(x, y)
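The same idea extends to any number of vectors. A minimal sketch, assuming R >= 3.2.0 for lengths(); pad_cbind is a hypothetical name, not a base R or package function:

```r
# Hypothetical helper: cbind any number of vectors, padding the shorter ones with NA at the end
pad_cbind <- function(...) {
  vecs <- list(...)                       # column names come from named arguments
  n <- max(lengths(vecs))                 # length of the longest vector
  padded <- lapply(vecs, `length<-`, n)   # grow every vector to n, filling with NA
  do.call(cbind, padded)
}

pad_cbind(x = 1:5, y = 4:1, z = 1:2)
```

Because cbind() returns a matrix, mixing numeric and character vectors here would coerce everything to character; a data.frame is the better container in that case.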

EDIT

Here is what I came up with to address the question:

"Further, if I may be so bold, would there also be a quick way to prepend the shorter column with NA's?"

inserted into the original post by Matt O'Brien.

x <- 4:1
y <- 1:5

first <- 1   # 1 means add NAs to the top of the shorter vector
             # 0 means add NAs to the bottom of the shorter vector

if (length(x) < length(y)) {
  pad <- rep(NA, length(y) - length(x))
  if (first == 1) x <- c(pad, x)
  if (first == 0) x <- c(x, pad)
}

if (length(y) < length(x)) {
  pad <- rep(NA, length(x) - length(y))
  if (first == 1) y <- c(pad, y)
  if (first == 0) y <- c(y, pad)
}

cbind(x, y)

#       x y
# [1,] NA 1
# [2,]  4 2
# [3,]  3 3
# [4,]  2 4
# [5,]  1 5
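An alternative to the if() blocks for prepending: reverse the vector, pad it at the end with length<-, and reverse it back. A sketch; pad_top is a hypothetical helper name. With these inputs it should give the same matrix as the if() version.

```r
# Hypothetical helper: pad v with NA at the top until it has n elements
pad_top <- function(v, n) rev(`length<-`(rev(v), n))

x <- 4:1
y <- 1:5
n <- max(length(x), length(y))
cbind(x = pad_top(x, n), y = pad_top(y, n))
```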

Here is a function:

x <- 4:1
y <- 1:5

first <- 1   # 1 means add NAs to the top of the shorter vector
             # 0 means add NAs to the bottom of the shorter vector

my.cbind <- function(x, y, first) {

  if (length(x) < length(y)) {
    pad <- rep(NA, length(y) - length(x))
    if (first == 1) x <- c(pad, x)
    if (first == 0) x <- c(x, pad)
  }

  if (length(y) < length(x)) {
    pad <- rep(NA, length(x) - length(y))
    if (first == 1) y <- c(pad, y)
    if (first == 0) y <- c(y, pad)
  }

  cbind(x, y)
}

my.cbind(x,y,first)

my.cbind(1:5, 4:1, 1)
my.cbind(1:5, 4:1, 0)
my.cbind(1:4, 5:1, 1)
my.cbind(1:4, 5:1, 0)
my.cbind(1:5, 5:1, 1)
my.cbind(1:5, 5:1, 0)

This version allows you to cbind two vectors of different mode:

x <- 4:1
y <- letters[1:5]

first <- 1   # 1 means add NAs to the top of the shorter vector
             # 0 means add NAs to the bottom of the shorter vector

my.cbind <- function(x, y, first) {

  if (length(x) < length(y)) {
    pad <- rep(NA, length(y) - length(x))
    if (first == 1) x <- c(pad, x)
    if (first == 0) x <- c(x, pad)
  }

  if (length(y) < length(x)) {
    pad <- rep(NA, length(x) - length(y))
    if (first == 1) y <- c(pad, y)
    if (first == 0) y <- c(y, pad)
  }

  # a data.frame (unlike a matrix) lets each column keep its own mode
  data.frame(x = x, y = y)
}

my.cbind(x,y,first)

#    x y
# 1 NA a
# 2  4 b
# 3  3 c
# 4  2 d
# 5  1 e

my.cbind(1:5, letters[1:4], 1)
my.cbind(1:5, letters[1:4], 0)
my.cbind(1:4, letters[1:5], 1)
my.cbind(1:4, letters[1:5], 0)
my.cbind(1:5, letters[1:5], 1)
my.cbind(1:5, letters[1:5], 0)
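A more general sketch along the same lines, for any number of vectors of possibly different modes: pad each one at the top or bottom and let a data.frame keep each column's type. The name pad_df and its logical first argument are hypothetical, chosen to mirror my.cbind() above (first = TRUE puts the NAs on top); it assumes R >= 3.2.0 for lengths() and named arguments for the column names.

```r
# Hypothetical helper: combine vectors of any mode into a data.frame, padding with NA
pad_df <- function(..., first = TRUE) {
  vecs <- list(...)
  n <- max(lengths(vecs))
  padded <- lapply(vecs, function(v) {
    pad <- rep(NA, n - length(v))   # logical NA; c() coerces it to v's mode
    if (first) c(pad, v) else c(v, pad)
  })
  as.data.frame(padded, stringsAsFactors = FALSE)
}

pad_df(x = 4:1, y = letters[1:5])
pad_df(x = 4:1, y = letters[1:5], first = FALSE)
```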
Spiritism answered 29/9, 2013 at 3:52
@DWin Could you not just switch the order and use length(x) = length(y) if x was shorter? (Spiritism)
Sure, but you ought to use a test and then perform the correct action. (Harriot)
@DWin Okay. Yes, that would be best. I can try to create a function to do that unless you do it first. (Spiritism)

A while back I put together a function called Cbind that was meant to do this sort of thing. In its current form, it should be able to handle vectors, data.frames, and matrices as input.

For now, the function is here: https://gist.github.com/mrdwab/6789277

Here is how one would use the function:

x <- 1:5
y <- letters[1:4]
z <- matrix(1:4, ncol = 2, dimnames = list(NULL, c("a", "b")))
Cbind(x, y, z)
#   x    y z_a z_b
# 1 1    a   1   3
# 2 2    b   2   4
# 3 3    c  NA  NA
# 4 4    d  NA  NA
# 5 5 <NA>  NA  NA
Cbind(x, y, z, first = FALSE)
#   x    y z_a z_b
# 1 1 <NA>  NA  NA
# 2 2    a  NA  NA
# 3 3    b  NA  NA
# 4 4    c   1   3
# 5 5    d   2   4

The three functions required are padNA, dotnames, and Cbind, which are defined as follows:

padNA <- function (mydata, rowsneeded, first = TRUE) {
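## Pads data.frames or matrices with rows of NA (Cbind converts vectors to data.frames first);
## first = TRUE keeps the data on top and adds the NA rows at the bottom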
  temp1 = colnames(mydata)
  rowsneeded = rowsneeded - nrow(mydata)
  temp2 = setNames(
    data.frame(matrix(rep(NA, length(temp1) * rowsneeded), 
                      ncol = length(temp1))), temp1)
  if (isTRUE(first)) rbind(mydata, temp2)
  else rbind(temp2, mydata)
}

dotnames <- function(...) {
## Gets the names of the objects passed through ...
  vnames <- as.list(substitute(list(...)))[-1L]
  vnames <- unlist(lapply(vnames,deparse), FALSE, FALSE)
  vnames
}

Cbind <- function(..., first = TRUE) {
## cbinds vectors, data.frames, and matrices together
  Names <- dotnames(...)
  datalist <- setNames(list(...), Names)
  nrows <- max(sapply(datalist, function(x) 
    ifelse(is.null(dim(x)), length(x), nrow(x))))
  datalist <- lapply(seq_along(datalist), function(x) {
    z <- datalist[[x]]
    if (is.null(dim(z))) {
      z <- setNames(data.frame(z), Names[x])
    } else {
      if (is.null(colnames(z))) {
        colnames(z) <- paste(Names[x], sequence(ncol(z)), sep = "_")
      } else {
        colnames(z) <- paste(Names[x], colnames(z), sep = "_")
      }
    }
    padNA(z, rowsneeded = nrows, first = first)
  })
  do.call(cbind, datalist)
}
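For the data.frame case, the bottom padding that padNA() does can also be sketched with plain row indexing: selecting row numbers past nrow() of a data.frame returns all-NA rows. A minimal sketch; pad_rows is a hypothetical name, it assumes n >= nrow(df), and it does not work on matrices, where an out-of-range row index is an error.

```r
# Hypothetical helper: pad a data.frame with all-NA rows at the bottom until it has n rows
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]   # indices beyond nrow(df) yield NA rows
  rownames(out) <- NULL                   # reset the "NA", "NA.1", ... row names
  out
}

z <- data.frame(a = 1:2, b = c("u", "v"))
pad_rows(z, 4)
```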

Part of the reason I stopped working on the function was that the gdata package already has a function called cbindX that handles cbinding data.frames and matrices with different numbers of rows. It will not work directly on vectors, so you need to convert them to data.frames first.

library(gdata)
cbindX(data.frame(x), data.frame(y), z)
#   x    y  a  b
# 1 1    a  1  3
# 2 2    b  2  4
# 3 3    c NA NA
# 4 4    d NA NA
# 5 5 <NA> NA NA
Ikon answered 29/9, 2013 at 12:34
+1 for mentioning cbindX - works very well. Here is the code (Peebles)
