As noted by Frank, the problem is that there are (somewhat invisibly) several different types of NA. The one produced when you type NA at the command line is of class "logical", but there are also NA_integer_, NA_real_, NA_character_, and NA_complex_.
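As a quick check of the point above, here is a minimal base R sketch that collects the class of each typed NA. The variable name na_classes is made up for illustration.

```r
# Collect the class reported by each typed missing value (base R only).
# `na_classes` is a hypothetical name used just for this illustration.
na_classes <- c(
  logical   = class(NA),
  integer   = class(NA_integer_),
  double    = class(NA_real_),
  character = class(NA_character_),
  complex   = class(NA_complex_)
)
print(na_classes)
```

Note that NA_real_ reports its class as "numeric", while the bare NA is the only one that is "logical".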
In your first example, the initial data.table sets the class of column b to "character", and the NA in the second data.table is then coerced to an NA_character_. In the second example, though, the NA in the first data.table sets column b's class to "logical", and when the 'x' in the second data.table's column b is coerced to "logical", it becomes a logical NA. (Try as.logical("x") to see why.)
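The two coercions described above can be tried in isolation. This is only a base R sketch of the conversions themselves, not of what rbind() does internally, and the variable names are hypothetical.

```r
# Coercing a logical NA to character keeps it missing but changes its type,
# which is harmless: this is what happens in the first example.
na_as_chr <- as.character(NA)
is.character(na_as_chr)   # TRUE
is.na(na_as_chr)          # TRUE

# Coercing the string "x" to logical cannot succeed, so the value is lost
# and replaced by a logical NA: this is what happens in the second example.
x_as_lgl <- as.logical("x")
is.logical(x_as_lgl)      # TRUE
is.na(x_as_lgl)           # TRUE
```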
That's all fairly complicated (to articulate, at least), but there is a reasonably simple solution. Just create a 1-row template data.table, and prepend it to each list of data.tables you want to rbind(). It will establish the class of each column to be what you want, regardless of which data.tables follow it in the list passed to rbind(), and can be trimmed off once everything else is bound together.
library(data.table)
## The two lists of data.tables from the OP
A <- list(data.table(x=1, b='x'),data.table(x=1, b=NA))
B <- list(data.table(x=1, b=NA),data.table(x=1, b='x'))
## A 1-row template, used to set the column types (and then removed)
DT <- data.table(x=numeric(1), b=character(1))
## Test it out
do.call(rbind, c(list(DT), A))[-1,]
# x b
# 1: 1 x
# 2: 1 NA
do.call(rbind, c(list(DT), B))[-1,]
# x b
# 1: 1 NA
# 2: 1 x
## Finally, as _also_ noted by Frank, rbindlist will likely be more efficient
rbindlist(c(list(DT), B))[-1,]
NA is logical and as.logical('x') = NA, so when rbind decides that that column is logical (based on its first argument), it coerces subsequent items to match. do.call(rbind, list(data.table(x=1, b=as(NA,'character')), data.table(x=1, b='x'))) works... – Mcghee
[…] "do.call(rbind,...)" for data.tables called rbindlist. There are a few q's about it on this site, e.g., #15674050 – Mcghee
[…] rbindlist to my answer. – Hearst