Apply a function to a subset of data.table columns, by column index instead of by name

Asked 28/5, 2013 at 3:51 Answered 28/5, 2013 at 4:17

Tags: r, data.table, multiple-columns, indices

I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.

a <- data.table(
  a=as.character(rnorm(5)),
  b=as.character(rnorm(5)),
  c=as.character(rnorm(5)),
  d=as.character(rnorm(5))
)
b <- c('a','b','c','d')

With the MWE above, this:

a[,b=as.numeric(b),with=F]

works, but this:

a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F]

doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a without referring to them individually?

(In the actual data set there are tens of columns, so referring to each one individually would be impractical.)

Rosannarosanne asked 28/5, 2013 at 3:51

Also, if you just want to reference multiple columns by index, with=F allows j to be a vector of column indices, e.g. dt[, 2:3, with=F]. But applying a function to each of those columns is more complicated, as per @mnel's answer. – Clathrate 27/4, 2018 at 6:31
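For the selection-only case the comment describes, here is a minimal sketch on a small hypothetical table dt (not the question's data). The ..cols form at the end is the newer data.table idiom for looking up a column vector from the calling scope; it assumes data.table 1.10.2 or later.

```r
library(data.table)

# Hypothetical table, for illustration only
dt <- data.table(a = c("1", "2"), b = c("3", "4"), c = c("5", "6"), d = c("7", "8"))

# with = FALSE: j is treated as column positions, not as an expression
sub <- dt[, 2:3, with = FALSE]

# Newer equivalent: the .. prefix tells data.table to look up cols outside the table
cols <- 2:3
sub2 <- dt[, ..cols]

print(sub)   # a two-column data.table holding columns b and c
```

Both forms only select columns and return a new table; converting the columns in place still needs the := approach from the answer.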

The idiomatic approach is to use .SD and .SDcols.

You can force the LHS to be evaluated in the parent frame (so that b is read as the variable holding column names, not as the column b) by wrapping it in ():

a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
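A runnable version of that line on the question's MWE, as a sketch. The set.seed() call is an addition so the run is reproducible; the random values themselves do not matter.

```r
library(data.table)

set.seed(1)  # added for reproducibility; not in the original question
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)
b <- c("a", "b", "c", "d")

# (b) on the LHS: use the variable b as the vector of target column names.
# .SDcols = b: .SD contains only those columns.
# lapply() returns one converted vector per column, assigned by reference.
a[, (b) := lapply(.SD, as.numeric), .SDcols = b]

sapply(a, class)  # every column is now "numeric"
```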

For columns 2 and 3, by position:

a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]

mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
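To check that only the selected positions change, here is a sketch on the same MWE (set.seed() again added for reproducibility). Integer positions are accepted both on the LHS of := and in .SDcols.

```r
library(data.table)

set.seed(42)  # added for reproducibility
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)

mysubset <- 2:3
# Convert columns 2 and 3 in place; columns 1 and 4 are left untouched
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]

sapply(a, class)  # a and d stay "character"; b and c become "numeric"
```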

Ungava answered 28/5, 2013 at 4:17

If you want to use the "by" grouping here, does that have to be included in advance, in mysubset? – Libration 7/5, 2014 at 1:34

@TrevorAlexander - No, the by columns are not in .SD; they exist as single values in the environment in which .SD is created. – Ungava 7/5, 2014 at 1:49
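A small sketch of that behaviour, using a hypothetical table with a grouping column g (not from the question). The grouping column is not part of .SD, so it does not need to appear in .SDcols.

```r
library(data.table)

# Hypothetical data: g is the grouping column, x and y are values
dt <- data.table(g = c("u", "u", "v"), x = 1:3, y = c(10, 20, 30))

# Without .SDcols, .SD in each group holds every non-grouping column
inside <- dt[, .(sd_names = paste(names(.SD), collapse = ",")), by = g]

# With .SDcols, list only the value columns; g is supplied through by
sums <- dt[, lapply(.SD, sum), by = g, .SDcols = c("x", "y")]

print(inside)  # sd_names is "x,y" for both groups
print(sums)    # one row per group: u -> x = 3, y = 30; v -> x = 3, y = 30
```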

Hi, how do I use this if I want to apply the function to all columns except 'b'? Thanks! – Decease 23/11, 2017 at 12:49

@Decease You could still use a[, (b[b != 'b']) := lapply(.SD, as.numeric), .SDcols = b[b != 'b']] then. But mySubset <- setdiff(b, 'b') followed by a[, (mySubset) := lapply(.SD, as.numeric), .SDcols = mySubset] is more readable and seems straightforward. – Bipack 26/1, 2018 at 14:59
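The setdiff() variant from that comment, run end to end on the question's MWE as a sketch (set.seed() added for reproducibility). If no name vector like b exists, setdiff(names(a), "b") should serve the same purpose.

```r
library(data.table)

set.seed(7)  # added for reproducibility
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)
b <- c("a", "b", "c", "d")

# Every listed column except "b"; setdiff(names(a), "b") would give the same here
mySubset <- setdiff(b, "b")
a[, (mySubset) := lapply(.SD, as.numeric), .SDcols = mySubset]

sapply(a, class)  # only column b is still "character"
```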
