R data.table apply function to rows using columns as arguments

I have the following data.table

library(data.table)
x <- data.table(f1 = 1:3, f2 = 3:5)

I would like to apply a function to each row of the data.table. The function func.text takes f1 and f2 as arguments, does some computation with them, and returns a value. As an example, assume

func.text <- function(arg1,arg2){ return(arg1 + exp(arg2))}

My real function is more complex (it contains loops, among other things), but it also returns a single computed value. What would be the best way to accomplish this?

Spragens asked 21/8, 2014 at 16:26

The best way is to write a vectorized function, but if you can't, then perhaps this will do:

x[, func.text(f1, f2), by = seq_len(nrow(x))]
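As a rough sketch of the trade-off, assuming the x and func.text from the question: the by-row form calls the function once per row, while the plain form calls it once on whole columns. The single call only works here because + and exp() happen to be vectorized. The names by_row and vectorized are illustrative.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) arg1 + exp(arg2)

# Row by row: one call to func.text per group, i.e. per row.
# The unnamed result column is called V1 by default.
by_row <- x[, func.text(f1, f2), by = seq_len(nrow(x))]

# Vectorized: a single call on the whole columns (returns a plain vector)
vectorized <- x[, func.text(f1, f2)]

# Both approaches give the same values
all.equal(by_row$V1, vectorized)
```

For a function that really cannot be vectorized, the by-row form remains a reasonable fallback.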
Epact answered 21/8, 2014 at 17:3 Comment(11)
Ah, I didn't think of using the by = 1:nrow(x) trick. Nice one. - Spragens
Not sure why not just use .I, e.g., something like x[, func.text(f1, f2), by = .I] - Jeunesse
@DavidArenburg I have no idea what by=.I is doing. It's somehow not quite the same as by=1:nrow(x), as you can check by comparing e.g. x[, 1, by = .I] and x[, 1, by = 1:nrow(x)]. - Epact
It would be great, though, if that worked as you'd expect it to (also by=1:.N). - Epact
Yeah, you're probably right, but in this case it doesn't even look like the OP needs a by statement, as his function already operates over the whole data set by row, so even x[, func.text(f1, f2)] will give the desired result. The problem is that it will lose the data.table class and become a numeric vector. Adding by = .I will keep the class, but I'm not sure why or how (I'll probably get some angry comment from @Arun pointing out my lack of understanding of data.table soon). - Jeunesse
Hmm. Could you explain why writing a vectorized function would be better than this? To me this looks very clean and easy (cleaner and easier than vectorizing the function). - Theis
@Theis Having a vectorized function would result in far fewer function calls and would be faster. - Epact
This breaks when x has zero rows. - Hilleary
@JamesHirschorn If you have zero rows, then you're solving a different problem from "applying function to rows". - Epact
@Epact I disagree. I have production code that needs to apply a function to the rows of some given data.table, but the number of rows of the data.table is not known in advance and could potentially be zero. - Hilleary
@JamesHirschorn That doesn't change that it's a different problem, but I'm too lazy to keep arguing this minor point. The new edit will be fine. - Epact
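The zero-row concern comes down to how 1:n behaves when n is 0. A minimal sketch, assuming an empty table x0 (a hypothetical name) with the same columns as in the question:

```r
library(data.table)

func.text <- function(arg1, arg2) arg1 + exp(arg2)

# A zero-row table with the same columns as in the question
x0 <- data.table(f1 = integer(), f2 = integer())

# 1:0 counts down and yields c(1, 0), so its length does not match nrow(x0)
length(1:nrow(x0))

# seq_len(0) is an empty integer vector, which matches the empty table
length(seq_len(nrow(x0)))

# Grouping by seq_len() therefore still lines up with the table
res0 <- x0[, func.text(f1, f2), by = seq_len(nrow(x0))]
nrow(res0)
```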

The most elegant way I've found is with mapply:

x[, value := mapply(func.text, f1, f2)]
x
#    f1 f2     value
# 1:  1  3  21.08554
# 2:  2  4  56.59815
# 3:  3  5 151.41316
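To illustrate why mapply helps when the function is not vectorized, here is a sketch with a hypothetical func.loop that loops internally and expects scalar inputs. It stands in for the "more complex" function mentioned in the question and is not taken from it.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)

# Hypothetical stand-in for a more complex function: it loops and
# expects scalar inputs, so it is not meant to be called on whole columns
func.loop <- function(arg1, arg2) {
  total <- 0
  for (k in seq_len(arg1)) {
    total <- total + exp(arg2) / k
  }
  total
}

# mapply() calls func.loop once per row, pairing f1[i] with f2[i]
x[, value := mapply(func.loop, f1, f2)]

# The same result via an explicit loop over the row indices
expected <- vapply(seq_len(nrow(x)),
                   function(i) func.loop(x$f1[i], x$f2[i]),
                   numeric(1))
all.equal(x$value, expected)
```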

Or with the purrr package:

x[, value := purrr::pmap_dbl(.(f1, f2), func.text)]

If your situation allows for it, another approach is to match the function's argument names to the column names:

library("purrr")

# arguments match the names of the columns, dots collect other 
# columns existing in the data.table
func.text <- function(f1, f2, ...) { return(f1 + exp(f2)) }

# use `set` to modify the data.table by reference
purrr::pmap_dbl(x, func.text) %>%
  data.table::set(x, i = NULL, j = "value", value = .)

print(x)
##    f1 f2     value
## 1:  1  3  21.08554
## 2:  2  4  56.59815
## 3:  3  5 151.41316
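The same name-matching idea can be sketched in base R, without purrr, by passing every column to mapply as a named argument. The extra column other and the variable res are illustrative assumptions.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5, other = letters[1:3])

# Argument names match the column names; ... absorbs the unused column
func.text <- function(f1, f2, ...) f1 + exp(f2)

# Each column is handed to mapply() as a named argument, so the
# arguments are matched by name rather than by position
res <- do.call(mapply,
               c(list(FUN = func.text, USE.NAMES = FALSE), as.list(x)))

# Add the result by reference
set(x, j = "value", value = res)
```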
Tedesco answered 13/4, 2017 at 17:59

We can define a row identifier with the special symbol .I (the row number) and group by it.

dt_iris <- data.table(iris)
dt_iris[, ..I := .I]

## Let's define some function
some_fun <- function(dtX) {
    print('hello')
    return(dtX[, Sepal.Length / Sepal.Width])
}

## by row
dt_iris[, some_fun(.SD), by = ..I] # or simply: dt_iris[, some_fun(.SD), by = .I]

## vectorized calculation
some_fun(dt_iris) 
Warbeck answered 24/9, 2015 at 11:33 Comment(8)
I am under the impression that there was a time when it was possible to use by=.I directly in the third component. No? - Hsinking
@StéphaneLaurent Sure, it is just so that the user can see the data that by is applied on. I have updated the post to remove any doubt ;) - Warbeck
Sorry CronAcronis, maybe my comment was not clear. I mean that it was possible to directly do dt[, y:=somefun(x), by=I] in the past, but it is not possible now. Or maybe my memory is wrong. - Hsinking
@StéphaneLaurent I think you meant .I, so you can do dt_iris[, some_fun(.SD), by = .I], with a dot. - Warbeck
Yes, sorry, I meant .I. But I tried it yesterday and it didn't work... Hmm, I have just tried it now and it works. Sorry, I was surely too tired :) - Hsinking
What's the meaning of ..I? - Nitz
@Nitz It is just for convenience, to have the actual counter persisted; it has no special meaning. - Warbeck
Note that .I is meant to be used in the j argument of data.table, not in the by clause. In DT >1.12.4 it doesn't seem to work either. @CronMerdek, can you re-evaluate your answer? - Puccini

This is a pretty compact syntax (note that Map returns a list, so c becomes a list column):

x[, c := .(Map(func.text, f1, f2))]
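A small sketch of what this produces and how to flatten it, assuming the x and func.text from the question. The column names res and res_num are hypothetical, and list() is written out here in place of the .() alias.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) arg1 + exp(arg2)

# Map() returns a list, so wrapping it in list() creates a list column
x[, res := list(Map(func.text, f1, f2))]
is.list(x$res)

# Flatten to an ordinary numeric column when every result is a scalar
x[, res_num := unlist(res)]
```

A list column is handy when the function returns objects of varying length; for scalar results, mapply gives a plain vector directly.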
Pillow answered 19/5, 2023 at 0:5