data.table objects assigned with := from within function not printed

I would like to modify a data.table within a function. If I use the := operator within the function, the result is printed only on the second attempt to display it.

Look at the following illustration:

library(data.table)
mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
    dt[, z := y - x]
    dt
}

When I only call the function, the table is not printed (which is the standard behaviour). However, if I save the returned data.table into a new object, it is not printed the first time I type its name, only the second time.

myfunction(mydt)  # nothing is printed   
result <- myfunction(mydt) 
result  # nothing is printed
result  # for the second time, the result is printed
mydt                                                                     
#    x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4 

Could you explain why this happens and how to prevent it?

Choking answered 7/10, 2015 at 9:5 Comment(0)

A bug fix in version 1.9.6 introduced this downside (see NEWS 1.9.6, BUG FIXES #1).

One should call DT[] at the end of the function to prevent this behaviour.

myfunction <- function(dt) {
    dt[, z := y - x][]
}
myfunction(mydt)  # prints immediately
#    x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4 

This is described in data.table FAQ 2.23:

Why do I have to type DT sometimes twice after using := to print the result to console?

This is an unfortunate downside to get #869 to work. If a := is used inside a function with no DT[] before the end of the function, then the next time DT is typed at the prompt, nothing will be printed. A repeated DT will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then print(DT) and DT[] at the prompt are guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g., DT[, foo := 3L][].
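The two workarounds the FAQ names for a function you cannot change can be sketched as follows. This is only an illustration, not part of the original answer: add_diff is a hypothetical stand-in for such a function, and the printing behaviour is the one the FAQ describes for an interactive prompt.

```r
library(data.table)

# Hypothetical function we are assuming cannot be edited:
# it uses := and has no DT[] before it returns.
add_diff <- function(dt) {
    dt[, z := y - x]
    dt
}

mydt <- data.table(x = 1:3, y = 5:7)
result <- add_diff(mydt)

# Workarounds named in the FAQ, used at the prompt instead of a bare `result`:
print(result)  # explicit print
result[]       # trailing [] on the object

# := updated mydt by reference, so the new column is there as well
"z" %in% names(mydt)  # TRUE
```

If the function can be changed, ending it with dt[, z := y - x][] as shown in the answer above is the simpler fix.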

Choking answered 7/10, 2015 at 10:0 Comment(1)
DT[] is only needed when printing of the data.table is suppressed, i.e. when using := or set* functions. - Strove

I'm sorry if I'm not supposed to post something here that's not an answer, but my post is too long for a comment.

I'd like to point out that janosdivenyi's solution of adding a trailing [] to dt does not always give the expected results (even when using data.table 1.9.6 or 1.10.4), as I show below.

The examples below show that if dt is the last line in the function one gets the desired behaviour without the presence of the trailing [], but if dt is not on the last line in the function then a trailing [] is needed to get the desired behaviour.

The first example shows that with no trailing [] on dt we get the expected behaviour when dt is on the last line of the function

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  df <- 1
  dt[, z := y - x]
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

Adding a trailing [] on dt gives unexpected behaviour

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  df <- 1
  dt[, z := y - x][]
}

myfunction(mydt)  # Content printed unexpectedly
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

Moving df <- 1 to after the dt with no trailing [] gives unexpected behaviour

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  dt[, z := y - x]
  df <- 1
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Nothing printed unexpectedly

Moving df <- 1 after the dt with a trailing [] gives the expected behaviour

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  dt[, z := y - x][]
  df <- 1
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4
Nil answered 19/3, 2017 at 19:56 Comment(2)
I think you're partly confusing how functions work. All functions return a value. If you don't write an explicit return(x) statement, then the last value in the function is returned. df <- 1 returns the value 1 invisibly, while DT[, x := y][] returns DT, printed. - Constable
Thanks for that explanation. I didn't realise. I guess it's the "return invisibly" bit that got me. I've also been confused by the "copy by reference" aspect of data tables. I spent ages playing with those examples trying to make sense of them. You now see why I don't answer questions on this forum :-) - Nil
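
The "returns invisibly" point from the first comment can be checked with base R's withVisible(), which reports whether a value would be auto-printed at the prompt. A minimal sketch, with f_assign and f_value as made-up names for illustration:

```r
# The last expression is an assignment, so the value 1 is returned invisibly
f_assign <- function() {
    df <- 1
}

# The last expression is the bare object, so it is returned visibly
f_value <- function() {
    df <- 1
    df
}

withVisible(f_assign())$visible  # FALSE
withVisible(f_value())$visible   # TRUE
f_assign() == 1                  # TRUE: a value is still returned
```

This is why the examples that end in df <- 1 print nothing when the function is called, regardless of what the data.table line before it does.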
