R data.table ':=' works in direct call, but same function in a package fails

Using R's data.table package, this works:

library(data.table)
instruction = "a = data.table(name=1:3, value=1:3, blah=1:3); a[,c('value', 'blah'):=NULL]"
eval(parse(text=instruction))
#   name
#1:    1
#2:    2
#3:    3

This works:

myFunc = function(instruction) {
eval(parse(text=instruction))
}
myFunc(instruction)
#   name
#1:    1
#2:    2
#3:    3

Now, put this function into a package, load it, and try to call it. This doesn't work:

myFuncInPackage(instruction)
#Error in `:=`(c("value", "blah"), NULL) : 
#  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

Why?


EDIT: @Roland points out that adding data.table to the package's Depends field makes it work. However, I don't think this is a great solution, because the package doesn't really depend on, require, or use data.table. I just want to be able to use data.table with the package.

In addition, everything else with data.table works fine in the function; only the := operator fails.

So a follow-up question could be: should I add data.table to the Depends of every package I write, so that data.tables work as expected within functions of that package? This doesn't seem right... what is the correct way to approach this?

Wizen answered 16/1, 2015 at 9:30 Comment(8)
Have you followed the advice in FAQ 6.9? Also, use of eval(parse()) is discouraged. - Semiweekly
@Semiweekly Adding data.table to Depends solves it... but leads to an issue: my package doesn't actually depend on data.table; in fact, it's totally unrelated. As in this example, it just has one function, myFunc, with nothing data.table-related in it. But it can't be used with data.table without adding it to Depends... - Wizen
@Roland, I know eval(parse()) is discouraged, and this is a pointless example, but the question still stands; in some cases I can't get around it. - Wizen
Your package has eval(parse(text=instruction)) where instruction can be anything! At the time of evaluation any function required by instruction must be available; this should be specified in the usage instructions for your package. You're seeing this when instruction requires a function in data.table; load 'data.table' before executing myFuncInPackage(instruction) and see if it works. - Punchy
The := operator that you use in your function is defined within the data.table package, so yes, your package does depend on data.table. - Hanker
@Sergii Zaskaleta No... I didn't use := in my function. It was passed in by the user, in the "instruction" variable; it has nothing to do with the package... - Wizen
@sheffien can you check whether you updated your NAMESPACE file with import(data.table) and your DESCRIPTION with Imports: data.table? I got the same problem recently just because of a missing entry in the NAMESPACE file. - Spectrometer
@Moody_Mudskipper Please see meta.#377093 - Cyrano

I've finally figured out the answer to this question (after several years). All the comments and answers suggested adding data.table to Depends or Imports, but that is not the right fix: the package itself does not depend on data.table. The instruction could use any package, not just data.table, so taken to its logical conclusion the suggestion would require adding every possible package to Depends. The dependency comes from the user who provides the instruction, not from the function the package provides.

Instead, the cause is that the call to eval is done within the package's namespace rather than in the global environment, and data.table treats code running inside a namespace that does not import it differently from code the user runs at the prompt. I ultimately solved this by specifying the global environment in the eval call:

myFunc = function(instruction) {
eval(parse(text=instruction), envir=globalenv())
}

Why this works

This makes eval run the code in the global environment, the same place the user's own code runs, so the instruction behaves exactly as it would if typed at the prompt, using whatever packages the user has loaded.
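To see what changes, here is a small base-R sketch (no data.table needed). It assumes that giving a function the stats namespace as its enclosing environment is a fair stand-in for defining it inside your own package; `where_default`, `where_global` and `probe` are hypothetical names. The probe reports whether the top-level environment of the place where the parsed code runs is a namespace.

```r
# Hypothetical stand-in for myFuncInPackage: its enclosing environment is a
# package namespace (stats here, playing the role of "your package").
where_default <- function(instruction) eval(parse(text = instruction))
environment(where_default) <- asNamespace("stats")

# Same function, but evaluating in the global environment, as in the fix above.
where_global <- function(instruction) eval(parse(text = instruction), envir = globalenv())
environment(where_global) <- asNamespace("stats")

# The probe: is the code's top-level environment a namespace?
probe <- "isNamespace(topenv(environment()))"

where_default(probe)  # TRUE: the evaluation frame sits inside the stats namespace
where_global(probe)   # FALSE: the code runs in the global environment
```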

In the data.table case it's particularly hard to debug because of the complexity of the function overloading. In this case, the culprit is not actually the := function, but the [ function. The := error is a red herring. At the time of writing, the := function in data.table is defined like this:

https://github.com/Rdatatable/data.table/blob/348c0c7fdb4987aa6da99fc989431d8837877ce4/R/data.table.R#L2561

":=" <- function(...) stop('Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").')

That's it. Any call to := as a function stops with this error message, because that is not how the authors intend := to be used. Instead, := is really just a keyword that's interpreted by data.table's [ method.

Here is what goes wrong: if the [ call isn't handled by data.table's version and instead gets the base [ behaviour, that method can't handle :=, so := is evaluated as an ordinary function call and triggers the error message. So the function to look at is [.data.table, the overloaded bracket operator.
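The "keyword" pattern and its failure mode can be sketched in base R. This is a toy, not data.table's implementation: `mytab` is a hypothetical class, the stub `:=` mimics data.table's, and unlike data.table (which modifies by reference) the toy `[` method returns a modified copy. The second half shows the fallback: base `[.data.frame` evaluates `j` as ordinary code, so the stub runs and its error surfaces, just like in the question.

```r
# Stub in the spirit of data.table's: calling := as an ordinary function is an error.
`:=` <- function(...) stop("`:=` is only meaningful inside [.mytab")

# Hypothetical toy class: a named list of columns with its own [ method.
`[.mytab` <- function(x, i, j) {
  jexpr <- substitute(j)  # capture j WITHOUT evaluating it
  if (is.call(jexpr) && identical(jexpr[[1L]], as.name(":="))) {
    cols <- eval(jexpr[[2L]], parent.frame())  # LHS: column names
    val  <- eval(jexpr[[3L]], parent.frame())  # RHS: new value (NULL deletes)
    for (col in cols) x[[col]] <- val
    return(x)
  }
  stop("this sketch only supports j of the form cols := value")
}

a <- structure(list(name = 1:3, value = 1:3, blah = 1:3), class = "mytab")
a <- a[, c("value", "blah") := NULL]
names(a)  # "name"

# Fallback: base [.data.frame evaluates j eagerly, so the stub is called.
df <- data.frame(name = 1:3, value = 1:3, blah = 1:3)
tryCatch(df[, c("value", "blah") := NULL], error = conditionMessage)
```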

What's happening is that in my new package (the one that holds myFuncInPackage), the code is evaluated in a frame whose top-level environment is the package namespace, and that namespace does not import data.table. More precisely, S3 dispatch still reaches [.data.table, but data.table first checks whether the calling environment is "data.table-aware" (the global environment, or a namespace that imports data.table) and, if it is not, falls back to base [.data.frame behaviour. The base method doesn't know about :=, so it evaluates the j argument as ordinary code, which calls the := stub above and triggers its error message.
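A simplified, hypothetical model of that awareness test is sketched below. The real check is internal to data.table and considers more cases than this; `looks_dt_aware` is an illustrative name, and the stats namespace again stands in for a package namespace that does not import data.table.

```r
# Hypothetical, simplified model of data.table's "is the caller data.table-aware?"
# check: non-namespace code (e.g. the global environment) counts as aware; code
# whose top-level environment is a namespace counts only if it imports data.table.
looks_dt_aware <- function(env) {
  top <- topenv(env)
  if (!isNamespace(top)) return(TRUE)
  "data.table" %in% names(getNamespaceImports(top))
}

# Frame of a function living in the stats namespace (stand-in for the package).
pkg_frame <- local({
  f <- function() environment()
  environment(f) <- asNamespace("stats")
  f()
})

looks_dt_aware(pkg_frame)    # FALSE: in this model, falls back to [.data.frame
looks_dt_aware(globalenv())  # TRUE: in this model, [.data.table handles := itself
```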

When you specify the eval to happen in the global environment, it correctly resolves the [ function to [.data.table, and the := is interpreted correctly.

Incidentally, you can also use this if you're passing not a character string but a code block (better) to eval() inside a package:

eval(substitute(instruction), envir=globalenv())

Here, substitute captures the instruction as an unevaluated expression, so it isn't evaluated (incorrectly) within the package at the argument-evaluation stage, and it makes it intact back to the global environment, where it can be correctly evaluated with the required functions in place.
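A minimal base-R sketch of what substitute() buys here: both hypothetical helpers, `run_here` and `run_global`, receive the caller's code unevaluated, and only the environment they pass to eval differs. A local variable `y` inside the helpers shows which environment the code actually ran in.

```r
# Hypothetical helpers: both capture the caller's code unevaluated via
# substitute(), then evaluate it in different environments.
run_here <- function(code) {
  y <- 100                     # a local that shadows the caller's y
  eval(substitute(code))       # default envir: this function's own frame
}
run_global <- function(code) {
  y <- 100
  eval(substitute(code), envir = globalenv())
}

assign("y", 1, envir = globalenv())  # the user's y, in the global environment
run_here(y + 1)    # 101: the function's local y was used
run_global(y + 1)  # 2: evaluated where the user's code lives
```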

Wizen answered 14/8, 2017 at 21:22 Comment(0)

I had the same problem and solved it by adding data.table to both Imports and Depends. My data.table version is 1.9.6.

Monoatomic answered 17/2, 2016 at 15:47 Comment(7)
Can you give an example? I have a script which sources a function that uses data.table, and I get the error there. I include library(data.table) in the script and/or in the function itself. Can you also give an example of how you apply Imports and Depends here to solve the problem? My data.table is 1.10.4. - Powel
It worked for me in an R package context, not a raw script. To answer your question, you apply it in the DESCRIPTION file: Imports: data.table (>= 1.9.6) and Depends: data.table (>= 1.9.6), e.g.: pastebin.com/uy10Devh - Monoatomic
Can you prevent packages from being loaded by such specifications? E.g. import data.table but prevent reshape2 from being loaded as its own package. - Powel
You can import only specific functions from a package by using @importFrom, e.g.: @importFrom jsonlite toJSON unbox. Read more here: kbroman.org/pkg_primer/pages/depends.html - Monoatomic
@Monoatomic you can do this, but it doesn't solve the underlying problem; it only solves it for this package, not for other potential issues. See my new answer for a universal solution. - Wizen
In my case, it was enough to just mention Imports: data.table in the DESCRIPTION file. Moreover, mentioning it again in the Depends: section would trigger a note from devtools::check(): "Package listed in more than one of Depends, Imports, Suggests, Enhances: data.table. A package should be listed in only one of these fields." - Knurled
I added data.table to Depends, removed it from Imports, and it finally works. This is extremely annoying; I wasted 3 hours on this. Yikes. - Grackle

© 2022 - 2024 — McMap. All rights reserved.