Non-standard evaluation in a user-defined function with lapply or with in R
I wrote a wrapper around ftable because I need to compute flat tables with frequencies and percentages for many variables. Because the ftable method for class "formula" uses non-standard evaluation, the wrapper relies on do.call and match.call so that the subset argument of ftable can be used (more details in my previous question).

mytable <- function(...) {
    do.call(what = ftable,
            args = as.list(x = match.call()[-1]))
    # etc
}

However, I cannot use this wrapper with either lapply or with:

# example 1: error with "lapply"
lapply(X = warpbreaks[c("breaks",
                        "wool",
                        "tension")],
       FUN = mytable,
       row.vars = 1)

Error in (function (x, ...)  : object 'X' not found

# example 2: error with "with"
with(data = warpbreaks[warpbreaks$tension == "L", ],
     expr = mytable(wool))

Error in (function (x, ...)  : object 'wool' not found

These errors seem to be due to match.call not being evaluated in the right environment.

As this question is closely linked to my previous one, here is a summary of my problems:

  • The wrapper with do.call and match.call cannot be used with lapply or with.
  • The wrapper without do.call and match.call cannot use the subset argument of ftable.

And a summary of my questions:

  • How can I write a wrapper that both supports the subset argument of ftable and works with lapply and with? I have ideas for avoiding lapply and with, but I want to understand and fix these errors to improve my knowledge of R.
  • Is the error with lapply related to the following note from ?lapply?

    For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g., bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[i]], ...), with i replaced by the current (integer or double) index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required to ensure that method dispatch for is.numeric occurs correctly.
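The behaviour described in that note can be checked directly. The following is a minimal sketch, assuming a hypothetical helper show_call (not part of the original wrapper) that simply returns whatever match.call() records when it is called by lapply:

```r
# Hypothetical helper: return the call exactly as match.call() records it.
show_call <- function(...) match.call()

recorded <- lapply(X = list(a = 1), FUN = show_call)[["a"]]
print(recorded)

# The first argument is an unevaluated expression built by lapply,
# not the value 1 that was stored in the list.
first_arg <- as.list(recorded)[[2]]
print(class(first_arg))
```

The recorded argument refers to lapply's internal X, which would explain why a wrapper that re-evaluates it elsewhere reports that object 'X' cannot be found.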

Steersman answered 23/4, 2019 at 14:48 Comment(4)
@RLave Thank you for your comment. I have heavily edited my question. Hope it helps!Steersman
@Swolf Thank you, but I get the same error. Don't you?Steersman
Am I correct that you're trying to pass all the arguments passed to mytable into ftable?Aksum
@Aksum Yes, exactly !Steersman

Thanks to this issue, the wrapper became:

# function 1
mytable <- function(...) {
    do.call(what = ftable,
            args = as.list(x = match.call()[-1]),
            envir = parent.frame())
    # etc
}

Or:

# function 2
mytable <- function(...) {
    mc <- match.call()
    mc[[1]] <- quote(expr = ftable)
    eval.parent(expr = mc)
    # etc
}

I can now use the subset argument of ftable, and use the wrapper in lapply:

lapply(X = warpbreaks[c("wool",
                        "tension")],
       FUN = function(x) mytable(formula = x ~ breaks,
                                 data = warpbreaks,
                                 subset = breaks < 15))
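The same pattern also seems to cover the with case from the question. Below is a minimal sketch, assuming a hypothetical count_wrapper that follows "function 2" but forwards to table instead of ftable to keep the example short:

```r
# Hypothetical stand-in for the wrapper: same pattern as "function 2",
# but forwarding to table() instead of ftable().
count_wrapper <- function(...) {
    mc <- match.call()
    mc[[1]] <- quote(table)
    eval.parent(mc)   # evaluate table(wool) where count_wrapper was called
}

low_tension <- warpbreaks[warpbreaks$tension == "L", ]
res <- with(data = low_tension,
            expr = count_wrapper(wool))
print(res)
```

Because the rewritten call is evaluated in the environment that with builds from the data frame, wool is found there instead of being looked up inside the wrapper.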

However, I do not understand why I have to supply envir = parent.frame() to do.call, as it is the default value of that argument.

More importantly, these methods do not resolve another issue: I cannot use the subset argument of ftable with mapply.

Steersman answered 24/4, 2019 at 11:6 Comment(1)
I've posted an answer that hopefully explains what's happening here in more depth, though you basically figured it out already yourself. For your "bonus questions", you should ask new questions for them – the StackOverflow format is made for 1 question per questionAksum

The problem with using match.call with lapply is that match.call returns the literal call as it was written, without evaluating any of its arguments. To see what's going on, let's make a simpler function that shows exactly how your function sees the arguments passed to it:

match_call_fun <- function(...) {
    call = as.list(match.call()[-1])
    print(call)
}

When we call it directly, match.call correctly gets the arguments and puts them in a list that we can use with do.call:

match_call_fun(iris['Species'], 9)

[[1]]
iris["Species"]

[[2]]
[1] 9

But watch what happens when we use lapply (I've only included the output of the internal print statement):

lapply('Species', function(x) match_call_fun(iris[x], 9))

[[1]]
iris[x]

[[2]]
[1] 9

Since match.call gets the literal arguments passed to it, it receives iris[x], not the evaluated iris['Species'] that we want. When we pass those arguments to ftable with do.call, R looks for an object x in the wrapper's own environment and returns an error when it can't find it. We need those arguments to be evaluated in the environment where x is defined.
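To see the environment problem in isolation, here is a minimal sketch. The helpers eval_here, eval_in_caller and caller are hypothetical names introduced only for this illustration; both helpers capture their argument unevaluated and differ only in where they evaluate it:

```r
# Hypothetical helpers, not from the original post.
# Both capture their first argument unevaluated via match.call().
eval_here <- function(...) {
    expr <- as.list(match.call()[-1])[[1]]
    eval(expr)                          # evaluated in eval_here's own frame
}

eval_in_caller <- function(...) {
    expr <- as.list(match.call()[-1])[[1]]
    eval(expr, envir = parent.frame())  # evaluated in the caller's frame
}

caller <- function(col_name) {
    list(
        # col_name is not visible from inside eval_here, so this fails
        here   = tryCatch(eval_here(iris[col_name]),
                          error = function(e) conditionMessage(e)),
        # col_name is visible in caller's frame, so this succeeds
        caller = names(eval_in_caller(iris[col_name]))
    )
}

out <- caller("Species")
print(out)
```

The first element holds the error message about the missing object, while the second holds the column name of the correctly subsetted data frame.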

As you've seen, adding envir = parent.frame() fixes the problem. This is because that argument tells do.call to evaluate iris[x] in the parent frame, which is the frame of the anonymous function passed to lapply, where x has its proper meaning. To see this in action, let's make another simple function that uses do.call to print the result of ls from three different frames:

z <- function(...) {
    print(do.call(ls, list()))
    print(do.call(ls, list(), envir = parent.frame()))
    print(do.call(ls, list(), envir = parent.frame(2)))
}

When we call z() from the global environment, we see the empty environment inside the function, then the Global Environment:

z()

character(0)                                  # Interior function environment
[1] "match_call_fun" "y"              "z"     # GlobalEnv
[1] "match_call_fun" "y"              "z"     # GlobalEnv

But when we call it from within lapply, we see that one level up from z is the frame of lapply itself, whose local variables are FUN, i and X:

lapply(1, z)

character(0)                                  # Interior function environment
[1] "FUN" "i"   "X"                           # lapply
[1] "match_call_fun" "y"              "z"     # GlobalEnv

So, by adding envir = parent.frame(), do.call knows to evaluate iris[x] in the frame of the function that called the wrapper, where x is actually 'Species', and it evaluates correctly.

mytable_envir <- function(...) {
    tab <- do.call(what = ftable,
                   args = as.list(match.call()[-1]),
                   envir = parent.frame())
    prop <- prop.table(x = tab,
                       margin = 2) * 100
    bind <- cbind(as.matrix(x = tab),
                  as.matrix(x = prop))
    margin <- addmargins(A = bind,
                         margin = 1)
    round(x = margin,
          digits = 1)
}



# This works!
lapply(X = c("breaks","wool","tension"),
       FUN = function(x) mytable_envir(warpbreaks[x],row.vars = 1))

As for why adding envir = parent.frame() makes a difference even though it appears to be the default: I'm not 100% sure, but my guess is that when the default argument is used, parent.frame is evaluated inside the do.call function, returning the environment from which do.call was called (the wrapper's own frame). What we're doing, however, is calling parent.frame outside do.call, which means it returns one level higher than the default version.

Here's a test function that takes parent.frame() as a default value:

fun <- function(y=parent.frame()) {
    print(y)
    print(parent.frame())
    print(parent.frame(2))
    print(parent.frame(3))
}

Now look at what happens when we call it from within lapply both with and without passing in parent.frame() as an argument:

lapply(1, function(y) fun())
<environment: 0x12c5bc1b0>     # y argument
<environment: 0x12c5bc1b0>     # parent.frame called inside
<environment: 0x12c5bc760>     # 1 level up = lapply
<environment: R_GlobalEnv>     # 2 levels up = globalEnv

lapply(1, function(y) fun(y = parent.frame()))
<environment: 0x104931358>     # y argument
<environment: 0x104930da8>     # parent.frame called inside
<environment: 0x104931358>     # 1 level up = lapply
<environment: R_GlobalEnv>     # 2 levels up = globalEnv

In the first example, the value of y is the same as what you get when you call parent.frame() inside the function. In the second example, the value of y is the same as the environment one level up (inside lapply). So, while they look the same, they're actually doing different things. In the first example, parent.frame is evaluated inside the function when it sees that no y argument was supplied. In the second, parent.frame is evaluated in the anonymous function passed to lapply before fun is called, and the result is then passed in.
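The same difference can be checked without comparing printed addresses. This is a minimal sketch, assuming two hypothetical functions where_am_i and outer_fun that exist only for this illustration:

```r
# Hypothetical function whose only argument defaults to parent.frame().
where_am_i <- function(env = parent.frame()) env

outer_fun <- function() {
    list(
        own_frame = environment(),
        # Default: parent.frame() is evaluated inside where_am_i,
        # so it returns the frame of its caller, outer_fun.
        default   = where_am_i(),
        # Supplied: parent.frame() is evaluated in outer_fun,
        # so it returns the frame of whoever called outer_fun.
        supplied  = where_am_i(parent.frame())
    )
}

res <- outer_fun()
print(identical(res$default, res$own_frame))
print(identical(res$supplied, res$own_frame))
```

The default value resolves to outer_fun's own frame, whereas the explicitly supplied value resolves to the frame one level further up.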

Aksum answered 24/4, 2019 at 15:35 Comment(5)
Thank you very much for this detailed answer. It helps me understand things in more depth. However, about why parent.frame() is needed although it is the default argument, I don't see why the behavior of the default argument would differ from specifying the same argument manually...Steersman
Thank you very much again, for your edit this time ! I now understand why adding envir = parent.frame() makes a difference even if it is do.call default argument. You deserve more than +1 for the help you provided ! NB: with(data = warpbreaks, expr = z()), with(warpbreaks, fun()) and with(warpbreaks, fun(y = parent.frame())) also demonstrate that the problem was the same with with.Steersman
No problem, it was fun to figure out! I tend to avoid environments wherever possible in my own work since I don't fully understand them. So it's good for me to take the time to dig into how they actually workAksum
I finally realized that the wrapper fails with mapply although it works with lapply. Can you please have a look at my new question: https://mcmap.net/q/1153167/-non-standard-evaluation-of-subset-argument-with-mapply-in-r/11148823 ?Steersman
For the record, about why adding envir = parent.frame() makes a difference even if it is do.call default argument: "One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function." (from cran.r-project.org/doc/manuals/r-release/…)Steersman

Since you only want to pass all the arguments on to ftable, you do not need do.call().

mytable <- function(...) {
  tab <- ftable(...)
  prop <- prop.table(x = tab,
                     margin = 2) * 100
  bind <- cbind(as.matrix(x = tab),
                as.matrix(x = prop))
  margin <- addmargins(A = bind,
                       margin = 1)
  return(round(x = margin,
               digits = 1))
}

The following lapply creates a separate table for every variable. I don't know if that is what you want.

lapply(X = c("breaks",
             "wool",
             "tension"),
       FUN = function(x) mytable(warpbreaks[x],
                                 row.vars = 1))

If you want all 3 variables in 1 table:

warpbreaks$newVar <- LETTERS[3:4]

lapply(X = cbind("c(\"breaks\", \"wool\", \"tension\")",
             "c(\"newVar\", \"tension\",\"wool\")"),
       FUN = function(X)
        eval(parse(text=paste("mytable(warpbreaks[,",X,"],
                                 row.vars = 1)")))
)
Claritaclarity answered 24/4, 2019 at 7:28 Comment(1)
Thank you for your answer. However, as explained in my question, I need do.call to use the subset argument of ftable method for class "formula" because it uses non-standard evaluation (more details on my previous question).Steersman
