ellipsis ... as function in substitute?

I'm having trouble understanding how/why parentheses work where they otherwise should not work®.

f = function(...) substitute(...()); f(a, b)
#> [[1]]
#> a
#>
#> [[2]]
#> b

# but substitute(...) returns only the first argument (..1)
f2 = function(...) substitute(...); f2(a, b)
#> a

Normally an error is thrown, either could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).

What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 say that ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what substitute extracts.

#  \-substitute #R
#    \-do_substitute #C
#      \-substituteList #C recursive
#        \-substitute #C

Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.

In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?

parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))

#>                   token  text
#> 4        SYMBOL_FORMALS   ...
#> 10 SYMBOL_FUNCTION_CALL   ...

The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
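The parse data above can be cross-checked at the R level. The following is a minimal sketch, not taken from the original post, that inspects the unevaluated call with quote(); the variable name e is only for illustration.

```r
# Capture the expression without evaluating it.
e <- quote(...(n = 1))

typeof(e)                          # "language": an ordinary call object
identical(e[[1]], as.name("..."))  # TRUE: the function position holds the symbol `...`
names(e)                           # "" "n"
```

So, as far as the parser is concerned, ...(n = 1) is just a call whose function name happens to be the symbol `...`.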

[image: old tricks]

Question:
How does ...() work?

Pliable answered 6/1, 2022 at 5:39 Comment(5)
Great question. This is undocumented behaviour of substitute. Would definitely love to know the solution to this. - Perique
A related question on the R-devel mailing list: support of substitute(...()). Didn't really get answered, did it? - Tic
@Tic Yes, that question seems important. Is this part of the official API, given that it doesn't appear to be documented? Or is it documented (indirectly) and we just don't understand the documentation? Someone should follow up on R-devel. - Coumas
@Tic We can somewhat answer one of those questions: it was added between versions 1.3.1 (Sept 2001) and 1.4.1 (Jan 2002). (function(...) substitute(...()))(a,b) runs fine in 1.4.1. It's been 20 years, and it is still not documented (?) - Pliable
Well, ?substitute does warn: "There is no guarantee that the resulting expression makes any sense." I'm guessing that that was intended as a catch-all for weird behaviour. - Aristides

Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).


substitute is a "special" whose arguments are not evaluated (doc).

typeof(substitute)
## [1] "special"

That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
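As a small illustration of what "arguments are not evaluated" means here, the sketch below (my own example, not from the R sources) passes an expression that would fail if it were ever evaluated:

```r
# substitute() captures its first argument as an expression;
# stop() is never actually called.
e <- substitute(stop("this is never run"))

is.call(e)                          # TRUE
identical(e[[1]], as.name("stop"))  # TRUE
```

Because the argument arrives as an unevaluated expression, substitute is free to treat a symbol such as `...` in the function position however its C code chooses.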

In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of

findVarInFrame3(rho, R_DotsSymbol, TRUE)

in the body of the C utility substituteList (source) is either a DOTSXP or R_MissingArg; it is R_MissingArg if and only if the enclosing function was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP; the latter occurs if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.

The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:

f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1

Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
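Printing hides the difference between a call and a pairlist, so it can help to check the types directly. This is a minimal sketch restricted to the f0 and f1 cases above; the names r0, r1 and r2 are introduced only for illustration.

```r
f0 <- function() substitute(...(n = 1))     # no `...` formal
f1 <- function(...) substitute(...(n = 1))  # `...` formal

r0 <- f0()
r1 <- f1()
r2 <- f1(a, b)

typeof(r0)  # "language": the call ...(n = 1) comes back unchanged
typeof(r1)  # "pairlist": only n = 1 survives
typeof(r2)  # "pairlist"
length(r2)  # 3: a, b, and n = 1
```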

Internals

When called inside of the R function f,

f <- function(...) substitute(...(<exprs>))

substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).

If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).

It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.

In the false branch of the conditional (source), h gets the value pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).

Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.

In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:

  • R_NilValue if h was R_MissingArg in the first while iteration and m = 0,

or, otherwise,

  • a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 0), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.

Indeed:

f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
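For comparison, the same unevaluated expressions can be obtained with the better-known idiom substitute(list(...)). The sketch below assumes nothing beyond base R; the function names dots_undoc and dots_doc are hypothetical and chosen only for this example.

```r
# Relies on the undocumented behaviour discussed in this answer.
dots_undoc <- function(...) substitute(...())

# Builds the call list(...) and drops its first element (the symbol `list`).
dots_doc <- function(...) as.list(substitute(list(...)))[-1L]

# as.list() turns the pairlist into a regular list for the comparison.
identical(as.list(dots_undoc(a = sin(x), b = zzz)),
          dots_doc(a = sin(x), b = zzz))
```

If the two agree, the second form may be the safer choice in code that should not depend on undocumented behaviour.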

Misc

FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):

    Rprintf("CAR(t) == R_DotsSymbol? %d\n",
            CAR(t) == R_DotsSymbol);
    if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
        Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
                type2char(TYPEOF(CAR(t))), length(CAR(t)));
        Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
                CAR(CAR(t)) == R_DotsSymbol);
        Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
                type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
    }
    if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
        Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
                type2char(TYPEOF(s)), length(s));
        Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
                type2char(TYPEOF(CAR(s))), length(CAR(s)));
    }

which helped me confirm what was going into and coming out of the substituteList call on the previous line:

f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1

Obviously, compiling R with debugging symbols and running R under a debugger helps, too.

Another puzzle

Just noticed this oddity:

g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)

Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).
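A side-by-side check makes the oddity easier to reproduce. This is only a sketch; the function names g_default, g_same and g_fresh are mine. One plausible reading, based on the findVarInFrame3 lookup described earlier, is that a fresh environment has no binding for ... in its own frame, so the lookup would yield R_UnboundValue, but I have not traced this through the source.

```r
g_default <- function(...) substitute(...(n = 1))
g_same    <- function(...) substitute(...(n = 1), environment())
g_fresh   <- function(...) substitute(...(n = 1), new.env())

typeof(g_default(a, b))  # "pairlist"
typeof(g_same(a, b))     # "pairlist"
typeof(g_fresh(a, b))    # "language"
```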

Aristides answered 7/1, 2022 at 9:57 Comment(2)
First, really neat approach! How to recompile R for troubleshooting deserves its own topic. In the case of (\(...) substitute(...))(a, b) only a is returned. From your answer I understand this is because ...() is pairlist(R_DotsSymbol, <exprs>). But I'm not entirely clear on why just ... without parentheses does not follow the same path of 1+m iterations, giving the same result as ...(). - Pliable
In that case, argList gets the value pairlist(R_DotsSymbol, R_MissingArg) inside of do_substitute instead of pairlist(pairlist(R_DotsSymbol, <exprs>), R_MissingArg). (Indeed, ... by itself is a SYMSXP, whereas ...() is a LANGSXP, so do_substitute is behaving consistently here.) It follows that substituteList returns to do_substitute the LISTSXP pairlist(a, b), and that do_substitute returns to R the SYMSXP a. - Aristides
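The difference described in this comment can be observed from R without recompiling. A minimal sketch, with the hypothetical function names f_sym and f_call:

```r
f_sym  <- function(...) substitute(...)    # `...` alone is a symbol
f_call <- function(...) substitute(...())  # `...()` is a call

identical(f_sym(a, b), quote(a))  # TRUE: only the first element is returned
typeof(f_call(a, b))              # "pairlist"
length(f_call(a, b))              # 2: both a and b are returned
```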
