Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
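A quick way to see that the arguments really are left unevaluated is the following minimal sketch. The stop() call is only an illustration and is never run; the variable name e is likewise just a placeholder.

```r
# substitute is a primitive, and its arguments arrive unevaluated
is.primitive(substitute)
## [1] TRUE

# If the argument were evaluated, this would signal an error;
# instead the call is returned as is
e <- substitute(stop("never run"))
typeof(e)
## [1] "language"
```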
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg, the latter if and only if that function was called with no arguments matching ... (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP, the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
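For reference, the parser's view of ...(n = 1) can be inspected directly. The sketch below goes through parse(text = ) so that no special handling by substitute is involved; the variable name e is only illustrative.

```r
e <- parse(text = "...(n = 1)")[[1]]
typeof(e)
## [1] "language"

# The function position holds the symbol `...`
identical(e[[1]], as.name("..."))
## [1] TRUE

# The only argument is tagged n and has value 1
e[["n"]]
## [1] 1
```

So, as far as the parser is concerned, this is an ordinary call whose function name happens to be the symbol `...`.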
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute (you can learn this by looking here), in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the condition CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration, because CAR(el) is copy_of_x, a LANGSXP, rather than the symbol itself.
In the false branch of the conditional (source), h gets the value pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the condition CAR(el) == R_DotsSymbol in the body of the loop is true.
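The count 1+m is simply the length of the call viewed as a LANGSXP: one element for ... in the function position plus one per argument. A small sketch with m = 3, where the arguments a, b = 2, c are placeholders chosen for illustration:

```r
x <- parse(text = "...(a, b = 2, c)")[[1]]

# Total number of elements the loop walks over
length(x)
## [1] 4

# Number of <exprs>, excluding the leading `...` symbol
m <- length(x) - 1L
m
## [1] 3
```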
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
- R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
- a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 0), all unevaluated and without substitutions, because the execution environment of f binds nothing other than ... at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
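The remaining combinations can be checked the same way. This is a sketch restating the cases above; the names f_none and f_one are introduced here only for illustration, and the stop() call is never run.

```r
f_none <- function(...) substitute(...())        # m = 0
f_one  <- function(...) substitute(...(n = 1))   # m = 1

# Dots only: the arguments come back unevaluated, so stop() is never called
res <- f_none(a, stop("never run"))
typeof(res)
## [1] "pairlist"
length(res)
## [1] 2

# <exprs> only: nothing was passed through ..., so only n = 1 remains
identical(f_one(), pairlist(n = 1))
## [1] TRUE
```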
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
        CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
    Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
            type2char(TYPEOF(CAR(t))), length(CAR(t)));
    Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
            CAR(CAR(t)) == R_DotsSymbol);
    Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
            type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
    Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
            type2char(TYPEOF(s)), length(s));
    Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
            type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
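Without recompiling, a partial view of the return value is available from R itself. The sketch below uses .Internal(inspect()), an internal and unsupported facility whose output format may change between R versions; it is offered only as a lightweight alternative and shows the result of substitute, not the intermediate values printed above.

```r
f <- function(...) substitute(...(n = 1))
res <- f(hello, world, hello(world))

typeof(res)
## [1] "pairlist"
length(res)
## [1] 4

# Dump the underlying SEXP tree (addresses and flags will vary)
.Internal(inspect(res))
```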
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).
substitute. Would definitely love to know the solution to this – Perique
substitute(...()). Didn't really get answered did it? – Tic
(function(...) substitute(...()))(a,b) runs fine in 1.4.1. It's been 20 years, and still not documented (?) – Pliable
?substitute does warn: "There is no guarantee that the resulting expression makes any sense." I'm guessing that that was intended as a catch-all for weird behaviour. – Aristides