To add some nuance, these things are not necessarily that complex in base R.
The important thing is to remember to use eval.parent() where relevant, so that substituted arguments are evaluated in the right environment. If you use eval.parent() properly, the expressions in nested calls will find their way. If you don't, you might discover environment hell :).
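To make the point about eval.parent() concrete, here is a minimal base R sketch. The helper names eval_here() and eval_in_caller() are made up for this illustration and are not part of any package.

```r
# Hypothetical helpers, only for illustration.
eval_here <- function(expr) {
  # eval() defaults to the frame it is called from, i.e. this function's frame
  eval(substitute(expr))
}

eval_in_caller <- function(expr) {
  # eval.parent() evaluates in the frame of whoever called this function
  eval.parent(substitute(expr))
}

x <- 1  # top-level x

caller <- function() {
  x <- 10  # local x, not visible from the helpers' own frames
  c(here = eval_here(x + 1), in_caller = eval_in_caller(x + 1))
}

res <- caller()
res
```

Here `here` should be 2 (the helper only sees the top-level x), while `in_caller` should be 11 (the expression is evaluated where it was written).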
The base toolbox that I use is made of quote(), substitute(), bquote(), as.call(), and do.call() (the latter is useful when combined with substitute()).
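As a quick tour of that toolbox, the sketch below shows one small use of each function. The helper capture() and all variable names are assumptions made for the example.

```r
# quote(): capture an expression exactly as written
e1 <- quote(a + b)

# substitute(): capture the expression passed to a function argument
capture <- function(arg) substitute(arg)  # hypothetical helper
e2 <- capture(x * 2)

# bquote(): quote, but evaluate whatever is wrapped in .()
n <- 3
e3 <- bquote(a + .(n))

# as.call(): build a call from a list (function first, then arguments)
e4 <- as.call(list(as.name("sum"), 1, 2, 3))
eval(e4)

# do.call() + substitute(): substitute into an expression stored in a variable
e5 <- do.call(substitute, list(e1, list(a = 10)))
eval(e5, list(b = 1))
```

The do.call() trick is needed because substitute(e1, ...) on its own would work on the symbol e1 rather than on the expression stored in it.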
Without going into details, here is how to solve in base R the cases presented by @Artem and @Tung, without any tidy evaluation, and then the last example, not using quo / enquo, but still benefiting from splicing and unquoting (!!! and !!).
We'll see that splicing and unquoting make code nicer (but require functions that support them!), and that in the present cases using quosures doesn't improve things dramatically (but still arguably does).
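On the "requires functions to support it" part: to base R, !! is nothing special, just logical negation applied twice. The sketch below (variable names are mine) shows this, next to bquote() with .(), which is base R's own unquoting mechanism.

```r
e <- quote(!!a)

# For base R this is simply `!` applied to `!a`
identical(e[[1]], as.name("!"))
identical(e[[2]], quote(!a))

a <- 0
eval(e)  # !(!0): plain double negation, no unquoting happens

# Base R unquoting: bquote() evaluates what is wrapped in .()
v <- quote(z)
bquote(mean(.(v)))
```

So !! and !!! only act as unquote and splice operators inside functions that interpret them that way, such as the dplyr verbs used further down.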
solving Artem's case with base R
f0 <- function(myExpr) {
  eval(substitute(myExpr), list(a = 2, b = 3))
}
g0 <- function(myExpr) {
  val <- eval.parent(substitute(f0(myExpr)))
  val
}
f0(a+b)
#> [1] 5
g0(a+b)
#> [1] 5
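For contrast, here is a sketch of what happens with a naive wrapper that just forwards its argument. g_naive() is a hypothetical name, and the top-level a and b are only there to make the difference visible.

```r
f0 <- function(myExpr) {
  eval(substitute(myExpr), list(a = 2, b = 3))
}

# Naive wrapper (hypothetical): passes the argument straight through
g_naive <- function(myExpr) {
  f0(myExpr)  # f0 now captures the symbol `myExpr`, not `a + b`
}

# Wrapper as above: rebuild the call f0(a + b), evaluate it in the caller's frame
g0 <- function(myExpr) {
  eval.parent(substitute(f0(myExpr)))
}

a <- 100
b <- 200

naive <- g_naive(a + b)  # falls back to the top-level a and b
fixed <- g0(a + b)       # uses a = 2, b = 3 supplied inside f0
```

With these definitions `naive` should be 300 and `fixed` should be 5. Without the top-level a and b, the naive version would fail with an "object not found" error instead.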
solving Tung's 1st case with base R
my_summarise0 <- function(df, group_var, select_var) {
  group_var <- substitute(group_var)
  select_var <- substitute(select_var)
  # create new name
  mean_name <- paste0("mean_", as.character(select_var))
  eval.parent(substitute(
    df %>%
      select(select_var, group_var) %>%
      group_by(group_var) %>%
      summarise(mean_name := mean(select_var))
  ))
}
library(dplyr)
set.seed(1234)
d <- data.frame(x = c(1, 1, 2, 2, 3),
                y = rnorm(5),
                z = runif(5))
my_summarise0(d, x, z)
#> # A tibble: 3 x 2
#> x mean_z
#> <dbl> <dbl>
#> 1 1 0.619
#> 2 2 0.603
#> 3 3 0.292
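To see what substitute() actually builds in a function like this, one can return the expression instead of evaluating it. The sketch below uses a hypothetical build_expr() with nested calls rather than a pipe. Nothing is evaluated, so neither dplyr nor d needs to exist.

```r
build_expr <- function(df, group_var, select_var) {
  group_var  <- substitute(group_var)
  select_var <- substitute(select_var)
  mean_name  <- paste0("mean_", as.character(select_var))
  # Return the substituted call instead of evaluating it
  substitute(
    summarise(group_by(df, group_var), mean_name := mean(select_var))
  )
}

expr <- build_expr(d, x, z)
expr[[2]]       # the group_by() call, with df and group_var filled in
expr[[3]][[2]]  # the new column name, as a string
expr[[3]][[3]]  # the mean() call, with select_var filled in
```

The local variables are replaced by their values (symbols or a string) and df by the expression the caller typed, which is why evaluating the result in the caller's frame works.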
solving Tung's 2nd case with base R
grouping_vars <- c(quote(x), quote(y))
eval(as.call(c(quote(group_by), quote(d), grouping_vars))) %>%
  summarise(mean_z = mean(z))
#> # A tibble: 5 x 3
#> # Groups: x [3]
#> x y mean_z
#> <dbl> <dbl> <dbl>
#> 1 1 -1.21 0.694
#> 2 1 0.277 0.545
#> 3 2 -2.35 0.923
#> 4 2 1.08 0.283
#> 5 3 0.429 0.292
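The as.call() pattern is not specific to group_by(). Here is a base-only sketch of the same idea, building a paste() call from a list of symbols. The toy data frame and the choice of paste() are assumptions for the example, not the data used above.

```r
# Toy data, only for this sketch
d <- data.frame(x = c(1, 1, 2), y = c("a", "b", "a"))

grouping_vars <- c(quote(x), quote(y))  # a list of symbols

# Function name first, then the spliced symbols, then a named argument
cl <- as.call(c(quote(paste), grouping_vars, list(sep = "_")))
cl

# Evaluate the call with the data frame's columns in scope
keys <- eval(cl, d)
keys
```

The call is assembled as an ordinary list, so adding or removing grouping variables is just list manipulation.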
in a function:
my_summarise02 <- function(df, select_var, ...) {
  group_var <- eval(substitute(alist(...)))
  select_var <- substitute(select_var)
  # create new name
  mean_name <- paste0("mean_", as.character(select_var))
  df %>%
    {eval(as.call(c(quote(select), quote(.), select_var, group_var)))} %>%
    {eval(as.call(c(quote(group_by), quote(.), group_var)))} %>%
    {eval(bquote(summarise(., .(mean_name) := mean(.(select_var)))))}
}
my_summarise02(d, z, x, y)
#> # A tibble: 5 x 3
#> # Groups: x [3]
#> x y mean_z
#> <dbl> <dbl> <dbl>
#> 1 1 -1.21 0.694
#> 2 1 0.277 0.545
#> 3 2 -2.35 0.923
#> 4 2 1.08 0.283
#> 5 3 0.429 0.292
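Two idioms carry most of the weight in that function: eval(substitute(alist(...))) to capture the dots unevaluated, and bquote() to inject a captured symbol into a call. A stripped-down sketch, with hypothetical helper names:

```r
# Capture the arguments passed through ... without evaluating them
capture_dots <- function(...) {
  eval(substitute(alist(...)))
}

vars <- capture_dots(x, y + 1)
vars  # a list holding the symbol x and the call y + 1

# Inject a captured symbol into a call with bquote() and .()
mean_call <- function(select_var) {
  select_var <- substitute(select_var)
  bquote(mean(.(select_var)))
}

mean_call(z)
```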
solving Tung's 2nd case with base R but using !! and !!!
grouping_vars <- c(quote(x), quote(y))
d %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_z = mean(z))
#> # A tibble: 5 x 3
#> # Groups: x [3]
#> x y mean_z
#> <dbl> <dbl> <dbl>
#> 1 1 -1.21 0.694
#> 2 1 0.277 0.545
#> 3 2 -2.35 0.923
#> 4 2 1.08 0.283
#> 5 3 0.429 0.292
in a function:
my_summarise03 <- function(df, select_var, ...) {
  group_var <- eval(substitute(alist(...)))
  select_var <- substitute(select_var)
  # create new name
  mean_name <- paste0("mean_", as.character(select_var))
  df %>%
    select(!!select_var, !!!group_var) %>%
    group_by(!!!group_var) %>%
    summarise(!!mean_name := mean(!!select_var))
}
my_summarise03(d, z, x, y)
#> # A tibble: 5 x 3
#> # Groups: x [3]
#> x y mean_z
#> <dbl> <dbl> <dbl>
#> 1 1 -1.21 0.694
#> 2 1 0.277 0.545
#> 3 2 -2.35 0.923
#> 4 2 1.08 0.283
#> 5 3 0.429 0.292
dplyr:: ppl would just allow us to pass variable names as character strings, as in the old underscored variants like mutate_(). imo, an even better option would be to have an argument like colnames_as_strings=TRUE for mutate() et al... that would make it straightforward to use dplyr both interactively and in software. But until then, welcome to enquo()/!! hell... – Endosmosis
enquo() strategy really only makes sense if you are deeply committed to being able to pass column names without quotes (unclear to me why that's important but oh well). could be that there's some fundamental reason that requires understanding dplyr's internals to grasp... – Endosmosis
base::subset()!) – Endosmosis
group_by(), select(), and mutate_at()/summarize_at(). When colnames aren't (or can't be) known in advance, it can be a pain to write good split-apply-combine functions in dplyr. Sometimes even feels easier to use base::tapply(), precisely because you can specify grouping cols as character strings that you pass as a parameter... In the specific case OP showed, it would of course be terrible if "m" meant mydata$m (or whenever a colname is used on the rhs of = inside a dplyr table func). – Endosmosis
dplyr:: and use it every day -- i just want it to be the best it can be!) – Endosmosis
group_by(data, !! var). I honestly fail to see the difficulty. It’s a simple, clean, consistent, yet powerful abstraction. It’s thus diametrically opposite to what tapply etc offer. – Chief