How to feed a list of unquoted column names into `lapply` (so that I can use it with a `dplyr` function)

Asked 16/11, 2017 at 0:14 Answered 16/11, 2017 at 3:4

I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer this question, but came upon an interesting result/dead-end. Please don't mark this as a duplicate - this question is an extension/departure from the answers that you see there.)

Is there
1) a way to get a list of quoted variables to work inside a dplyr function (without using the deprecated standard-evaluation functions whose names end in `_`), or
2) some way to feed a list of unquoted strings through lapply or map?

I have used the Programming in Dplyr vignette to construct what I believe is a function most in line with the current standard for working with the NSE.

The sample data:

sample_data <- 
    read.table(text = "REVENUEID AMOUNT  YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL  AMOUNT_CAT
               1 rev-24985629     30  FY18           S          Check            Mail     25,50
               2 rev-22812413      1  FY16           Q          Other      Canvassing   0.01,10
               3 rev-23508794    100  FY17           Q    Credit_card             Web   100,250
               4 rev-23506121    300  FY17           S    Credit_card            Mail   250,500
               5 rev-23550444    100  FY17           S    Credit_card             Web   100,250
               6 rev-21508672     25  FY14           J          Check            Mail     25,50
               7 rev-24981769    500  FY18           S    Credit_card             Web 500,1e+03
               8 rev-23503684     50  FY17           R          Check            Mail     50,75
               9 rev-24982087     25  FY18           R          Check            Mail     25,50
               10 rev-24979834     50  FY18           R    Credit_card             Web    50,75
                      ", header = TRUE, stringsAsFactors = FALSE)

A report generating function

report <- function(report_cat){
    report_cat <- enquo(report_cat)
    sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num=n(),total=sum(AMOUNT)) %>% 
    rename(REPORT_VALUE = !!report_cat) %>% 
    mutate(REPORT_CATEGORY := as.character(quote(!!report_cat))[2])
}

Which works fine for generating a single report:

> report(REPORT_CODE)
# A tibble: 7 x 5
# Groups:   REPORT_VALUE [4]
  REPORT_VALUE  YEAR   num total REPORT_CATEGORY
         <chr> <chr> <int> <int>           <chr>
1            J  FY14     1    25     REPORT_CODE
2            Q  FY16     1     1     REPORT_CODE
3            Q  FY17     1   100     REPORT_CODE
4            R  FY17     1    50     REPORT_CODE
5            R  FY18     2    75     REPORT_CODE
6            S  FY17     2   400     REPORT_CODE
7            S  FY18     2   530     REPORT_CODE

Everything breaks down when I try to set up a list of all 4 reports to generate. (Admittedly, the code required in the last line of the function, which returns a string to fill the column, should be clue enough that I have wandered off in the wrong direction.)

#the other reports
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")

# Applying and Mapping attempts 
lapply(cat.list, report)
map_df(cat.list, report)

Which results in:

> lapply(cat.list, report)  
 Error in (function (x, strict = TRUE)  : 
  the argument has already been evaluated  

> map_df(cat.list, report)
 Error in (function (x, strict = TRUE)  : 
  the argument has already been evaluated

I have also tried to convert the list of strings to names before handing it over to apply and map:

library(rlang)
cat.names <- lapply(cat.list, sym)
lapply(cat.names, report)
map_df(cat.names, report)

> lapply(cat.names, report)
 Error in (function (x, strict = TRUE)  : 
  the argument has already been evaluated 
> map_df(cat.names, report)
 Error in (function (x, strict = TRUE)  : 
  the argument has already been evaluated

In any case, the reason I am asking this question is that I think that I have written the function to the currently documented standards, but ultimately I can then see no way to utilize a member of the apply or even of the purrr::map family with such a function. Short of rewriting the function to use names like useR has done here https://mcmap.net/q/1166646/-r-help-function-on-multiple-data-frame-columns is there a way to get this function to work with apply or map?

I am hoping to see this as a result:

# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE  YEAR   num total REPORT_CATEGORY
          <chr> <chr> <int> <int>           <chr>
 1            J  FY14     1    25     REPORT_CODE
 2            Q  FY16     1     1     REPORT_CODE
 3            Q  FY17     1   100     REPORT_CODE
 4            R  FY17     1    50     REPORT_CODE
 5            R  FY18     2    75     REPORT_CODE
 6            S  FY17     2   400     REPORT_CODE
 7            S  FY18     2   530     REPORT_CODE
 8        Check  FY14     1    25  PAYMENT_METHOD
 9        Check  FY17     1    50  PAYMENT_METHOD
10        Check  FY18     2    55  PAYMENT_METHOD
# ... with 17 more rows

Shiloh answered 16/11, 2017 at 0:14 Comment(1)

Great follow-up question. See my answer for explanation of syms and quos – Crewel 16/11, 2017 at 3:5

as.name will convert a string to a name and that can be passed to report:

lapply(cat.list, function(x) do.call("report", list(as.name(x))))
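
A minimal base R sketch of why the do.call wrapper helps, assuming a hypothetical helper show_arg that is not part of the original code. When lapply calls a function directly, the function's unevaluated argument is lapply's own internal expression rather than a column name, whereas do.call with as.name builds a call whose argument is the bare symbol:

```r
# show_arg is a hypothetical stand-in for any function that captures its
# argument unevaluated (as report does with enquo).
show_arg <- function(x) deparse(substitute(x))

cols <- c("REPORT_CODE", "PAYMENT_METHOD")

# Called directly by lapply, the function only sees lapply's internal expression.
res_direct <- unlist(lapply(cols, show_arg))

# Wrapped in do.call with as.name, the function sees the bare column symbol.
res_docall <- unlist(lapply(cols, function(x) do.call("show_arg", list(as.name(x)))))

print(res_direct)  # "X[[i]]" "X[[i]]"
print(res_docall)  # "REPORT_CODE" "PAYMENT_METHOD"
```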

character argument

An alternative is to rewrite report so that it accepts a character string argument:

report_ch <- function(colname) {  
    report_cat <- rlang::sym(colname)   # as.name(colname) would also work here
    sample_data %>%
                group_by(!!report_cat, YEAR) %>%
                summarize(num = n(), total = sum(AMOUNT)) %>% 
                rename(REPORT_VALUE = !!report_cat) %>% 
                mutate(REPORT_CATEGORY = colname)
}

lapply(cat.list, report_ch)
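
lapply returns a list of data frames, one per column name. A sketch of stacking them into a single table with bind_rows, assuming dplyr 1.0 or later (for the `.groups` argument) and using the `.data` pronoun as another way to accept a string. The names toy and report_str are hypothetical and only serve the illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for sample_data.
toy <- data.frame(
  CODE   = c("A", "A", "B"),
  METHOD = c("x", "y", "y"),
  YEAR   = c("FY17", "FY17", "FY18"),
  AMOUNT = c(10, 20, 5),
  stringsAsFactors = FALSE
)

# Accepts the column name as a string and looks it up via the .data pronoun.
report_str <- function(colname) {
  toy %>%
    group_by(REPORT_VALUE = .data[[colname]], YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT), .groups = "drop") %>%
    mutate(REPORT_CATEGORY = colname)
}

# One report per column name, stacked into a single ungrouped tibble.
combined <- bind_rows(lapply(c("CODE", "METHOD"), report_str))
print(combined)
```

Dropping the groups inside the function keeps the stacked result ungrouped, which avoids surprises in later steps.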

wrapr

An alternate approach is to rewrite report using the wrapr package, which is an alternative to rlang/tidyeval:

library(dplyr)
library(wrapr)

report_wrapr <- function(colname) 
  let(c(COLNAME = colname),
      sample_data %>%
                  group_by(COLNAME, YEAR) %>%
                  summarize(num = n(), total = sum(AMOUNT)) %>%
                  rename(REPORT_VALUE = COLNAME) %>%
                  mutate(REPORT_CATEGORY = colname)
   )

lapply(cat.list, report_wrapr)

Of course, this whole problem would go away if you used a different framework, e.g.

plyr

library(plyr)

report_plyr <- function(colname)
  ddply(sample_data, c(REPORT_VALUE = colname, "YEAR"), function(x)
     data.frame(num = nrow(x), total = sum(x$AMOUNT), REPORT_CATEGORY = colname))

lapply(cat.list, report_plyr)

sqldf

library(sqldf)

report_sql <- function(colname, envir = parent.frame(), ...)
  fn$sqldf("select [$colname] REPORT_VALUE,
                   YEAR,
                   count(*) num,
                   sum(AMOUNT) total,
                   '$colname' REPORT_CATEGORY
            from sample_data
            group by [$colname], YEAR", envir = envir, ...)

lapply(cat.list, report_sql)

base - by

report_base_by <- function(colname)
      do.call("rbind", 
        by(sample_data, sample_data[c(colname, "YEAR")], function(x)
            data.frame(REPORT_VALUE = x[1, colname], 
                       YEAR = x$YEAR[1], 
                       num = nrow(x), 
                       total = sum(x$AMOUNT), 
                       REPORT_CATEGORY = colname)
         )
      )

lapply(cat.list, report_base_by)

data.table

The data.table package provides another alternative, but that has already been covered by another answer.

Update: Added additional alternatives.

Salisbarry answered 16/11, 2017 at 1:10 Comment(1)

Such a thorough and well crafted collection of solutions. I will continue to unpack these for days - there is so much to learn here. Thank you! – Shiloh 16/11, 2017 at 3:51

Let me first point out that in your initial report function, you can use quo_name to convert the quosure into a string, which you can then use in mutate like the following:

library(dplyr)
library(rlang)

report <- function(report_cat){
  report_cat <- enquo(report_cat)

  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num=n(),total=sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = quo_name(report_cat))
}

report(REPORT_CODE)

Now, to address your question of "how to feed a list of unquoted strings through lapply or map to make it work inside dplyr functions", I propose two ways of doing it.

1. Use `rlang::sym` to convert your strings to symbols and unquote them when feeding them into `lapply` or `map`

library(purrr)

cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")

map_df(cat.list, ~report(!!sym(.)))

Or, with syms, you can convert all elements of a vector at once:

map_df(syms(cat.list), ~report(!!.))

Result:

# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE  YEAR   num total REPORT_CATEGORY
          <chr> <chr> <int> <int>           <chr>
 1            J  FY14     1    25     REPORT_CODE
 2            Q  FY16     1     1     REPORT_CODE
 3            Q  FY17     1   100     REPORT_CODE
 4            R  FY17     1    50     REPORT_CODE
 5            R  FY18     2    75     REPORT_CODE
 6            S  FY17     2   400     REPORT_CODE
 7            S  FY18     2   530     REPORT_CODE
 8        Check  FY14     1    25  PAYMENT_METHOD
 9        Check  FY17     1    50  PAYMENT_METHOD
10        Check  FY18     2    55  PAYMENT_METHOD
# ... with 17 more rows
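
To make the first approach less magical, here is a small sketch of what sym and `!!` produce, using only rlang. The helper capture is hypothetical; it mimics the enquo step at the top of report:

```r
library(rlang)

cols <- c("REPORT_CODE", "PAYMENT_METHOD")

# sym() turns one string into a symbol; syms() does the same for a whole vector.
one_symbol  <- sym(cols[1])
all_symbols <- syms(cols)

# Unquoting with !! inserts the symbol into the call before it is evaluated.
# expr() only builds the call, so report does not need to exist here.
built_call <- expr(report(!!one_symbol))
print(built_call)  # report(REPORT_CODE)

# Hypothetical helper that captures its argument the way report does.
capture <- function(x) enquo(x)
captured <- capture(!!one_symbol)
print(quo_name(captured))  # "REPORT_CODE"
```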

2. Rewrite your `report` function by placing `lapply` or `map` inside so that `report` can do NSE

report <- function(...){
  report_cat <- quos(...)

  map_df(report_cat, function(x) sample_data %>%
             group_by(!!x, YEAR) %>%
             summarize(num=n(),total=sum(AMOUNT)) %>%
             rename(REPORT_VALUE = !!x) %>%
             mutate(REPORT_CATEGORY = quo_name(x)))
}

By placing map_df inside report, you can take advantage of quos, which converts `...` to a list of quosures. They are then fed into map_df and unquoted one by one using !!.

report(REPORT_CODE, PAYMENT_METHOD, INBOUND_CHANNEL, AMOUNT_CAT)

Another advantage of writing it like this is that you can also supply a character vector of column names, convert it to symbols with syms, and splice them in using !!!, like the following:

report(!!!syms(cat.list))

Result:

# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE  YEAR   num total REPORT_CATEGORY
          <chr> <chr> <int> <int>           <chr>
 1            J  FY14     1    25     REPORT_CODE
 2            Q  FY16     1     1     REPORT_CODE
 3            Q  FY17     1   100     REPORT_CODE
 4            R  FY17     1    50     REPORT_CODE
 5            R  FY18     2    75     REPORT_CODE
 6            S  FY17     2   400     REPORT_CODE
 7            S  FY18     2   530     REPORT_CODE
 8        Check  FY14     1    25  PAYMENT_METHOD
 9        Check  FY17     1    50  PAYMENT_METHOD
10        Check  FY18     2    55  PAYMENT_METHOD
# ... with 17 more rows
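
A short sketch of what the splice does, again using only rlang. The helper collect is hypothetical and mirrors the quos(...) line in the rewritten report; the column names are placeholders:

```r
library(rlang)

# Hypothetical helper: captures everything passed through ... as quosures.
collect <- function(...) quos(...)

# Bare names and spliced symbols should end up as the same list of quosures.
from_bare   <- collect(REPORT_CODE, PAYMENT_METHOD)
from_splice <- collect(!!!syms(c("REPORT_CODE", "PAYMENT_METHOD")))

# Turn each quosure back into a string to compare the two.
bare_names   <- unname(vapply(from_bare, quo_name, character(1)))
splice_names <- unname(vapply(from_splice, quo_name, character(1)))

print(bare_names)
print(splice_names)
```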

Crewel answered 16/11, 2017 at 3:4 Comment(2)

Wow, I think I have already learned 7 new things and I am only about half way through all of your solutions. So many interesting layers to consider. I think I am starting to recognize why functional programming is spoken of in such reverential tones. Thank you! – Shiloh 16/11, 2017 at 4:3

@JensLeerssen Glad that it helped. You learn something every day :) – Crewel 16/11, 2017 at 6:22

I'm not really a dplyr aficionado, but for what it's worth, here is how you could achieve this using library(data.table) instead:

setDT(sample_data)

gen_report <- function(report_cat){
  sample_data[ , .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat), 
               by = .(REPORT_VALUE = get(report_cat), YEAR)] 
}

gen_report('REPORT_CODE')
lapply(cat.list, gen_report)
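
To get the single combined table the question asks for, the list can be stacked with rbindlist. A self-contained sketch on hypothetical toy data (toy_dt and gen_toy_report are illustrative names, following the same pattern as gen_report above):

```r
library(data.table)

# Hypothetical toy data standing in for sample_data.
toy_dt <- data.table(
  CODE   = c("A", "A", "B"),
  METHOD = c("x", "y", "y"),
  YEAR   = c("FY17", "FY17", "FY18"),
  AMOUNT = c(10, 20, 5)
)

# Same pattern as gen_report: get() looks up the column named by the string.
gen_toy_report <- function(report_cat) {
  toy_dt[, .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat),
         by = .(REPORT_VALUE = get(report_cat), YEAR)]
}

# rbindlist stacks the per-column reports into one data.table.
combined_dt <- rbindlist(lapply(c("CODE", "METHOD"), gen_toy_report))
print(combined_dt)
```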

Porringer answered 16/11, 2017 at 1:8 Comment(0)
