Is there a multiple-columns-as-input version of dplyr's "across" function?

Asked 5/9, 2024 at 13:6 Answered 5/9, 2024 at 17:43

I had to write code like this today:

data1 %>%
  summarise(
   ab1 = fn(a1, b1),
   ab2 = fn(a2, b2), 
   ab3 = fn(a3, b3) 
  )
# imagine if there are 100 of them

If fn were a single-argument function, I could have done

data1 %>%
  summarise(across(starts_with("a"), fn))

But unfortunately, my function needs two columns as inputs. Is there a way to do this without writing a new line for every set of arguments?

Beet answered 5/9, 2024 at 13:6 Comment(0)

You may use the map2* functions to pass two sets of columns.

library(dplyr)
library(purrr)

data1 %>%
  summarise(map2_df(pick(starts_with("a")), pick(starts_with("b")), fn))

#  a1 a2 a3
#1 21 57 93

This uses the data from @ThomasIsCoding's answer but a different function: because your code uses summarise, the result should have a single row, so fn here returns a single value for each pair of columns.

fn <- function(a, b) {
  sum(a, b)
}

Subaxillary answered 5/9, 2024 at 15:17 Comment(3)

ok, I think I can generalise this to a 3 argument with pmap – Beet 5/9, 2024 at 22:4
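A minimal sketch of that three-argument generalisation with pmap, under some assumptions: the data frame `data3`, the function `fn3`, the helper `cols` and the `abc` output prefix are all made up here for illustration, and the columns are assumed to be named a<n>, b<n>, c<n>. Each column group is converted with as.list() so it is explicit that pmap iterates over columns.

```r
library(dplyr)
library(purrr)

# hypothetical data with three column families: a*, b*, c*
data3 <- data.frame(
  a1 = c(1, 2), b1 = c(3, 4),  c1 = c(5, 6),
  a2 = c(7, 8), b2 = c(9, 10), c2 = c(11, 12)
)

# hypothetical three-argument function returning one value per column triple
fn3 <- function(a, b, c) sum(a, b, c)

suffixes <- c("1", "2")

# helper: the columns with a given prefix, as a plain list ordered by suffix
cols <- function(prefix) as.list(data3[paste0(prefix, suffixes)])

# pmap calls fn3(a1, b1, c1), then fn3(a2, b2, c2)
result3 <- pmap(list(cols("a"), cols("b"), cols("c")), fn3) |>
  set_names(paste0("abc", suffixes)) |>
  as_tibble()

result3
```

This should give a one-row tibble with one column per triple, in line with the summarise-style output of the map2 version.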

map2_df is superseded (as of purrr 1.0.0). Just use map2 followed by unlist – Beet 5/9, 2024 at 22:16

And I have to assume the columns are in the right order? Can I arrange them? – Beet 5/9, 2024 at 22:26
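On the column-order question, one way to avoid relying on order is to build both sets of column names from a shared suffix and index by name. This is only a sketch, assuming the a<n>/b<n> naming from the question; `suffixes`, `a_cols`, `b_cols` and the `ab` output prefix are names introduced here for illustration. It uses map2 followed by as_tibble instead of map2_df.

```r
library(dplyr)
library(purrr)

# same values as the thread's example data, with the columns deliberately shuffled
data1 <- data.frame(
  b2 = c(10, 11, 12), a3 = c(13, 14, 15), a1 = c(1, 2, 3),
  b1 = c(4, 5, 6),    b3 = c(16, 17, 18), a2 = c(7, 8, 9)
)

fn <- function(a, b) sum(a, b)

# take the numeric suffixes from the "a" columns and sort them
a_names  <- grep("^a\\d+$", names(data1), value = TRUE)
suffixes <- sort(as.integer(sub("^a", "", a_names)))

# build matching name vectors so a<n> is always paired with b<n>
a_cols <- paste0("a", suffixes)
b_cols <- paste0("b", suffixes)

result <- map2(data1[a_cols], data1[b_cols], fn) |>
  set_names(paste0("ab", suffixes)) |>
  as_tibble()

result
```

Because the pairing is done by name, shuffling the columns of data1 should not change the result. A missing b<n> column would raise an "undefined columns" error instead of silently mispairing.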

Another approach using reshaped data. If you can get over the hurdle of reshaping back and forth from longer form, the calculation would be trivial.

One benefit of this approach is that it is robust to column order, and you don't need to specify the column prefixes upfront, provided there is some regular pattern you can specify with regex.

library(tidyverse)
data1 |>

  # reshape long, in this case assuming the columns are all (letters)(numbers).
  mutate(row = row_number()) |>
  pivot_longer(cols = -row,
               names_to = c(".value", "Pair"), 
               names_pattern = "(\\D+)(\\d+)") |>

  # do the calculation with the two or more involved columns
  mutate(ab = a*b, .by = c(row, Pair)) |>

  # reshape wider again
  pivot_wider(names_from = Pair, names_glue = "{.value}{Pair}", names_vary = "slowest",
              values_from = a:ab)

Output using data from @ThomasIsCoding:

    row    a1    b1   ab1    a2    b2   ab2    a3    b3   ab3
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     4     4     7    10    70    13    16   208
2     2     2     5    10     8    11    88    14    17   238
3     3     3     6    18     9    12   108    15    18   270

Parliamentarian answered 5/9, 2024 at 17:43 Comment(0)

You could try split.default to split the columns into groups by their names, e.g.,

data1 %>%
  split.default(sub("\\D+", "ab", names(.))) %>%
  map_dfr(\(...) do.call(fn, unname(...)))

which gives

# A tibble: 3 × 3
    ab1   ab2   ab3
  <dbl> <dbl> <dbl>
1     4    70   208
2    10    88   238
3    18   108   270
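To make the grouping step easier to follow, here is a base-R sketch of what split.default produces before fn is applied. The intermediate names `groups` and `result` are introduced here for illustration. Within each group the columns keep their order from the data frame, so the sketch sorts them by name before calling fn; that sort is an added precaution, not part of the answer above.

```r
data1 <- data.frame(
  a1 = c(1, 2, 3),    b1 = c(4, 5, 6),
  a2 = c(7, 8, 9),    b2 = c(10, 11, 12),
  a3 = c(13, 14, 15), b3 = c(16, 17, 18)
)

fn <- function(a, b) a * b

# replace the letter prefix with "ab", so a1 and b1 both map to the key "ab1"
keys <- sub("\\D+", "ab", names(data1))

# split.default treats the data frame as a list of columns:
# the result is a list of small data frames, one per key
groups <- split.default(data1, keys)
names(groups)      # the keys: ab1, ab2, ab3
names(groups$ab1)  # the columns that landed in the first group

# call fn on each group's columns, sorted by name so that a<n> precedes b<n>
result <- as.data.frame(lapply(groups, function(g) {
  do.call(fn, unname(as.list(g[sort(names(g))])))
}))

result
```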

data example

data1 <- data.frame(
  a1 = c(1, 2, 3),
  b1 = c(4, 5, 6),
  a2 = c(7, 8, 9),
  b2 = c(10, 11, 12),
  a3 = c(13, 14, 15),
  b3 = c(16, 17, 18)
)

fn <- function(a, b) {
  a * b
}

Labanna answered 5/9, 2024 at 13:38 Comment(0)
