Sum across multiple columns with dplyr

Asked 5/3, 2015 at 8:19 Answered 25/8, 2023 at 11:47

177

My question involves summing values across multiple columns of a data frame and creating a new column from that sum using dplyr. The entries in the columns are binary (0, 1). I am thinking of a row-wise analog of dplyr's summarise_each or mutate_each functions. Below is a minimal example of the data frame:

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

I could use something like:

df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

but this would involve writing out the name of each column, and I have about 50 of them. In addition, the column names change between iterations of the loop in which I want to run this operation, so I would like to avoid having to give any column names.

How can I do that most efficiently? Any assistance would be greatly appreciated.

Laryngitis answered 5/3, 2015 at 8:19 Comment(12)

Why dplyr? Why not just a simple df$sumrow <- rowSums(df, na.rm = TRUE) from base R? Or df$sumrow <- Reduce(`+`, df) if you want to replicate the exact thing you did with dplyr. – Lavalley 5/3, 2015 at 8:22

You can do both with dplyr too as in df %>% mutate(sumrow = Reduce(`+`, .)) or df %>% mutate(sumrow = rowSums(.)) – Lavalley 5/3, 2015 at 8:38

Yes, the base R rowSums function works well. However, trying the suggestions in dplyr, df %>% mutate(sumrow = rowSums(.)) and df %>% mutate(sumrow = Reduce(`+`, .)), generates errors. I had just thought of fitting the operation within a dplyr chain alongside other preceding and subsequent operations in the chain. @DavidArenburg – Laryngitis 5/3, 2015 at 9:10

Update to the latest dplyr version and it will work. – Lavalley 5/3, 2015 at 9:11

Related: https://mcmap.net/q/131384/-summarise-over-all-columns – Andi 5/3, 2015 at 9:56

Suggestions by David Arenburg worked after updating package dplyr @DavidArenburg – Laryngitis 5/3, 2015 at 16:1

@boern David Arenburgs comment was the best answer and most direct solution. Your answer would work but it involves an extra step of replacing NA values with zero which might not be suitable in some cases. – Laryngitis 29/9, 2016 at 11:29

@Laryngitis I disagree: Davids first comment didn't utilize dplyr like requested plus requires the exact same step of removing NAs by na.rm = TRUE. His second comment is dplyr but (since the NA treatment is missing) it leads to NA sums. – Aho 29/9, 2016 at 11:42

@Aho So you are basically saying my comment is invalid because I didn't add na.rm = TRUE while your answer with an extra replace step and then using my exact comment is somewhat much better? Pretty sneaky (at best) statement and smells like rep wh*ring to me. – Lavalley 14/6, 2018 at 13:0

If you're summing across many rows in dplyr, your data is probably not "tidy" and it might be better to reshape or to just use base. – Kex 15/6, 2018 at 11:16

@DavidArenburg Honestly I'm having a hard time recalling, but if you look at the edit history my first post was just summarise_each(funs(sum)), a comment to my answer pointed out that this was not the right answer, so I improved it. I don't remember if I even read your comment. Either way, I don't think this discussion is worth our time :) – Aho 14/9, 2018 at 11:47

@DavidArenburg I want to select multiple columns (some numeric, some factors) all binary (0,1) to summarize in rows as count and proportion. Is that possible using dplyr? – Galvani 28/3, 2020 at 23:30

231

dplyr >= 1.0.0 using across

sum up each row using rowSums (rowwise works for any aggregation, but is slower)

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(across(where(is.numeric))))
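As one of the comments on this answer suggests, the NA handling can also be pushed into rowSums itself, which leaves the original NA values in the data untouched. The following is a minimal sketch under that assumption; the data frame `toy` is hypothetical, and on dplyr 1.1.0 or later `pick()` could be used in place of `across()` here.

```r
library(dplyr)

# Hypothetical toy data; the last row is entirely NA
toy <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, NA)
)

# na.rm = TRUE skips missing values while summing,
# so no replace(is.na(.), 0) step is needed
res <- toy %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

print(res)
```

Note that a row containing only NA values gets a sum of 0 with this approach, which may or may not be what you want.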

sum down each column

df %>%
   summarise(across(everything(), ~ sum(., na.rm = TRUE)))

dplyr < 1.0.0

sum up each row

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(.[1:5]))

sum down each column using the superseded summarise_all:

df %>%
   replace(is.na(.), 0) %>%
   summarise_all(funs(sum))

Aho answered 23/9, 2015 at 8:28 Comment(16)

summarise_each sums down along each column while what is required is sum along each row – Laryngitis 23/9, 2015 at 8:55

I'm trying to achieve the same, but my DF has a column which is a character, hence I cannot sum all the columns. I guess I should modify the (.[1:5]) part, but unfortunately I am not familiar with the syntax nor I don't know how to look for help on it. Tried with mutate(sum = rowSums(is.numeric(.))) but didn't work. – Larkins 11/1, 2017 at 12:7

You could try to also replace all non numeric values with 0 using replace(!is.numeric(.), 0) %>% ? – Aho 11/1, 2017 at 12:20

Thanks! The problem is that I want to keep those non numeric values, so I'm afraid that if I did understand your suggestion properly it wouldn't be of much use. – Larkins 11/1, 2017 at 18:0

I see. You might want to give df %>% replace(is.na(.), 0) %>% select_if(is.numeric) %>% summarise_each(funs(sum)) a shot? – Aho 12/1, 2017 at 8:19

It also works with a character vector with the names of the columns you want to use, that can be a shortcut or not, depending the situation – Herringbone 21/7, 2017 at 18:30

Use summarise_all instead of summarise_each as it has been deprecated. – Extrapolate 17/6, 2018 at 23:27

Syntax mutate(sum = rowSums(.[,-1])) may come in handy if you don't know how many columns you need to deal with. – Gunplay 17/4, 2019 at 17:10

Since this question and the answer really got popular - I think it worth update the answer using updated version of dplyr @Aho – Goldwin 30/1, 2021 at 1:59

Note that if you use rowwise/ rowSums on a grouped data frame with many many groups, it's worth to ungroup first - makes a huge difference in performance github.com/tidyverse/dplyr/issues/5819 – Yevette 21/3, 2021 at 13:30

following a comment below, I would add na.rm = T to rowSums instead of converting the NA to 0. – Quit 7/7, 2021 at 17:40

replace(is.na(.), 0) I think is the same as replace_na(0) – Guff 25/4, 2023 at 0:47

@Guff No, df %>% replace_na(0) raises an error: "Error in replace_na(): ! `replace` must be a list, not a number." – Valenti 25/4, 2023 at 10:54

@Aho your original code df %>% summarise(across(everything(), ~ sum(., is.na(.), 0))) is a wrong use of sum(). I have revised it. You could check. – Valenti 25/4, 2023 at 11:6

@DarrenTsai my bad. I use replace_na for individual columns. – Guff 25/4, 2023 at 14:57

Starting from dplyr 1.1, you should replace fun(across(something)) with fun(pick(something)), see tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange – Lucilucia 29/8, 2023 at 11:27

dplyr >= 1.0.0

In newer versions of dplyr you can use rowwise() along with c_across to perform row-wise aggregation for functions that do not have specific row-wise variants, but if the row-wise variant exists it should be faster than using rowwise (eg rowSums, rowMeans).

Since rowwise() is just a special form of grouping and changes the way verbs work you'll likely want to pipe it to ungroup() after doing your row-wise operation.
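To illustrate why ungrouping matters, here is a small sketch with a hypothetical data frame `toy`. It compares how summarise() behaves on the row-wise result and on the ungrouped result.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(x1 = c(1, 0, 1), x2 = c(1, NA, 0))

rw <- toy %>%
  rowwise() %>%
  mutate(s = sum(c_across(x1:x2), na.rm = TRUE))

ug <- ungroup(rw)

# The row-wise data frame keeps its special class until ungrouped
is_rowwise_before <- inherits(rw, "rowwise_df")
is_rowwise_after  <- inherits(ug, "rowwise_df")

# Verbs still operate per row on the row-wise data frame...
per_row <- summarise(rw, total = sum(s))
# ...but over the whole table once it is ungrouped
overall <- summarise(ug, total = sum(s))

print(nrow(per_row))
print(overall)
```

On the row-wise data frame summarise() returns one result per input row, while after ungroup() it returns a single overall total.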

To select a range by name:

df %>%
  rowwise() %>% 
  mutate(sumrange = sum(c_across(x1:x5), na.rm = T))
# %>% ungroup() # you'll likely want to ungroup after using rowwise()

To select by type:

df %>%
  rowwise() %>% 
  mutate(sumnumeric = sum(c_across(where(is.numeric)), na.rm = T))
# %>% ungroup() # you'll likely want to ungroup after using rowwise()

To select by column name:

You can use any number of tidy selection helpers like starts_with, ends_with, contains, etc.

df %>%
    rowwise() %>% 
    mutate(sum_startswithx = sum(c_across(starts_with("x")), na.rm = T))
# %>% ungroup() # you'll likely want to ungroup after using rowwise()

To select by column index:

df %>% 
  rowwise() %>% 
  mutate(sumindex = sum(c_across(c(1:4, 5)), na.rm = T))
# %>% ungroup() # you'll likely want to ungroup after using rowwise()

rowwise() will work for any summary function. However, in your specific case a row-wise variant exists (rowSums) so you can do the following, which will be faster:

df %>%
  mutate(sumrow = rowSums(pick(x1:x5), na.rm = T))

Benchmarking

rowwise makes a pipe chain very readable and works fine for smaller data frames. However, it is inefficient.

rowwise versus row-wise variant function

For this example, the row-wise variant rowSums is much faster:

library(microbenchmark)

set.seed(1)
large_df <- slice_sample(df, n = 1E5, replace = T) # 100,000 obs

microbenchmark(
  large_df %>%
    rowwise() %>% 
    mutate(sumrange = sum(c_across(x1:x5), na.rm = T)),
  large_df %>%
    mutate(sumrow = rowSums(pick(x1:x5), na.rm = T)),
  times = 10L
)

Unit: milliseconds
         min           lq         mean       median           uq          max neval cld
 11108.459801 11464.276501 12144.871171 12295.362251 12690.913301 12918.106801    10   b
     6.533301     6.649901     7.633951     7.808201     8.296101     8.693101    10  a

Large data frame without a row-wise variant function

If there isn't a row-wise variant for your function and you have a large data frame, consider a long format, which is more efficient than rowwise. Though there are probably faster non-tidyverse options, here is a tidyverse option (using tidyr::pivot_longer):

library(tidyr)

tidyr_pivot <- function(){
  large_df %>% 
    mutate(rn = row_number()) %>% 
    pivot_longer(cols = starts_with("x")) %>% 
    group_by(rn) %>% 
    summarize(std = sd(value, na.rm = T), .groups = "drop") %>% 
    bind_cols(large_df, .) %>% 
    select(-rn)
}

dplyr_rowwise <- function(){
  large_df %>% 
    rowwise() %>% 
    mutate(std = sd(c_across(starts_with("x")), na.rm = T)) %>% 
    ungroup()
}

microbenchmark(dplyr_rowwise(),
               tidyr_pivot(),
               times = 10L)

Unit: seconds
            expr       min       lq      mean   median        uq       max neval cld
 dplyr_rowwise() 12.845572 13.48340 14.182836 14.30476 15.155155 15.409750    10   b
   tidyr_pivot()  1.404393  1.56015  1.652546  1.62367  1.757428  1.981293    10  a

c_across versus pick

In the particular case of the sum function, pick and c_across give the same output for much of the code above:

sum_pick <- df %>%
    rowwise() %>% 
    mutate(sumrange = sum(pick(x1:x5), na.rm = T))

sum_c_across <- df %>%
    rowwise() %>% 
    mutate(sumrange = sum(c_across(x1:x5), na.rm = T))

all.equal(sum_pick, sum_c_across)
[1] TRUE

The row-wise output of c_across is a vector (hence the c_), while the row-wise output of pick is a 1-row tibble object:

df %>% 
  rowwise() %>% 
  mutate(c_across = list(c_across(x1:x5)),
         pick = list(pick(x1:x5)),
         .keep = "unused") %>% 
  ungroup() 

# A tibble: 10 × 2
   c_across  pick            
   <list>    <list>          
 1 <dbl [5]> <tibble [1 × 5]>
 2 <dbl [5]> <tibble [1 × 5]>
 3 <dbl [5]> <tibble [1 × 5]>
 4 <dbl [5]> <tibble [1 × 5]>
 5 <dbl [5]> <tibble [1 × 5]>
 6 <dbl [5]> <tibble [1 × 5]>
 7 <dbl [5]> <tibble [1 × 5]>
 8 <dbl [5]> <tibble [1 × 5]>
 9 <dbl [5]> <tibble [1 × 5]>
10 <dbl [5]> <tibble [1 × 5]>

The function you want to apply determines which of the two you should use. As shown above with sum, you can use them nearly interchangeably. However, mean and many other common functions expect a (numeric) vector as their first argument:

class(df[1,])
"data.frame"

sum(df[1,]) # works with data.frame
[1] 4

mean(df[1,]) # does not work with data.frame
[1] NA
Warning message:
In mean.default(df[1, ]) : argument is not numeric or logical: returning NA

class(unname(unlist(df[1,])))
"numeric"

sum(unname(unlist(df[1,]))) # works with numeric vector
[1] 4

mean(unname(unlist(df[1,]))) # works with numeric vector
[1] 0.8

Setting aside the row-wise variant that exists for mean (rowMeans), c_across should be used in this case:

df %>% 
  rowwise() %>% 
  mutate(avg = mean(c_across(x1:x5), na.rm = T)) %>% 
  ungroup()

# A tibble: 10 x 6
      x1    x2    x3    x4    x5   avg
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     0     1     1   0.8
 2     0     1     1     0     1   0.6
 3     0    NA     0    NA    NA   0  
 4    NA     1     1     1     1   1  
 5     0     1     1     0     1   0.6
 6     1     0     0     0     1   0.4
 7     1    NA    NA    NA    NA   1  
 8    NA    NA    NA     0     1   0.5
 9     0     0     0     0     0   0  
10     1     1     1     1     1   1  

# Does not work
df %>% 
  rowwise() %>% 
  mutate(avg = mean(pick(x1:x5), na.rm = T)) %>% 
  ungroup()

rowSums, rowMeans, etc. can take a numeric data frame as the first argument, which is why they work with pick.
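For completeness, a short sketch of the rowMeans variant with pick, using the data frame from the question. It assumes dplyr 1.1.0 or later, where pick() is available.

```r
library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# rowMeans accepts the data frame returned by pick(),
# so no rowwise() / c_across() is needed
res <- df %>%
  mutate(avg = rowMeans(pick(x1:x5), na.rm = TRUE))

print(res$avg)
```

The avg column should match the one produced by the rowwise()/c_across() version above.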

Vicinity answered 29/10, 2020 at 16:54 Comment(1)

Note: before dplyr 1.1.0 across was commonly used as pick is here. However, pick() was created as a replacement for across(.fns = NULL). across should only be used to apply a function(s) across multiple columns, not to return a tibble. – Vicinity 7/7, 2023 at 21:35

If you want to sum certain columns only, I'd use something like this:

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))
df %>% select(x3:x5) %>% rowSums(na.rm=TRUE) -> df$x3x5.total
head(df)

This way you can use dplyr::select's syntax.

Sire answered 24/4, 2016 at 22:36 Comment(2)

I like this approach above others since it does not require coercing NAs to 0 – Minimalist 17/11, 2017 at 16:42

And better than grep because easier to deal with things like x4:x11 – Infamy 3/9, 2019 at 21:31

I would use regular expression matching to sum over variables whose names match a certain pattern. For example:

df <- df %>% mutate(sum1 = rowSums(.[grep("x[3-5]", names(.))], na.rm = TRUE),
                    sum_all = rowSums(.[grep("x", names(.))], na.rm = TRUE))

This way you can create more than one variable, each as the sum of a certain group of variables in your data frame.
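One thing to watch with this approach is that an unanchored pattern matches anywhere in a column name. The sketch below uses a hypothetical data frame `toy` whose extra columns (`x30`, `tax4`) are made up to show the difference between a loose and an anchored pattern.

```r
# Hypothetical toy data with look-alike column names
toy <- data.frame(
  x3   = c(1, NA),
  x4   = c(1, 1),
  x5   = c(0, 1),
  x30  = c(5, 5),
  tax4 = c(9, 9)
)

# Unanchored: also picks up "x30" and "tax4"
loose <- grep("x[3-5]", names(toy), value = TRUE)

# Anchored with ^ and $: only the intended columns
strict <- grep("^x[3-5]$", names(toy), value = TRUE)

toy$sum1 <- rowSums(toy[strict], na.rm = TRUE)

print(loose)
print(strict)
print(toy$sum1)
```

Anchoring the pattern is usually the safer default when column names are generated in a loop and may change between iterations.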

Gamekeeper answered 10/2, 2017 at 23:50 Comment(3)

great solution! I was looking for a specific dplyr function doing this in recent releases, but couldn't find one – Garnes 20/9, 2017 at 14:55

This solution is great. If there are columns you do not want to include you simply need to design the grep() statement to select columns matching a specific pattern. – Woolridge 27/6, 2018 at 20:25

@TrentonHoffman here is the bit to deselect columns matching a specific pattern; you just need the - sign: rowSums(.[-grep("x[3-5]", names(.))], na.rm = TRUE) – Detta 11/4, 2019 at 16:27

Using reduce() from purrr is slightly faster than rowSums and definitely faster than apply, since you avoid iterating over all the rows and just take advantage of the vectorized operations:

library(purrr)
library(dplyr)
iris %>% mutate(Petal = reduce(select(., starts_with("Petal")), `+`))

See this for timings
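If missing values should be treated as zero, as rowSums(na.rm = TRUE) does, one possible workaround is to replace them in each column before reducing. This is only a sketch under that assumption; the data frame `toy` and its column names are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data with NAs in different positions
toy <- data.frame(
  x1 = c(1, NA, 3),
  x2 = c(NA, NA, 1),
  x3 = c(1, 2, 3)
)

# coalesce(.x, 0) swaps NA for 0 column by column,
# then reduce() adds the columns pairwise with `+`
res <- toy %>%
  mutate(total = reduce(map(select(., x1:x3), ~ coalesce(.x, 0)), `+`))

print(res)
```

The original columns keep their NA values; only the temporary copies passed to reduce() are modified.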

Haema answered 5/2, 2019 at 4:17 Comment(4)

I like this but how would you do it when you need na.rm = TRUE – Prussianism 18/3, 2020 at 14:8

@Prussianism I'm not sure I know what you mean. This sums vectors a + b + c, all of the same length. Since each vector may or may not have NA in different locations, you cannot ignore them. This would make the vectors unaligned. If you want to remove NA values you have to do it afterwards with, for instance, drop_na – Haema 19/3, 2020 at 11:31

I ended up doing rowSums(select(., matches("myregex")), na.rm = TRUE) because that is what I needed in terms of ignoring NAs. So if the numbers are sum(NA, 5) the result is 5. But you said reduce is better than rowSums so I was wondering if there is a way to use it in this situation? – Prussianism 19/3, 2020 at 12:28

I see. If you want the sum and to ignore NA values, the rowSums version is definitely the best choice. The main disadvantage is that only rowSums and rowMeans are available (it is slightly slower than reduce, but not by much). If you need to perform another operation (not the sum) then the reduce version is probably the only option. Just avoid using apply in this case. – Haema 20/3, 2020 at 14:24

I encounter this problem often, and the easiest way to do this is to use the apply() function within a mutate command.

library(tidyverse)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

df %>%
  mutate(sum = select(., x1:x5) %>% apply(1, sum, na.rm=TRUE))

Here you could use whatever you want to select the columns using the standard dplyr tricks (e.g. starts_with() or contains()). By doing all the work within a single mutate command, this action can occur anywhere within a dplyr stream of processing steps. Finally, by using the apply() function, you have the flexibility to use whatever summary you need, including your own purpose built summarization function.
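As an illustration of the last point, here is a sketch with a purpose-built summary function. Both the data frame `toy` and the helper `range_width` are hypothetical names made up for this example.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  x1 = c(1, 2, NA),
  x2 = c(3, 2, 4),
  x3 = c(5, 2, 6)
)

# Hypothetical custom summary: distance between the smallest
# and largest non-missing value in a row
range_width <- function(v) diff(range(v, na.rm = TRUE))

res <- toy %>%
  mutate(row_width = select(., x1:x3) %>% apply(1, range_width))

print(res)
```

Any function that takes a vector and returns a single value can be dropped in the same way, at the cost of apply() looping over the rows.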

Alternatively, if the idea of using a non-tidyverse function is unappealing, then you could gather up the columns, summarize them and finally join the result back to the original data frame.

df <- df %>% mutate( id = 1:n() )   # Need some ID column for this to work

df <- df %>%
  group_by(id) %>%
  gather('Key', 'value', starts_with('x')) %>%
  summarise( Key.Sum = sum(value) ) %>%
  left_join( df, . )

Here I used the starts_with() function to select the columns and calculated the sum and you can do whatever you want with NA values. The downside to this approach is that while it is pretty flexible, it doesn't really fit into a dplyr stream of data cleaning steps.

Imagination answered 26/2, 2018 at 23:6 Comment(2)

Seems silly to use apply when this is what rowSums was designed for. – Vedetta 26/2, 2018 at 23:10

In this case rowSums works really well as does rowMeans, but I always felt a little weird wondering about "What if the thing I need to calculate isn't a sum or a mean?" However, 99% of the time I have to do something like this, it is either a sum or a mean, so maybe the extra bit of flexibility in using the general apply function isn't warranted. – Imagination 26/2, 2018 at 23:17

Benchmarking (almost) all options to sum across columns

As it's difficult to decide among all the interesting answers given by @skd, @LMc, and others, I benchmarked all alternatives which are reasonably long.

Unlike the other examples, I used a larger dataset (10,000 rows) sampled from a real-world dataset (diamonds), so the findings might better reflect the variance of real-world data.

The reproducible benchmarking code is:

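library(dplyr)    # mutate, select, across, rowwise, c_across
library(tidyr)    # nest
library(purrr)    # map_dbl, reduce
library(ggplot2)  # provides the diamonds dataset

set.seed(17)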
dataset <- diamonds %>% sample_n(1e4)
cols <- c("depth", "table", "x", "y", "z")

sum.explicit <- function() {
  dataset %>%
    mutate(sum.cols = depth + table + x + y + z)
}

sum.rowSums <- function() {
  dataset %>%
    mutate(sum.cols = rowSums(across(cols)))
}

sum.reduce <- function() {
  dataset %>%
    mutate(sum.cols = purrr::reduce(select(., cols), `+`))
}

sum.nest <- function() {
  dataset %>%
  group_by(id = row_number()) %>%
  nest(data = cols) %>%
  mutate(sum.cols = map_dbl(data, sum))
}

# NOTE: across with rowwise doesn't work with all functions!
sum.across <- function() {
  dataset %>%
    rowwise() %>%
    mutate(sum.cols = sum(across(cols)))
}

sum.c_across <- function() {
  dataset %>%
  rowwise() %>%
  mutate(sum.cols = sum(c_across(cols)))
}

sum.apply <- function() {
  dataset %>%
    mutate(sum.cols = select(., cols) %>%
             apply(1, sum, na.rm = TRUE))
}

bench <- microbenchmark::microbenchmark(
  sum.nest(),
  sum.across(),
  sum.c_across(),
  sum.apply(),
  sum.explicit(),
  sum.reduce(),
  sum.rowSums(),
  times = 10
)

bench %>% print(order = 'mean', signif = 3)
Unit: microseconds
           expr     min      lq    mean  median      uq     max neval
 sum.explicit()     796     839    1160     950    1040    3160    10
  sum.rowSums()    1430    1450    1770    1650    1800    2980    10
   sum.reduce()    1650    1700    2090    2000    2140    3300    10
    sum.apply()    9290    9400    9720    9620    9840   11000    10
 sum.c_across()  341000  348000  353000  356000  359000  360000    10
     sum.nest()  793000  827000  854000  843000  871000  945000    10
   sum.across() 4810000 4830000 4880000 4900000 4920000 4940000    10

Visualizing this (without the outlier sum.across) facilitates the comparison:

Conclusion (subjective!)

Despite their great readability, nest and rowwise/c_across are not recommended for larger datasets (> 100,000 rows) or repeated operations.
The explicit sum wins because it makes the most direct use of the vectorized `+` operator; rowSums is also vectorized but carries a little computational overhead.
purrr::reduce is relatively new in the tidyverse (the same idea is well known from Python) and, like Reduce in base R, very efficient, which earns it a place in the top 3. Because the explicit form is cumbersome to write, and there are not many vectorized methods other than rowSums/rowMeans and colSums/colMeans, I would recommend purrr::reduce for all other functions (e.g. sd).
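To make the reduce pattern concrete, here is a small sketch on a hypothetical data frame `toy`. Note the assumption it relies on: reduce combines the columns two at a time, so it fits functions that are vectorized over a pair of columns (such as `+` or pmax). Functions that need a whole row at once, like sd, do not fit this pattern directly.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data
toy <- data.frame(
  a = c(1, 5, 3),
  b = c(4, 2, 6),
  d = c(0, 9, 1)
)
cols <- c("a", "b", "d")

res <- toy %>%
  mutate(
    # ((a + b) + d), computed on whole columns at once
    total   = reduce(select(., all_of(cols)), `+`),
    # pmax(pmax(a, b), d) gives the row-wise maximum
    row_max = reduce(select(., all_of(cols)), pmax)
  )

print(res)
```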

Lockwood answered 30/4, 2022 at 14:31 Comment(2)

This is consistent with my benchmarks as well. Great answer! – Haema 14/12, 2023 at 16:59

Thanks for adding this, but I was hoping you could clarify one thing. reduce works nicely since a lot of infix operators, particularly arithmetic operators, work element-wise due to the structure imposed by data frames (all columns must be the same length). So, reduce works great with `+`, `-`, `*` etc. However, you mention, I would recommend for all other functions (e.g. sd) to apply purrr::reduce. Can you elaborate on that? reduce(df, sd) would not work to get row-wise standard deviations, for example. As a matter of fact it won't get anything since it errors. – Vicinity 13/3 at 17:32

If you want to add a vector across the columns or rows of the data frame, modifying df itself instead of adding a new column, you can use the sweep function:

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))
> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

Sum (vector + dataframe) in row-wise order:

vector = 1:5
sweep(df, MARGIN=2, vector, `+`)
   x1 x2 x3 x4 x5
1   2  3  3  5  6
2   1  3  4  4  6
3   1 NA  3 NA NA
4  NA  3  4  5  6
5   1  3  4  4  6
6   2  2  3  4  6
7   2 NA NA NA NA
8  NA NA NA  4  6
9   1  2  3  4  5
10  2  3  4  5  6

Sum (vector + dataframe) in column-wise order:

vector <- 1:10  
sweep(df, MARGIN=1, vector, `+`)
   x1 x2 x3 x4 x5
1   2  2  1  2  2
2   2  3  3  2  3
3   3 NA  3 NA NA
4  NA  5  5  5  5
5   5  6  6  5  6
6   7  6  6  6  7
7   8 NA NA NA NA
8  NA NA NA  8  9
9   9  9  9  9  9
10 11 11 11 11 11

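This is the same as computing vector + df.

MARGIN = 1 is column-wise: the vector has one element per row and runs down each column.
MARGIN = 2 is row-wise: the vector has one element per column and runs along each row.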
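The two MARGIN settings can be checked on a tiny example. This is a sketch on a hypothetical 2 x 3 data frame `toy`, using only base R.

```r
# Hypothetical toy data: 2 rows, 3 columns
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))

# MARGIN = 2: one value per column (needs 3 values)
by_col <- sweep(toy, MARGIN = 2, STATS = c(1, 2, 3), FUN = `+`)

# MARGIN = 1: one value per row (needs 2 values)
by_row <- sweep(toy, MARGIN = 1, STATS = c(1, 2), FUN = `+`)

# Plain addition recycles the vector down the columns,
# which coincides with the MARGIN = 1 result here
same_as_plus <- isTRUE(all.equal(by_row, toy + c(1, 2)))

print(by_col)
print(by_row)
print(same_as_plus)
```

With MARGIN = 2 the length of the vector should match the number of columns, and with MARGIN = 1 the number of rows.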

And yes, you can use sweep with other operators as well:

sweep(df, MARGIN=2, vector, `-`)
sweep(df, MARGIN=2, vector, `*`)
sweep(df, MARGIN=2, vector, `/`)
sweep(df, MARGIN=2, vector, `^`)

Another way is to use Reduce, which works column-wise:

vector = 1:5
.df <- list(df, vector)
Reduce('+', .df)

Mudguard answered 29/3, 2022 at 23:20 Comment(0)

If legibility is not a concern, but speed is, you can construct the desired x1 + x2 + x3 + x4 + x5 using rlang, then evaluate that.

library(dplyr)
library(purrr)
library(rlang)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

# Construct a list of columns to sum, make them symbols
names_symbols <- syms(colnames(df))
# Construct an expression by placing `+` between each symbol
sum_expression <- reduce(names_symbols, ~expr(!!.x + !!.y))
# Evaluate that expression within the mutate
df <- df %>% mutate(sumrow = eval_tidy(!!sum_expression))

# Or, as a one-liner
df <- df %>% mutate(sumrow = eval_tidy(!!reduce(syms(colnames(.)), ~expr(!!.x + !!.y))))
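To see what the constructed expression looks like, the sketch below builds it for a hypothetical subset of columns and prints it before use. The data frame `toy` and the vector `cols` are made-up names, and the expression is unquoted directly with `!!` rather than through eval_tidy, which should be equivalent here.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical toy data with one non-numeric column
toy <- data.frame(
  x1 = c(1, 2),
  x2 = c(10, 20),
  x3 = c(100, 200),
  other = c("a", "b")
)
cols <- c("x1", "x2", "x3")

# Turn the names into symbols and join them with `+`
sum_expr <- reduce(syms(cols), ~ expr(!!.x + !!.y))
expr_text <- deparse(sum_expr)
print(expr_text)

# Splice the expression into mutate(); only the listed columns are summed
res <- toy %>% mutate(total = !!sum_expr)
print(res)
```

Building the expression from a character vector of names also makes it easy to skip non-numeric columns, since only the names in `cols` end up in the sum.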

A variant for the benchmark by @Agile Bean

sum.implicit_explicit <- function(){
  dataset %>%
    mutate(sum.cols = eval_tidy(!!purrr::reduce(syms(cols), ~ expr(!!.x + !!.y))))
}

Performance: very little overhead over explicit, much faster than the normal reduce:

Lucilucia answered 25/8, 2023 at 11:47 Comment(0)
