I have data of the following format:
gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
I would now like to attach, to each row, the total sum of all the elements in the row (my actual function is more complex but sum illustrates the problem).
Without dplyr, I’d write
cbind(x, Sum = apply(x, 1, sum))
Resulting in:
  A C  G T Sum
1 3 1  6 9  19
2 3 4  3 3  13
3 3 1 10 5  19
4 7 2  1 6  16
…
But it seems surprisingly hard to do this with dplyr.
I’ve tried
x %>% rowwise() %>% mutate(Sum = sum(A : T))
But the result is not the sum of the columns of each row, it’s something unexpected and (to me) inexplicable.
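As far as I can tell, the explanation is that inside mutate() the expression A : T is not a column range (as it would be in select()) but the ordinary colon operator applied to the values of A and T. Under rowwise() each of these is a single number, so the result is the sum of the integer sequence from A to T. The sketch below emulates this in base R so that it runs without dplyr; the data frame hard-codes the four rows printed above, and the names colon_sum and row_sum are made up for illustration.

```r
# First four rows of the output shown above, hard-coded for reproducibility.
x <- data.frame(A = c(3, 3, 3, 7), C = c(1, 4, 1, 2),
                G = c(6, 3, 10, 1), T = c(9, 3, 5, 6))

# What sum(A : T) evaluates to row by row: the sum of the sequence A..T.
colon_sum <- mapply(function(a, t) sum(a:t), x$A, x$T)

# What was intended: the sum across all four columns of each row.
row_sum <- rowSums(x)

print(colon_sum)
print(row_sum)
```

For the first row this computes sum(3:9) instead of 3 + 1 + 6 + 9, which is why the numbers look unrelated to the row contents.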
I’ve also tried
x %>% rowwise() %>% mutate(Sum = sum(.))
But here, . is simply a placeholder for the whole x. Providing no argument, unsurprisingly, does not work either (results are all 0). Needless to say, none of these variants works without rowwise(), either.
(There isn’t really any reason to necessarily do this in dplyr, but (a) I’d like to keep my code as uniform as possible, and jumping between different APIs doesn’t help; and (b) I’m hoping to one day get automatic and free parallelisation of such commands in dplyr.)
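For reference, a minimal sketch of how this can be written in newer dplyr releases. It assumes dplyr 1.0.0 or later, which provides c_across() and across(); neither is available in older versions. The data frame again hard-codes the four rows printed above, and by_row and vectorised are illustrative names.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 (c_across() and across())

x <- data.frame(A = c(3, 3, 3, 7), C = c(1, 4, 1, 2),
                G = c(6, 3, 10, 1), T = c(9, 3, 5, 6))

# Row-wise: c_across() gathers the selected columns of the current row
# into one vector, so any function taking a vector can stand in for sum().
by_row <- x %>%
  rowwise() %>%
  mutate(Sum = sum(c_across(A:T))) %>%
  ungroup()

# Vectorised alternative: works here only because sum has a row-wise
# counterpart, rowSums(); rowwise() is not needed.
vectorised <- x %>% mutate(Sum = rowSums(across(A:T)))

print(by_row)
```

The row-wise form is the more general of the two, since it accepts an arbitrary function of the row's values; the rowSums() form is typically faster but is specific to sums.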
library(data.table) ; setDT(x)[, Sum := Reduce("+", .SD)][] would be of any use... – Lodestar
+ is a binary function taking 2 inputs which can then be applied / reduced multiple times while f from my answer takes a whole vector at once..) – Encourage
Reduce too. Will wait and see what he says. – Lodestar
Reduce – it calculates the GC bias from a frequency table of codons. Here’s an implementation: gist.github.com/klmr/4898c3eb1a5216850134 – Nikolai