Although I've figured this out before, I still find myself searching for (and unable to find) this syntax on Stack Overflow, so...
I want to do row-wise operations on a subset of the data.table's columns, using .SD and .SDcols. I can never remember if the operations need an sapply, lapply, or if they belong inside the brackets of .SD.
As an example, say you have data for 10 students over two quarters. In both quarters they have two exams and a final exam. How would you take a straight average of the columns starting with q1?
Since overly trivial examples are annoying, I'd also like to calculate a weighted average for the columns starting with q2 (weights = 25%, 25%, and 50% for q2).
library(data.table)
set.seed(10)
dt <- data.table(id = paste0("student_", sprintf("%02.f" , 1:10)),
q1_exam1 = round(rnorm(10, .78, .05), 2),
q1_exam2 = round(rnorm(10, .68, .02), 2),
q1_final = round(rnorm(10, .88, .08), 2),
q2_exam1 = round(rnorm(10, .78, .05), 2),
q2_exam2 = round(rnorm(10, .68, .10), 2),
q2_final = round(rnorm(10, .88, .04), 2))
dt
# > dt
# id q1_exam1 q1_exam2 q1_final q2_exam1 q2_exam2 q2_final
# 1: student_01 0.78 0.70 0.83 0.69 0.79 0.86
# 2: student_02 0.77 0.70 0.71 0.78 0.60 0.87
# 3: student_03 0.71 0.68 0.83 0.83 0.60 0.93
# 4: student_04 0.75 0.70 0.71 0.79 0.76 0.97
# 5: student_05 0.79 0.69 0.78 0.71 0.58 0.90
# 6: student_06 0.80 0.68 0.85 0.71 0.68 0.91
# 7: student_07 0.72 0.66 0.82 0.80 0.70 0.84
# 8: student_08 0.76 0.68 0.81 0.69 0.65 0.90
# 9: student_09 0.70 0.70 0.87 0.76 0.61 0.85
# 10: student_10 0.77 0.69 0.86 0.75 0.75 0.89
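For reference, here is a minimal sketch of one way both averages could be computed with .SD and .SDcols, without sapply or lapply. To stay self-contained it rebuilds only the first two rows of the table printed above. The names toy, q1_cols, q2_cols, w, q1_avg and q2_wavg are illustrative assumptions, not part of the original example.

```r
library(data.table)

# Toy table: the first two rows of the output shown above
toy <- data.table(
  id       = c("student_01", "student_02"),
  q1_exam1 = c(0.78, 0.77),
  q1_exam2 = c(0.70, 0.70),
  q1_final = c(0.83, 0.71),
  q2_exam1 = c(0.69, 0.78),
  q2_exam2 = c(0.79, 0.60),
  q2_final = c(0.86, 0.87)
)

# Select columns by prefix (assumed naming scheme: q1_*, q2_*)
q1_cols <- grep("^q1", names(toy), value = TRUE)
q2_cols <- grep("^q2", names(toy), value = TRUE)

# Weights in the same order as q2_cols: exam1, exam2, final
w <- c(0.25, 0.25, 0.50)

# Straight average: rowMeans() is already row-wise, so .SD goes in directly
toy[, q1_avg := rowMeans(.SD), .SDcols = q1_cols]

# Weighted average: multiply the selected columns (as a matrix) by the weights
toy[, q2_wavg := as.vector(as.matrix(.SD) %*% w), .SDcols = q2_cols]

toy[, .(id, q1_avg, q2_wavg)]
```

Both calls are vectorized over rows, so no by = is needed here. The weights must be in the same order as the columns matched by q2_cols.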
[...] data.table, you can create an index variable with .I and use that in the by = part. – Arlie
[...] .I like this: #16574495 ? (I didn't initially realize this example might be relevant, thanks Frank for showing me this on github) – Ephialtes
[...] by = 1:nrow(dt). Or depending on the operation, you might be able to use Reduce() in j. – Skyway
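The suggestions in these comments (an index from .I used in by =, by = 1:nrow(dt), and Reduce() in j) might look roughly like the sketch below. It is an illustration under assumptions, not the commenters' own code: the two-row toy table copies the first two rows of the question's output, and row_id, q1_avg, q2_wavg and q1_avg_reduce are hypothetical names.

```r
library(data.table)

# Toy table: the first two rows of the question's printed output
toy <- data.table(
  id       = c("student_01", "student_02"),
  q1_exam1 = c(0.78, 0.77),
  q1_exam2 = c(0.70, 0.70),
  q1_final = c(0.83, 0.71),
  q2_exam1 = c(0.69, 0.78),
  q2_exam2 = c(0.79, 0.60),
  q2_final = c(0.86, 0.87)
)

q1_cols <- grep("^q1", names(toy), value = TRUE)
q2_cols <- grep("^q2", names(toy), value = TRUE)
w <- c(0.25, 0.25, 0.50)  # weights for q2_exam1, q2_exam2, q2_final

# Option 1: build an explicit row index with .I and group by it.
# Each group is a single row, so .SD is a one-row table.
toy[, row_id := .I]
toy[, q1_avg := mean(unlist(.SD)), by = row_id, .SDcols = q1_cols]

# Option 2: group by 1:nrow(toy) directly, without storing an index column
toy[, q2_wavg := weighted.mean(unlist(.SD), w),
    by = 1:nrow(toy), .SDcols = q2_cols]

# Option 3: Reduce() in j adds the selected columns element-wise,
# so no per-row grouping is needed for a simple sum or mean
toy[, q1_avg_reduce := Reduce(`+`, .SD) / length(q1_cols), .SDcols = q1_cols]

toy[, .(id, q1_avg, q1_avg_reduce, q2_wavg)]
```

Grouping by row runs j once per row, which can be slow on large tables. The Reduce() form stays vectorized but only fits operations that can be written as a column-by-column accumulation.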