I have a matrix in which each row is a sample from a distribution. I want to do a rolling comparison of the distributions, testing each row against the previous one with ks.test and saving the test statistic for each pair. The simplest way to implement this conceptually is with a loop:
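set.seed(1942)
# Four samples of size 5, one per row
mt <- rbind(rnorm(5), rnorm(5), rnorm(5), rnorm(5))
# One result per row; the first stays NA because row 1 has no predecessor
results <- matrix(NA_real_, nrow = nrow(mt))
for (i in 2:nrow(mt)) {
  # Compare each row with the row before it
  results[i] <- ks.test(x = mt[i - 1, ], y = mt[i, ])$statistic
}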
However, my real data has ~400 columns and ~300,000 rows for a single example, and I have a lot of examples, so I'd like this to be fast. The Kolmogorov-Smirnov test isn't all that mathematically complicated, so if the answer is "implement it in Rcpp," I'll grudgingly accept that, but I'd be somewhat surprised, since it's already very fast to compute on a single pair in R.
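To illustrate why the statistic itself is not complicated, here is a minimal sketch in plain R that computes only the two-sample D statistic for consecutive rows and skips the p-value that ks.test also produces. The function names ks_stat and rolling_ks are made up for this illustration, the code assumes there are no NA values, and it has not been benchmarked on data of the real size.

```r
# Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
# between the two empirical CDFs. (ks_stat is a hypothetical helper name.)
ks_stat <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  w <- c(x, y)
  ord <- order(w)
  # Walk through the pooled, sorted values: step up by 1/n_x for a value
  # from x and down by 1/n_y for a value from y.
  z <- cumsum(ifelse(ord <= n_x, 1 / n_x, -1 / n_y))
  # With ties, only the last position in each run of equal values counts.
  keep <- c(diff(w[ord]) != 0, TRUE)
  max(abs(z[keep]))
}

# Apply ks_stat to each row and its predecessor; the first entry stays NA.
# (rolling_ks is a hypothetical helper name.)
rolling_ks <- function(m) {
  out <- rep(NA_real_, nrow(m))
  for (i in seq_len(nrow(m))[-1]) {
    out[i] <- ks_stat(m[i - 1, ], m[i, ])
  }
  out
}

set.seed(1942)
mt <- rbind(rnorm(5), rnorm(5), rnorm(5), rnorm(5))
rolling_ks(mt)
```

This still loops over rows in R, so it only removes the per-call overhead of ks.test; whether that is enough for ~300,000 rows is something I would have to measure.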
Methods I've tried but have been unable to get working: dplyr using rowwise/do/lag, zoo using rollapply (which is what I use to generate the distributions), and populating a data.table in a loop (edit: this one works, but it's still slow).
KernSmooth package? ks.test is in the stats package. – Miyamoto