Now I would also like to add weights to the model that correspond to the accuracy of responses. Therefore, for each response vector I would need also different weight vector. However, the lm
allows to enter the weights only as a vector not matrix. So I could not enter the weights in batch and would have to run lm
for every response separately. This way the calculations would become much slower.
As explained in Fitting a linear model with multiple LHS, efficiency of "mlm" demands a shared model matrix for all LHS responses. However, weighted regression gives no reuse of model matrix, as for a different set of weights, both response y
and model matrix X
need be rescaled. Have a read on R: lm() result differs when using weights
argument and when using manually reweighted data to see how weighted regression works.
Is there any way run these type of models in batch mode, without calling lm
repeatedly?
It depends on what you want. If you need a full lmObject
, then you have to call lm
every time. If you only want coefficients, you may use .lm.fit
. The 2nd link above has demonstrated the use of lm.fit
, while the use of .lm.fit
is almost identical. A simple template may be the following:
## weighted mlm, by specifying matrix directly
## `xmat`: non-weighted model matrix, manually created from `model.matrix`
## `ymat`: non-weighted response matrix
## `wmat`: matrix of weights
## all matrices must have the same number of rows (not checked)
## `ymat` and `wmat` must have the same number of columns (not checked)
## no `NA` values in any where is allowed (not checked)
## all elements of `wmat` must be strictly positive (not checked)
wmlm <- function (xmat, ymat, wmat) {
N <- ncol(ymat)
wmlmList <- vector("list", length = N)
for (j in 1:N) {
rw <- sqrt(wmat[, j])
wmlmList[[j]] <- .lm.fit(rw * xmat, rw * ymat[, j])
}
return(wmlmList)
}
Consider a simple example of its use:
## a toy dataset of 200 data with 3 numerical variables and 1 factor variable
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200), f = gl(5, 40, labels = letters[1:5]))
## consider a model `~ x1 + poly(x3, 3) + x2 * f`
## we construct the non-weighted model matrix
xmat <- model.matrix (~ x1 + poly(x3, 3) + x2 * f, dat)
## now let's assume we have 100 model responses as well as 100 sets of weights
ymat <- matrix(rnorm(200 * 100), 200)
wmat <- matrix(runif(200 * 100), 200)
## Let's call `wmlm`:
fit <- wmlm (xmat, ymat, wmat)
.lm.fit
returns critical information for further model inference, and a complete lmObject
would inherit most of these entries.
## take the first fitted model as an example
str(fit[[1]])
#$ qr : num [1:200, 1:14] -10.4116 0.061 0.0828 0.0757 0.0698 ...
# ..- attr(*, "assign")= int [1:14] 0 1 2 2 2 3 4 4 4 4 ...
# ..- attr(*, "contrasts")=List of 1
# .. ..$ f: chr "contr.treatment"
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:200] "1" "2" "3" "4" ...
# .. ..$ : chr [1:14] "(Intercept)" "x1" "poly(x3, 3)1" "poly(x3, 3)2" ...
#$ coefficients: num [1:14] 0.1184 -0.0506 0.3032 0.1643 0.4269 ...
#$ residuals : num [1:200] -0.7311 -0.0795 -0.2495 0.4097 0.0495 ...
#$ effects : num [1:200] -0.351 -0.36 0.145 0.182 0.291 ...
#$ rank : int 14
#$ pivot : int [1:14] 1 2 3 4 5 6 7 8 9 10 ...
#$ qraux : num [1:14] 1.06 1.13 1.07 1.05 1.01 ...
#$ tol : num 1e-07
#$ pivoted : logi FALSE
Result of .lm.fit
has no supported generic functions like summary
, anova
, predict
, plot
, etc. But inference of a linear model is easy, so it is straightforward to compute yourself (provided you know the theory behind):
- t-statistic table for regression coefficients (from
$qr
);
- F-statistic and ANOVA table (from
$effects
);
- residual standard error, R-squared and adjusted R-squared (from
$residulas
and $rank
).
Finally, I offer a benchmark:
library(microbenchmark)
microbenchmark(wmlm = {wmlm (xmat, ymat, wmat);},
lm = {for (i in 1:ncol(ymat))
lm(ymat[, i] ~ x1 + poly(x3, 3) + x2 * f, dat, weights = wmat[, i]);} )
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# wmlm 20.84512 23.02756 27.29539 24.49314 25.9027 79.8894 100 a
# lm 400.53000 405.10622 430.09787 414.42152 442.2640 535.9144 100 b
So 17.25 times boost is seen (based on median time).
lm.fit
? – Ancelinlm.wfit
you seez <- .Call(C_Cdqrls, x * wts, y * wts, tol)
. That meansx
andy
are both transformed using the same weights. That`s not possible, if you want different weights for your multiple responses. – Fantasia