Why is `speedglm` slower than `glm`?
I am trying to use speedglm to get faster GLM estimation than glm, but it is even slower. Why?

set.seed(0)
n <- 1e3
p <- 1e3
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

ptm <- proc.time()
fit <- glm(y ~ x, family = binomial())
print(proc.time() - ptm)
#   user  system elapsed 
#  10.71    0.07   10.78 

library(speedglm)
ptm <- proc.time()
fit <- speedglm(y ~ x, family = binomial())
print(proc.time() - ptm)
#   user  system elapsed 
#  15.11    0.12   15.25 
Discomposure answered 26/5, 2017 at 18:21 Comment(2)
you should explain the context in which you're working. If you want to do many small GLM fits as fast as possible (a reasonable possibility), you might want to consider using glm.fit() directly ...Depolarize
@BenBolker I am trying to fit a logistic regression with 3 million rows and ~1000 columns, and want to see how fast it runs in different packages.Discomposure
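The glm.fit() route suggested above can be sketched as follows (a minimal sketch, not the commenter's exact code; glm.fit bypasses the formula/model-frame machinery, so the intercept column must be added by hand, and smaller dimensions are used here just to keep the example quick):

```r
## Sketch: call the IRLS workhorse directly instead of the formula interface.
set.seed(0)
n <- 500
p <- 20
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

## glm.fit takes the design matrix directly; prepend a column of ones
## for the intercept, which the formula interface would add automatically.
fit <- glm.fit(cbind(1, x), y, family = binomial())
length(fit$coefficients)  # p + 1 coefficients, intercept first
```

glm.fit returns a plain list rather than a "glm" object, so helpers like summary() lose most of their output; this trade-off is what buys the speed for many small fits.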

The efficiency of speedglm over glm lies in the way it reduces the n * p model matrix to a p * p matrix. However, if you have n = p, there is no effective reduction. What you really want to check is the n >> p case.
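A quick way to see this is to rerun the question's benchmark with n much larger than p (a sketch; exact timings depend on your machine, and speedglm must be installed):

```r
## Same comparison as in the question, but with n >> p, where reducing
## the n x p model matrix to a p x p cross product actually pays off.
library(speedglm)
set.seed(0)
n <- 1e5
p <- 50
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

system.time(fit1 <- glm(y ~ x, family = binomial()))
system.time(fit2 <- speedglm(y ~ x, family = binomial()))

## Both fits should agree to numerical tolerance.
all.equal(unname(coef(fit1)), unname(fit2$coefficients), tolerance = 1e-4)
```

In this regime speedglm should come out ahead, in line with the FLOP counts below.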


More insight comes from the computational complexity of each iteration of Fisher scoring.

glm, using QR factorization of the n * p model matrix, takes 2np^2 - (2/3)p^3 FLOPs per iteration, while speedglm forms the matrix cross product of the n * p matrix (np^2 FLOPs) and then QR-factorizes the resulting p * p matrix ((4/3)p^3 FLOPs), for np^2 + (4/3)p^3 FLOPs in total. So as n >> p, speedglm needs only about half the computation of glm. Furthermore, the blocking and caching strategy used by speedglm makes better use of the hardware, giving high performance.

If you have n = p, you immediately see that glm takes (4/3)p^3 FLOPs, but speedglm takes p^3 + (4/3)p^3 = (7/3)p^3 FLOPs, which is more expensive! In fact, the matrix cross product becomes sheer overhead in this case!
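The two operation counts are easy to tabulate; a small helper (hypothetical function names, straight from the formulas above) makes the crossover visible:

```r
## FLOPs per Fisher-scoring iteration, from the operation counts above.
flops_glm      <- function(n, p) 2 * n * p^2 - (2/3) * p^3  # QR of the n x p matrix
flops_speedglm <- function(n, p) n * p^2 + (4/3) * p^3      # X'X, then QR of p x p

p <- 1e3
flops_speedglm(p, p)   / flops_glm(p, p)    # n = p:  (7/3)/(4/3) = 1.75, speedglm loses
flops_speedglm(1e6, p) / flops_glm(1e6, p)  # n >> p: ratio approaches 1/2, speedglm wins
```

The 1.75x theoretical ratio at n = p is consistent in direction with the roughly 1.4x slowdown observed in the question's timings (per-iteration constants and memory effects account for the gap).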

Romany answered 26/5, 2017 at 18:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.