I'm using R to run a Monte Carlo simulation studying the performance of panel data estimators. Because I'll be running a large number of trials, I need at least decent performance from my code.
Running Rprof on 10 trials of my simulation shows that a significant portion of time is spent in calls to summary.plm. The first few lines of the summaryRprof output are provided below:
$by.total
                        total.time total.pct self.time self.pct
"trial"                      54.48     100.0      0.00      0.0
"coefs"                      53.90      98.9      0.06      0.1
"model.matrix"               36.72      67.4      0.10      0.2
"model.matrix.pFormula"      35.98      66.0      0.06      0.1
"summary"                    33.82      62.1      0.00      0.0
"summary.plm"                33.80      62.0      0.08      0.1
"r.squared"                  29.00      53.2      0.02      0.0
"FUN"                        24.84      45.6      7.52     13.8
I'm calling summary in my code because I need the standard errors of the coefficient estimates as well as the coefficients themselves (the coefficients alone I could get from just the plm object). My call looks like this:
library(plm)  # panel data estimators

regression <- plm(g ~ y0 + Xit, data = panel_data, model = model,
                  index = c("country", "period"))
reg_summary <- summary(regression)  # compute the summary once, not twice
coefficients_estimated <- reg_summary$coefficients[, "Estimate"]
ses_estimated <- reg_summary$coefficients[, "Std. Error"]
I have a nagging feeling that this is a huge waste of CPU time, but I don't know enough about how R works internally to avoid calling summary. I'd appreciate any information on what's going on behind the scenes here, or some way of reducing the time this takes to execute.
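For what it's worth, the only alternative I've come up with is to skip summary() entirely and pull the pieces from the fitted object, along the lines of the sketch below. This assumes plm objects support coef() and vcov() the way lm objects do, and I haven't verified that the resulting standard errors agree with what summary.plm reports.

# Sketch: extract estimates and standard errors without calling summary().
# Assumes plm objects provide coef() and vcov() methods (as lm objects do)
# and that vcov() returns the same covariance matrix summary() would use.
coefficients_estimated <- coef(regression)
ses_estimated <- sqrt(diag(vcov(regression)))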