How do I get standard errors of maximum-likelihood estimates in Stan?
I am using maximum-likelihood optimization in Stan, but unfortunately the optimizing() function doesn't report standard errors:

> MLb4c <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits)  
STAN OPTIMIZATION COMMAND (LBFGS)
init = user
save_iterations = 1
init_alpha = 0.001
tol_obj = 1e-012
tol_grad = 1e-008
tol_param = 1e-008
tol_rel_obj = 10000
tol_rel_grad = 1e+007
history_size = 5
seed = 292156286
initial log joint probability = -4038.66
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      13      -2772.49  9.21091e-005     0.0135987     0.07606      0.9845       15   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
> t2 <- proc.time()
> print(t2 - t1)
   user  system elapsed 
   0.11    0.19    0.74 
> 
> MLb4c
$par
       psi      alpha       beta 
 0.9495000  0.4350983 -0.2016895 

$value
[1] -2772.489

> summary(MLb4c)
      Length Class  Mode   
par   3      -none- numeric
value 1      -none- numeric

How do I get the standard errors of the estimates (or confidence intervals/quantiles), and possibly p-values?

EDIT: I did as kindly advised by @Ben Goodrich:

> MLb4cH <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits, hessian = TRUE)

> sqrt(diag(solve(-MLb4cH$hessian)))
       psi      alpha       beta 
0.21138314 0.03251696 0.03270493 

But these "unconstrained" standard errors seem to be very different from the real ones; here is the output from Bayesian fitting using stan():

> print(outb4c, dig = 5)
Inference for Stan model: tmp_stan_model.
3 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=750.

             mean se_mean      sd        2.5%         25%         50%         75%       97.5% n_eff    Rhat
alpha     0.43594 0.00127 0.03103     0.37426     0.41578     0.43592     0.45633     0.49915   594 1.00176
beta     -0.20262 0.00170 0.03167    -0.26640    -0.22290    -0.20242    -0.18290    -0.13501   345 1.00402
psi       0.94905 0.00047 0.01005     0.92821     0.94308     0.94991     0.95656     0.96632   448 1.00083
lp__  -2776.94451 0.06594 1.15674 -2780.07437 -2777.50643 -2776.67139 -2776.09064 -2775.61263   308 1.01220
Asked 29/11, 2014 at 12:45

Comment: The frequentist standard errors and Bayesian sd seem to match well for your alpha and beta parameters, just not psi.
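
The psi mismatch has a simple explanation: the Hessian-based standard errors live on the unconstrained scale. Assuming psi is declared with <lower=0, upper=1> in the model (an assumption; the model code is not shown here), Stan optimizes logit(psi) internally, so 0.211 is the standard error of logit(psi), not of psi. The delta method rescales it by the Jacobian psi * (1 - psi) and recovers the Bayesian sd almost exactly:

> # delta method, assuming psi has <lower=0, upper=1>:
> # SE(psi) ~ SE(logit(psi)) * psi * (1 - psi)
> 0.21138314 * 0.9495 * (1 - 0.9495)
[1] 0.01013577

That is essentially the Bayesian sd of 0.01005. alpha and beta appear to be unbounded, which is why their Hessian-based standard errors (0.0325, 0.0327) already match the Bayesian sds (0.0310, 0.0317) directly.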
Answer:

You can specify the hessian = TRUE argument to the optimizing function, which will return the Hessian as part of the output list. Thus, you can obtain estimated standard errors via sqrt(diag(solve(-MLb4c$hessian))); however, those standard errors pertain to the estimates in the unconstrained space. To obtain estimated standard errors for the parameters in the constrained space, you could either use the delta method or simulate: draw many times from a multivariate normal distribution whose mean vector is the estimate in the unconstrained space and whose variance-covariance matrix is solve(-MLb4c$hessian), convert those draws to the constrained space with the constrain_pars function, and estimate the standard deviation of each column.

Here is some R code you could adapt to your case:

# 1: Compile and save a model (make sure to pass the data here)
model <- stan(file = "model.stan", data = c("N", "K", "X", "y"),
              chains = 0, iter = 0)

# 2: Fit that model by maximum likelihood, asking for the Hessian
fit <- optimizing(object = get_stanmodel(model), as_vector = FALSE,
                  data = c("N", "K", "X", "y"), hessian = TRUE)

# 3: Extract the estimates, map them to the unconstrained space, and grab
#    the Hessian (which is computed for the unconstrained parameters)
theta_hat <- unlist(fit$par)
upars <- unconstrain_pars(model, relist(theta_hat, fit$par))
Hessian <- fit$hessian

# 4: Extract the Cholesky decomposition of the (negative) Hessian and invert
#    it to get the variance-covariance matrix in the unconstrained space
R <- chol(-Hessian)
V <- chol2inv(R)
rownames(V) <- colnames(V) <- colnames(Hessian)

# 5: Produce a matrix with some specified number of simulation draws from a
#    multivariate normal centered at the unconstrained estimates, then map
#    each draw back to the constrained space
SIMS <- 1000
len <- length(upars)  # number of unconstrained parameters
unconstrained <- upars + t(chol(V)) %*%
  matrix(rnorm(SIMS * len), nrow = len, ncol = SIMS)
theta_sims <- t(apply(unconstrained, 2, FUN = function(up) {
  unlist(constrain_pars(model, up))
}))

# 6: Produce estimated standard errors for the constrained parameters
se <- apply(theta_sims, 2, sd)
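
Since the question also asks for confidence intervals, percentile intervals come directly from the same simulated draws:

# Optional follow-up: 95% percentile intervals for the constrained
# parameters, taken column-wise from the draws generated in step 5
ci <- apply(theta_sims, 2, quantile, probs = c(0.025, 0.975))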
Answered 29/11, 2014 at 23:11

Comment (Ben Bolker): If you're working in a space that might be constrained, I would say it would generally make more sense/be more useful to compute the normal confidence intervals (+/- 1.96*SE) in the unconstrained space, then back-transform the lower/upper CI endpoints to the constrained space (see the sketch below). But this is turning into a CrossValidated conversation...

Comment (asker): Thanks Ben Goodrich and Ben Bolker. 1) I tried to use the unconstrained ones, but they are completely off; please see my updated question. 2) What does constrained/unconstrained mean? Is it somehow related to the parameter prior? 3) @BenBolker's proposal sounds simpler than what Ben Goodrich is proposing (though I don't get what the constrained/unconstrained space is); it would be great if it worked that way.
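
A minimal sketch of that back-transformation idea, reusing model, upars, and V from the answer above. It assumes each parameter has at most a simple lower/upper bound, so the constraining transform is monotone element by element and the endpoint ordering survives the mapping (this would not hold for, e.g., simplex or covariance parameters):

# Normal-approximation 95% CI endpoints in the unconstrained space,
# then mapped back to the constrained space element-wise
se_unc <- sqrt(diag(V))
ci_lower <- unlist(constrain_pars(model, upars - 1.96 * se_unc))
ci_upper <- unlist(constrain_pars(model, upars + 1.96 * se_unc))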
