I am running the example code for a Bayesian neural network implemented with TensorFlow Probability.
My question is about the implementation of the ELBO loss used for variational inference. In the code, the ELBO loss is computed as the sum of two terms, neg_log_likelihood and kl, and I have difficulty understanding how the kl term is implemented.
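For context, my understanding of the objective being minimized (the negative ELBO, scaled per training example) is

$$ \text{loss} \;=\; \mathbb{E}_{q(w)}\!\left[-\log p(y \mid x, w)\right] \;+\; \frac{1}{N}\,\mathrm{KL}\!\left(q(w) \,\big\|\, p(w)\right), $$

where the first term corresponds to neg_log_likelihood, the second to kl, and N is the number of training examples.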
Here is how the model is defined:
with tf.name_scope("bayesian_neural_net", values=[images]):
  neural_net = tf.keras.Sequential()
  for units in FLAGS.layer_sizes:
    layer = tfp.layers.DenseFlipout(units, activation=FLAGS.activation)
    neural_net.add(layer)
  neural_net.add(tfp.layers.DenseFlipout(10))
  logits = neural_net(images)
  labels_distribution = tfd.Categorical(logits=logits)
Here is how the kl term is defined:
kl = sum(neural_net.losses) / mnist_data.train.num_examples
I am not sure what neural_net.losses returns here, since there is no loss function defined for neural_net. Clearly it returns some values, but I don't know what those values mean. Any comments on this?
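To at least see what the attribute holds, here is a minimal standalone sketch I put together (TF1-style graph mode to match the example; on TF2 the placeholder would need tf.compat.v1, and the 784-dimensional input is just MNIST-sized and otherwise arbitrary):

import tensorflow as tf
import tensorflow_probability as tfp

# Build a single DenseFlipout layer on a dummy MNIST-shaped input.
images = tf.placeholder(tf.float32, shape=[None, 784])
layer = tfp.layers.DenseFlipout(10)
logits = layer(images)

# layer.losses is a list of tensors registered by the layer itself via Keras'
# add_loss mechanism; for a single DenseFlipout with default arguments I would
# expect one scalar entry, and neural_net.losses collects these across layers.
print(layer.losses)
print(len(layer.losses))

Dividing the sum of these by mnist_data.train.num_examples then, I assume, puts the total on the same per-example scale as the averaged log-likelihood term.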
My guess is that it is an L2 norm, but I am not sure. If that is the case, we are still missing something: in appendix B of the VAE paper, the authors derive the KL term in closed form when the prior is a standard normal, and it turns out to be fairly close to an L2 norm of the variational parameters, except that there are additional log-variance terms and a constant term. Any comments on this?
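For concreteness, the closed form I have in mind from appendix B, assuming a diagonal Gaussian posterior $q = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior, is

$$ \mathrm{KL}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\big\|\, \mathcal{N}(0, I)\right) \;=\; -\tfrac{1}{2}\sum_{j}\left(1 + \log \sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right), $$

which has exactly the shape described above: a squared (L2-like) term in $\mu$, plus log-variance and variance terms, plus a constant.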
The other term is defined as
neg_log_likelihood = -tf.reduce_mean(input_tensor=labels_distribution.log_prob(labels))
which seems more explicitly in line with the ELBO loss than a softmax cross-entropy would be. Are the two equivalent? – Voltz
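A quick numerical check of that equivalence, as a sketch with made-up logits and labels (not taken from the example):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Made-up logits for a batch of two examples over three classes.
logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 0.2, 0.3]])
labels = tf.constant([0, 2])

# Categorical log-probability of the observed labels vs. sparse softmax cross-entropy.
log_prob = tfd.Categorical(logits=logits).log_prob(labels)
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    lp, xe = sess.run([log_prob, xent])

# If the two are equivalent, xent should equal -log_prob elementwise.
print(np.allclose(xe, -lp))

Since Categorical(logits=...).log_prob(k) is the log-softmax of the logits evaluated at index k, I would expect the two to agree up to floating-point error, so averaging either one gives the same neg_log_likelihood.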