I am running the example code for a Bayesian neural network implemented with TensorFlow Probability.
My question is about the implementation of the ELBO loss used for variational inference. In the code, the ELBO loss is computed as the sum of two terms, neg_log_likelihood and kl, and I have difficulty understanding how the kl term is implemented.
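For context, my understanding of the objective being minimized (the negative ELBO, scaled per training example) is

$$ \text{loss} \;=\; \mathbb{E}_{q(w)}\!\left[-\log p(y \mid x, w)\right] \;+\; \frac{1}{N}\,\mathrm{KL}\!\left(q(w) \,\big\|\, p(w)\right), $$

where the first term corresponds to neg_log_likelihood, the second to kl, and N is the number of training examples.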
Here is how the model is defined:
with tf.name_scope("bayesian_neural_net", values=[images]):
  neural_net = tf.keras.Sequential()
  for units in FLAGS.layer_sizes:
    layer = tfp.layers.DenseFlipout(units, activation=FLAGS.activation)
    neural_net.add(layer)
  neural_net.add(tfp.layers.DenseFlipout(10))
  logits = neural_net(images)
  labels_distribution = tfd.Categorical(logits=logits)
Here is how the kl term is defined:
kl = sum(neural_net.losses) / mnist_data.train.num_examples
I am not sure what neural_net.losses returns here, since there is no loss function defined for neural_net. Clearly it returns some values, but I don't know what those values mean. Any comments on this?
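To at least see what the attribute holds, here is a minimal standalone sketch I put together (TF1-style graph mode to match the example; on TF2 the placeholder would need tf.compat.v1, and the 784-dimensional input is just MNIST-sized and otherwise arbitrary):

import tensorflow as tf
import tensorflow_probability as tfp

# Build a single DenseFlipout layer on a dummy MNIST-shaped input.
images = tf.placeholder(tf.float32, shape=[None, 784])
layer = tfp.layers.DenseFlipout(10)
logits = layer(images)

# layer.losses is a list of tensors registered by the layer itself via Keras'
# add_loss mechanism; for a single DenseFlipout with default arguments I would
# expect one scalar entry, and neural_net.losses collects these across layers.
print(layer.losses)
print(len(layer.losses))

Dividing the sum of these by mnist_data.train.num_examples then, I assume, puts the total on the same per-example scale as the averaged log-likelihood term.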
My guess is that it is an L2 norm, but I am not sure. If that is the case, we are still missing something: in appendix B of the VAE paper, the authors derive the KL term in closed form when the prior is a standard normal, and it turns out to be fairly close to an L2 norm of the variational parameters, except that there are additional log-variance terms and a constant term. Any comments on this?
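For concreteness, the closed form I have in mind from appendix B, assuming a diagonal Gaussian posterior $q = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior, is

$$ \mathrm{KL}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\big\|\, \mathcal{N}(0, I)\right) \;=\; -\tfrac{1}{2}\sum_{j}\left(1 + \log \sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right), $$

which has exactly the shape described above: a squared (L2-like) term in $\mu$, plus log-variance and variance terms, plus a constant.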
The other term is defined as
neg_log_likelihood = -tf.reduce_mean(input_tensor=labels_distribution.log_prob(labels))
which seems more explicitly in line with the ELBO loss than a softmax cross-entropy would be. Are the two equivalent? – Voltz
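A quick numerical check of that equivalence, as a sketch with made-up logits and labels (not taken from the example):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Made-up logits for a batch of two examples over three classes.
logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 0.2, 0.3]])
labels = tf.constant([0, 2])

# Categorical log-probability of the observed labels vs. sparse softmax cross-entropy.
log_prob = tfd.Categorical(logits=logits).log_prob(labels)
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    lp, xe = sess.run([log_prob, xent])

# If the two are equivalent, xent should equal -log_prob elementwise.
print(np.allclose(xe, -lp))

Since Categorical(logits=...).log_prob(k) is the log-softmax of the logits evaluated at index k, I would expect the two to agree up to floating-point error, so averaging either one gives the same neg_log_likelihood.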