Currently I stumbled across variational autoencoders and tried to make them work on MNIST using keras. I found a tutorial on github.
My question concerns the following lines of code:
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
# Compile
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')
Why is add_loss used instead of specifying it as compile option? Something like vae.compile(optimizer='rmsprop', loss=vae_loss)
does not seem to work and throws the following error:
ValueError: The model cannot be compiled because it has no loss to optimize.
What is the difference between this function and a custom loss function, that I can add as an argument for Model.fit()?
Thanks in advance!
P.S.: I know there are several issues concerning this on github, but most of them were open and uncommented. If this has been resolved already, please share the link!
Edit 1
I removed the line which adds the loss to the model and used the loss argument of the compile function. It looks like this now:
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)
This throws an TypeError:
TypeError: Using a 'tf.Tensor' as a Python 'bool' is not allowed. Use 'if t is not None:' instead of 'if t:' to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Edit 2
Thanks to @MarioZ's efforts, I was able to figure out a workaround for this.
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss in separate function
def vae_loss(x, x_decoded_mean):
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
return vae_loss
# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)
...
vae.fit(x_train,
x_train, # <-- did not need this previously
shuffle=True,
epochs=epochs,
batch_size=batch_size,
validation_data=(x_test, x_test)) # <-- worked with (x_test, None) before
For some strange reason, I had to explicitly specify y and y_test while fitting the model. Originally, I didn't need to do this. The produced samples seem reasonable to me.
Although I could resolve this, I still don't know what the differences and disadvantages of these two methods are (other than needing a different syntax). Can someone give me more insight?