The GAN originally proposed by Ian Goodfellow uses the following loss functions:
D_loss = - log[D(X)] - log[1 - D(G(Z))]
G_loss = - log[D(G(Z))]
So the discriminator tries to minimize D_loss and the generator tries to minimize G_loss, where X is a real training input and Z is a noise input. D(.) and G(.) are the mappings computed by the discriminator and generator neural networks, respectively.
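For concreteness, here is a minimal sketch of the two losses in Python/NumPy. The names d_loss, g_loss, d_real = D(X), and d_fake = D(G(Z)) are my own, and I am assuming the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator loss: -log D(X) - log(1 - D(G(Z)))
    return -np.log(d_real) - np.log(1.0 - d_fake)

def g_loss(d_fake):
    # Non-saturating generator loss: -log D(G(Z))
    return -np.log(d_fake)
```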
As the original paper says, when the GAN is trained for enough steps it reaches a point where neither the generator nor the discriminator can improve, and D(Y) = 0.5 for every input Y to the discriminator. At this equilibrium,
D_loss = - log(0.5) - log(1 - 0.5) = 0.693 + 0.693 = 1.386
G_loss = - log(0.5) = 0.693
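Plugging D(.) = 0.5 into the sketch above reproduces these numbers:

```python
>>> d_loss(0.5, 0.5)   # 2 * log 2
1.3862943611198906
>>> g_loss(0.5)        # log 2
0.6931471805599453
```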
So why can't we use the D_loss and G_loss values as a metric for evaluating a GAN?
If the two loss values deviate from these ideal values, then surely the GAN either needs further training or a better-designed architecture. Theorem 1 in the original paper shows that these are the optimal values of D_loss and G_loss, so why can't they be used as an evaluation metric?