The code you proposed seems wrong to me.
The loss should be multiplied by the weight, I agree.
But if you multiply the logits by the class weights, you end up with:
weights[class] * -x[class] + log( \sum_j exp(x[j] * weights[class]) )
The second term is not equal to:
weights[class] * log( \sum_j exp(x[j]) )
To show this, we can rewrite the latter as:
log( (\sum_j exp(x[j]))^weights[class] )
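To make that concrete, here is a quick numeric check in plain NumPy (the logits, weight, and true class index below are made up purely for illustration):

import numpy as np

x = np.array([2.0, -1.0, 0.5])  # made-up logits, true class = 0
w = 3.0                         # made-up class weight

# weight applied to the logits vs. weight applied to the loss
scaled = w * -x[0] + np.log(np.sum(np.exp(w * x)))
weighted = w * (-x[0] + np.log(np.sum(np.exp(x))))
print(scaled, weighted)  # ~0.011 vs ~0.724: not equal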
So here is the code I'm proposing:
ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([[ratio, 1.0 - ratio]])  # shape [1, 2]
labels = ...  # one-hot float labels, shape [batch_size, 2]
logits = ...  # shape [batch_size, 2]
# this is the weight for each datapoint, depending on its label
weight_per_label = tf.transpose(
    tf.matmul(labels, tf.transpose(class_weight)))  # shape [1, batch_size]
xent = tf.multiply(
    weight_per_label,
    tf.nn.softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name="xent_raw"))  # shape [1, batch_size]
loss = tf.reduce_mean(xent)  # scalar
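As an aside, since labels is already one-hot, the per-example weight can also be computed with a single reduction instead of the two transposes (a sketch of the same computation, not a behavior change):

# equivalent: mask the weight row with the one-hot labels, sum over classes
weight_per_label = tf.reduce_sum(class_weight * labels, axis=1)  # shape [batch_size]
xent = weight_per_label * tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(xent)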
- Can you please explain that? It seems to be LaTeX code; does that work in Python? – Orthogenetic