Multilabel Text Classification using TensorFlow

The text data is organized as a vector with 20,000 elements, like [2, 1, 0, 0, 5, ...., 0]. The i-th element indicates the frequency of the i-th word in a text.

The ground truth label data is also represented as a vector with 4,000 elements, like [0, 0, 1, 0, 1, ...., 0]. The i-th element indicates whether the i-th label is positive for a text. The number of labels varies from text to text.

I have code for single-label text classification.

How can I edit the following code for multilabel text classification?

In particular, I would like to know the following:

  • How to compute accuracy using TensorFlow.
  • How to set a threshold which judges whether a label is positive or negative. For instance, if the output is [0.80, 0.43, 0.21, 0.01, 0.32] and the ground truth is [1, 1, 0, 0, 1], the labels with scores over 0.25 should be judged as positive.
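A minimal numpy sketch of the thresholding I have in mind (the 0.25 cutoff is just an example value):

```python
import numpy as np

# example sigmoid outputs and ground-truth labels from above
scores = np.array([0.80, 0.43, 0.21, 0.01, 0.32])
truth = np.array([1, 1, 0, 0, 1])

threshold = 0.25  # example cutoff; in practice this would be tuned
predicted = (scores > threshold).astype(int)  # -> [1, 1, 0, 0, 1]
```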

Thank you.

import tensorflow as tf

# hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_h = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_h = tf.Variable(tf.zeros([n_out]))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        # return directly: assigning to self.output would shadow this method
        return tf.nn.relu(linarg)

# output Layer
class OutputLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_o = tf.Variable(tf.zeros([n_out]))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        # return directly: assigning to self.output would shadow this method
        return tf.nn.relu(linarg)

# model
def model():
    h_layer = HiddenLayer(input = x, n_in = 20000, n_out = 1000)
    o_layer = OutputLayer(input = h_layer.output(), n_in = 1000, n_out = 4000)

    # loss function
    out = o_layer.output()
    cross_entropy = -tf.reduce_sum(y_*tf.log(out + 1e-9), name='xentropy')    

    # regularization
    l2 = (tf.nn.l2_loss(h_layer.w) + tf.nn.l2_loss(o_layer.w))
    lambda_2 = 0.01

    # compute loss
    loss = cross_entropy + lambda_2 * l2

    # compute accuracy for single label classification task
    # (x and y_ are assumed to be placeholders defined elsewhere)
    correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))

    return loss, accuracy
Perrie answered 15/2, 2016 at 1:10 Comment(4)
I think there might be a better loss function to use besides cross-entropy. – Mammillate
There are many different measures of accuracy for a multilabel classification problem: one-error accuracy, rank loss, mean average precision, etc. I'm still learning TensorFlow myself and haven't managed to correctly implement any of them yet. But perhaps this paper will help you: arxiv.org/pdf/1312.5419v3.pdf Let me know if you make any progress! – Apical
For a better idea of accuracy, consider calculating precision and recall. – Lupercalia
@Perrie what is y_? I don't see it defined. – Riga

Change the relu of the output layer to sigmoid, and modify the cross-entropy loss to the explicit mathematical formula of sigmoid cross-entropy loss (the explicit loss worked in my case/version of TensorFlow):

import tensorflow as tf

# hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_h = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_h = tf.Variable(tf.zeros([n_out]))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        # return directly: assigning to self.output would shadow this method
        return tf.nn.relu(linarg)

# output Layer
class OutputLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_o = tf.Variable(tf.zeros([n_out]))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        # changed relu to sigmoid; return directly so the method is not shadowed
        return tf.nn.sigmoid(linarg)

# model
def model():
    h_layer = HiddenLayer(input = x, n_in = 20000, n_out = 1000)
    o_layer = OutputLayer(input = h_layer.output(), n_in = 1000, n_out = 4000)

    # loss function
    out = o_layer.output()
    # modified cross entropy to explicit mathematical formula of sigmoid cross entropy loss
    cross_entropy = -tf.reduce_sum(y_ * tf.log(out + 1e-9) + (1 - y_) * tf.log(1 - out + 1e-9), name='xentropy')

    # regularization
    l2 = (tf.nn.l2_loss(h_layer.w) + tf.nn.l2_loss(o_layer.w))
    lambda_2 = 0.01

    # compute loss
    loss = cross_entropy + lambda_2 * l2

    # accuracy is still computed as for a single-label task (argmax match);
    # for multilabel, threshold `out` and compare element-wise instead
    correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))

    return loss, accuracy
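The explicit loss above can be checked outside TensorFlow. A numpy sketch of the same per-label binary cross-entropy (function name is illustrative):

```python
import numpy as np

def binary_cross_entropy(y, out, eps=1e-9):
    """Per-label binary cross-entropy, same formula as the TF loss above."""
    return -(y * np.log(out + eps) + (1 - y) * np.log(1 - out + eps))

y = np.array([1.0, 0.0])
out = np.array([0.8, 0.8])

losses = binary_cross_entropy(y, out)
# a confident correct label (y=1, out=0.8) is penalized far less than a
# confident wrong one (y=0, out=0.8)
```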
Anserine answered 13/9, 2016 at 14:47 Comment(0)

You have to use a variation of the cross-entropy function in order to support multilabel classification. If you have fewer than about a thousand outputs you can use sigmoid_cross_entropy_with_logits; in your case, with 4,000 outputs, you may consider candidate sampling, as it is faster.
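As a sanity check, sigmoid_cross_entropy_with_logits is documented to compute max(x, 0) - x*z + log(1 + exp(-|x|)) for logits x and targets z. A numpy sketch comparing that numerically stable form against the naive formula (names here are illustrative, not the TF API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stable_loss(x, z):
    # numerically stable form from the TensorFlow documentation
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

def naive_loss(x, z):
    # -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))
    return -z * np.log(sigmoid(x)) - (1 - z) * np.log(1 - sigmoid(x))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # logits
z = np.array([0.0, 1.0, 1.0, 0.0, 1.0])    # targets
# both forms agree for moderate logits; the stable one also survives large |x|
```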

How to compute accuracy using TensorFlow.

This depends on your problem and what you want to achieve. If you cannot afford to miss any object in an image, then when the classifier gets everything right except one label you should count the whole image as an error. Alternatively, you can count each missed or misclassified object as an error. I think the latter is supported by sigmoid_cross_entropy_with_logits.
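The two accuracy notions can be sketched in numpy (assuming 0/1 prediction matrices, one row per image):

```python
import numpy as np

y_true = np.array([[1, 1, 0],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 0],
                   [0, 0, 0]])  # second image misses one label

# exact-match: an image counts only if every label is right
exact_match = np.mean(np.all(y_pred == y_true, axis=1))  # 0.5

# per-label (Hamming) accuracy: each label counts individually
per_label = np.mean(y_pred == y_true)  # 5/6
```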

How to set a threshold which judges whether a label is positive or negative. For instance, if the output is [0.80, 0.43, 0.21, 0.01, 0.32] and the ground truth is [1, 1, 0, 0, 1], the labels with scores over 0.25 should be judged as positive.

A threshold is one way to go; you have to decide which value to use. But that is something of a hack, not real multilabel classification. For that you need the functions mentioned above.

Pervade answered 5/5, 2016 at 13:22 Comment(9)
I don't know why people suggest sigmoid_cross_entropy_with_logits. If it is what its name suggests, i.e. -y*ln(sigmoid(logits)), then it will minimize the loss by giving high probability to every class, and in fact it was giving that in my case. – Anserine
This function doesn't return a probability, and I don't see how it would minimize the loss by giving a high value. If you set your classes to 1, and 0 when the class is not present, then the network gives values close to 0 when the object is not in the image and values close to 1 or bigger (even 2 or 3) when the object is in the image. I am using it and it works pretty well. – Pervade
It will minimize the loss by giving a high value to every class because there is no penalty (or 0 loss) for giving a high value to classes which are labelled 0. So one needs to modify the cross-entropy loss to binary cross-entropy: y * ln(sigmoid(logits)) + (1-y) * ln(sigmoid(1-logits)). sigmoid_cross_entropy_with_logits doesn't implement binary cross-entropy internally. I am surprised it is working in your case; are you using theano etc.? – Anserine
I think you are wrong with the maths. It is: y * ln(sigmoid(logits)) + (1-y) * ln(1-sigmoid(logits)). So: logits=0, y=0 => 0; logits=1, y=1 => 0; logits=1, y=0 => 1.3; logits=0, y=1 => 1.3. You can plot the function in Google and play with the numbers. Just search for y*-ln(1 / (1 + e^-x)) + (1-y)*-ln(1 - 1 / (1 + e^-x)). – Pervade
My bad, ignore my math above. Here is what I was using, which worked for me: -tf.reduce_mean(tf.mul(y, tf.log(tf.nn.sigmoid(logits) + 1e-9)) + tf.mul(1-y, tf.log(1 - tf.nn.sigmoid(logits) + 1e-9))). This worked and what you suggested didn't; let me know if I am wrong with my argument. – Anserine
It might be the version of TensorFlow that you are using. The equations are almost the same (you added a small number to avoid 0s, and in TensorFlow they use a max function). Your argument is wrong: if you substitute values into the equation you get errors when logits and y don't match and 0 when they are the same. So I don't know why it is not working for you, but the equations are ok. – Pervade
No doubt that if I substitute values into my equation I get errors when logits and y don't match and 0 when they are the same. No doubt about my loss definition. But in TensorFlow's sigmoid_cross_entropy_with_logits, loss = -y*ln(sigmoid(logits)). Please justify that loss, not the loss which I used. – Anserine
I was talking about TF; I wrote the equation and tested it. Do it yourself, it works. I didn't check your equations. Tell me for which values the TF equations don't work. – Pervade
What you mean to say is that it doesn't work for you. It has been working fine for me for a couple of months. sigmoid_cross_entropy_with_logits doesn't use the equation you said; it uses the one I wrote before (it is in the TensorFlow docs): y * ln(sigmoid(logits)) + (1-y) * ln(1-sigmoid(logits)). – Pervade
