How does Keras handle multilabel classification?
I am unsure how to interpret the default behavior of Keras in the following situation:

My Y (ground truth) was set up using scikit-learn's MultiLabelBinarizer().

Therefore, to give a random example, one row of my y column is multi-hot encoded like this: [0,0,0,1,0,1,0,0,0,0,1].

So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
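To make the target encoding concrete, here is a minimal sketch of how such a multi-hot row is produced. (This mimics what scikit-learn's MultiLabelBinarizer does, in plain Python so the mechanics are explicit; the helper name is illustrative.)

```python
classes = list(range(11))  # 11 possible labels, as in the question

def multi_hot(labels, classes):
    """Return a 0/1 vector with a 1 at every index present in `labels`."""
    present = set(labels)
    return [1 if c in present else 0 for c in classes]

# Labels 3, 5 and 10 are true for this sample:
row = multi_hot({3, 5, 10}, classes)
print(row)  # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```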

I train the model as I would for a non-multilabel problem (business as usual) and I get no errors.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy',])

model.fit(X_train, y_train, epochs=5, batch_size=2000)

score = model.evaluate(X_test, y_test, batch_size=2000)
score

What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Any differences in the interpretation of the scoring metrics?

Feasible answered 24/5, 2017 at 17:10 Comment(0)
In short

Don't use softmax.

Use sigmoid for activation of your output layer.

Use binary_crossentropy for loss function.

Use predict for evaluation.

Why

With softmax, increasing the score for one label lowers all the others, because the outputs form a probability distribution that must sum to 1. You don't want that when more than one label can be true at once.
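A small numeric illustration of that point (plain Python, not Keras code): with two equally strong logits, softmax forces them to split the probability mass, while independent sigmoids let both be high at the same time.

```python
import math

logits = [2.0, 2.0, -1.0]  # two strong labels, one weak

# Softmax: exponentiate and normalize so outputs sum to 1.
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]

# Sigmoid: each output is squashed independently.
sigmoid = [1 / (1 + math.exp(-z)) for z in logits]

print(softmax)  # the two strong labels each get < 0.5
print(sigmoid)  # both strong labels score ~0.88 simultaneously
print(sum(softmax))  # ~1.0
```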

Complete Code

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))

# Note: `lr` was renamed `learning_rate`, and `decay` was removed, in newer Keras.
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
              optimizer=sgd)

model.fit(X_train, y_train, epochs=5, batch_size=2000)

preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test
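One possible way to fill in the "compare preds and y_test" step (toy 0/1 arrays here, shaped like `preds` and `y_test` above; the metric names are illustrative):

```python
import numpy as np

# Hypothetical small example: 3 samples, 4 labels.
preds  = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_test = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])

# Per-label accuracy: fraction of individual label entries that agree.
label_acc = (preds == y_test).mean()

# Subset (exact-match) accuracy: a sample counts only if ALL labels match.
subset_acc = (preds == y_test).all(axis=1).mean()

print(label_acc)   # 11 of 12 entries agree
print(subset_acc)  # 2 of 3 rows match exactly
```

For real evaluation you would typically reach for scikit-learn's `hamming_loss` or `f1_score` (with `average='micro'` or `'macro'`) on these same 0/1 matrices.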
Carte answered 24/5, 2017 at 18:11 Comment(8)
Thanks, so you are saying to decompose my multilabel problem into many binary classification problems? How does Keras know that I am giving it a multilabel classification task?Feasible
Yes, that's right. Keras doesn't really have to know. By using sigmoid and binary_crossentropy, the labels are learned independently, and that's what you want for a multilabel task, right?Carte
How will you get the classes which have a 1?Janeyjangle
I am lost, then how come the Keras and TF tutorials use softmax and it seems to work well? tensorflow.org/tutorials/keras/basic_classificationCroquette
@HerrvonWurst This is because in the problem that you linked to, the job of the classifier is to place each image in one class only, whereas in the question asked, the classifier has to assign multiple classes to an input.Evieevil
Why not use binary_accuracy as the metric here?Capuchin
What would be the best way to "compare preds and y_test"? Also, using the above I seem to get lots of nan values in model.predict; how could this be the case?Willing
How is it possible that for multilabel text classification you recommend binary_crossentropy? "Binary" makes it obvious what it does: binary classification.Fantinlatour
Answer from Keras Documentation

I am quoting from the Keras documentation itself.

They use a Dense output layer with sigmoid activation. This means they also treat multilabel classification as multiple binary classifications, with binary cross-entropy loss.

Following is the model created in the Keras documentation:

shallow_mlp_model = keras.Sequential(
    [
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(lookup.vocabulary_size(), activation="sigmoid"),
    ]  # More on why "sigmoid" has been used here in a moment.
)

Keras doc link: https://keras.io/examples/nlp/multi_label_classification/

Giacometti answered 9/3, 2022 at 11:2 Comment(0)
