Using sample_weight in Keras for sequence labelling
I am working on a sequence labelling problem with unbalanced classes and I would like to use sample_weight to address the imbalance. Basically, if I train the model for about 10 epochs, I get great results. If I train for more epochs, val_loss keeps dropping, but I get worse results. I'm guessing the model just detects more of the dominant class to the detriment of the smaller classes.

The model has two inputs, for word embeddings and character embeddings, and the output for each word is one of 7 possible classes, from 0 to 6.

With the padding, the shape of my input layer for word embeddings is (3000, 150) and the shape of my input layer for character embeddings is (3000, 150, 15). I use a 0.3 test/train split, which means X_train is (2000, 150) for word embeddings and (2000, 150, 15) for char embeddings. y contains the correct class for each word, encoded as a one-hot vector of dimension 7, so its shape is (3000, 150, 7). y is likewise split into a training and a testing set. Each input is then fed into a Bidirectional LSTM.

The output is a matrix with one of the 7 categories assigned to each word of the 2000 training samples, so its shape is (2000, 150, 7).
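
In code, the architecture looks roughly like this (a minimal sketch using tf.keras; the vocabulary sizes, embedding dimensions and LSTM widths are illustrative placeholders, not my exact values):

from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     TimeDistributed, Dense, Concatenate)
from tensorflow.keras.models import Model

MAX_LEN, MAX_CHARS, N_CLASSES = 150, 15, 7

# Word branch: (batch, 150) -> (batch, 150, 100)
word_in = Input(shape=(MAX_LEN,), name="words")
word_emb = Embedding(input_dim=10000, output_dim=100)(word_in)

# Char branch: (batch, 150, 15) -> (batch, 150, 15, 25) -> (batch, 150, 50)
char_in = Input(shape=(MAX_LEN, MAX_CHARS), name="chars")
char_emb = Embedding(input_dim=100, output_dim=25)(char_in)
char_enc = TimeDistributed(Bidirectional(LSTM(25)))(char_emb)

# Merge both representations and classify every timestep into 7 classes
x = Concatenate()([word_emb, char_enc])
x = Bidirectional(LSTM(64, return_sequences=True))(x)
out = TimeDistributed(Dense(N_CLASSES, activation="softmax"))(x)

model = Model([word_in, char_in], out)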


At first, I simply tried to define sample_weight as an np.array of length 7 containing the weights for each class:

from collections import Counter
import numpy as np

# count occurrences of each of the 7 classes (y is one-hot encoded)
count = [list(array).index(1) for arrays in y for array in arrays]
count = dict(Counter(count))
count[0] = 0  # ignore the padding class
total = sum(count[key] for key in count)
count = {key: count[key] / total for key in count}  # relative frequency per class
category_weights = np.zeros(7)
for f in count:
    category_weights[f] = count[f]

But I get the following error: ValueError: Found a sample_weight array with shape (7,) for an input with shape (2000, 150, 7). sample_weight cannot be broadcast.

Looking at the docs, it seems I should instead be passing a 2D array with shape (samples, sequence_length). So I create a (3000, 150) array containing the weight of every word of each sequence:

weights = []

# frequency maps each class index to its weight
# (here, the per-class values computed above)
for sample in y:
    current_weight = []
    for line in sample:  # one one-hot vector per word
        current_weight.append(frequency[list(line).index(1)])
    weights.append(current_weight)

weights = np.array(weights)  # shape (3000, 150)

and pass that to fit() through the sample_weight parameter, after adding the sample_weight_mode="temporal" option in compile().

I first got an error telling me the dimensions were wrong; however, after generating the weights for only the training samples, I end up with a (2000, 150) array that I can use to fit my model.
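
For reference, this is roughly what the compile/fit calls look like (a sketch; the optimizer, batch size and variable names are placeholders):

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              sample_weight_mode="temporal")  # allow 2D (sample, timestep) weights

model.fit([X_train_words, X_train_chars], y_train,
          sample_weight=weights_train,  # shape (2000, 150): one weight per word
          batch_size=32, epochs=10)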


  • Is this a proper way to define sample_weight, or am I doing it all wrong? I can't say I've noticed any improvement from adding the weights, so I must have missed something.
Eschalot answered 18/1, 2018 at 6:30 Comment(0)

I think you are confusing sample_weights and class_weights. Checking the docs a bit we can see the differences between them:

sample_weights is used to provide a weight for each training sample. That means you should pass a 1D array with the same number of elements as your training samples (indicating the weight for each of those samples). If you are using temporal data, you may instead pass a 2D array, enabling you to give a weight to each timestep of each sample.

class_weights is used to provide a weight or bias for each output class. This means you should pass a weight for each class that you are trying to classify. Furthermore, this parameter expects a dictionary (not an array, which is why you got that error). For example, consider this situation:

class_weight = {0: 1., 1: 50.}

In this case (a binary classification problem) you are giving 50 times as much weight (or "relevance") to your samples of class 1 compared to class 0. This way you can compensate for imbalanced datasets. Here is another useful post explaining more about this and other options to consider when dealing with imbalanced datasets.
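
A minimal sketch of passing that dictionary to training (model and data names are placeholders):

model.fit(X, y,
          class_weight={0: 1., 1: 50.},  # losses on class 1 count 50x as much
          epochs=10)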

If I train for more epochs, val_loss keeps dropping, but I get worse results.

Probably you are over-fitting, and, as you correctly suspected, the imbalanced classes in your dataset may be contributing to it. Compensating with class weights should help mitigate this; however, there may still be other factors causing over-fitting that are beyond the scope of this question/answer (so make sure to watch out for those after solving this one).


Judging by your post, it seems to me that what you need is class_weight to balance your dataset for training, for which you will need to pass a dictionary indicating the weight ratios between your 7 classes. Consider sample_weight only if you want to give each individual sample a custom weight.
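
For your 7 classes, a sketch of building such a dictionary from the per-class counts you already compute (count here is the Counter dict from your question; everything else is a placeholder):

total = sum(count.values())
# inverse-frequency weighting; the padding class 0 keeps weight 0
class_weight = {cls: (total / count[cls] if count.get(cls, 0) else 0.0)
                for cls in range(7)}
model.fit(X, y, class_weight=class_weight)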

If you want a more detailed comparison between the two, consider checking this answer I posted on a related question. Spoiler: sample_weight overrides class_weight, so you have to use one or the other, but not both; be careful not to mix them.


Update: As of the moment of this edit (March 27, 2020), looking at the source code of training_utils.standardize_weights() we can see that it now supports both class_weights and sample_weights:

Everything gets normalized to a single sample-wise (or timestep-wise) weight array. If both sample_weights and class_weights are provided, the weights are multiplied together.
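
So something like the following should now be accepted (a sketch with placeholder names; per the quote above, the effective weight of each sample ends up being the product of the two):

model.fit(X, y,
          class_weight={0: 1., 1: 5.},   # per-class bias
          sample_weight=per_sample_w)    # per-sample (or per-timestep) weight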

Bun answered 18/1, 2018 at 18:16 Comment(11)
Sorry, I should probably have mentioned this in my post: it was also originally my understanding that class_weight is the most appropriate parameter for what I am trying to achieve. The count variable defined in my code above as count = {key: count[key] / total for key in count} was meant to be passed as class_weight. However, when I tried to do so, I got the following error: ValueError: class_weight not supported for 3+ dimensional targets. After looking around on SO, it seems that for 3d+ output you have no choice but to use sample_weight. – Eschalot
@darkcygnus did you find a solution or workaround for when you are using fit_generator with class_weight and the loss function in validation returns a significantly different number from training? (github.com/keras-team/keras/issues/4137) – Boomkin
Hey @pablo_sci, I haven't stumbled upon that issue, so I haven't had to devise a workaround. I see you posted on the GitHub issue page, but what makes you think there is a workaround using sample_weight? Is your issue exactly like the original post there? Have you posted here on SO (so perhaps I could check your post out and answer/help there)? – Bun
Thanks @DarkCygnus. I haven't posted on SO yet. I guess there should be a workaround, since it is a not-so-strange scenario: predicting with fit_generator (for big arrays) plus sample_weight (a highly unbalanced target). The only workaround I see is to feed the network with balanced classes (sampling). – Boomkin
@pablo_sci if you happen to post it, along with some details and code samples, feel free to ping me so I can take a look and perhaps help you :) Based on what you describe, I think your generator should be "smart enough" to pass along samples and their associated sample_weight, which is 1 per sample and does not depend on the frequency. – Bun
If sample_weight does not have any influence on the parameter-update procedure, why should one use it? Because, as far as I know, these weights just result in a different "total loss"; they do not take part in backprop (during which we do not need the "total loss"). Or am I wrong? – Shriner
@Shriner in the case where you want to give each sample the same importance or weight (or, put another way, when you don't have specific samples you want to highlight), then, as you correctly pointed out, using sample_weights would have no use. It would be like providing a bunch of 1's manually. – Bun
No, my comment was about the case in which you want to give each training instance a different weight. I really wonder whether they affect backprop. If so, how (because these weights only contribute to the value of the total loss, while the total-loss value has no use in backprop)? If not, then what are they useful for? – Shriner
@Shriner IIRC, what happens backstage is that if a sample has a weight of X, it will make X "copies" of that sample and train over those, which in turn results in more gradient updates with that sample. So it's not that you are modifying the backprop calculation per se; what we are doing is executing that backprop more times for that sample (or class, if using class weights)... we could say this is a form of data augmentation. This helps in the (common) case of unbalanced datasets. – Bun
Thanks. I already gave +1 for your answer. I could not find any Keras resource in which the backstage of sample_weight is clarified, only here, where the code for the weighted loss is. – Shriner
You are welcome :) FWIW, in a related answer of mine I shared a link to that part of the code. Checking the link now, it seems it has changed a bit, but what you are seeking appears to be in line 470, in the _standardize_user_data method, specifically around line 625... and now that I'm reading it, it suggests that sample weights no longer override class weights (line 629). – Bun

I searched online for the same question, and in my case I did get a good accuracy improvement after using sample_weight correctly.

I think your understanding is correct and the procedure is also correct. One possible reason you don't see improvements is that, when you pass in sample_weight, a higher value means a higher weight. This means you cannot use the word counts directly. You might consider using the inverted count frequency:

total = sum(count[key] for key in count)
count = {key: count[key] / total for key in count}  # relative frequency per class
category_weights = np.zeros(7)
for f in count:
    # invert the frequency: rarer classes get larger weights
    category_weights[f] = 1 - count[f]
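
Given the one-hot y of shape (samples, 150, 7) from the question, a sketch of expanding these per-class weights into the (samples, 150) temporal sample_weight matrix:

class_idx = y.argmax(axis=-1)                # (samples, 150) class indices
sample_weight = category_weights[class_idx]  # (samples, 150) per-word weights
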
Hospodar answered 29/7, 2021 at 7:2 Comment(0)
