Loss does not decrease during training (Word2Vec, Gensim)

What can cause the loss reported by model.get_latest_training_loss() to increase on every epoch?

Code used for training:

import os
import multiprocessing

from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class EpochSaver(CallbackAny2Vec):
    '''Callback to save the model after each epoch and show training parameters.'''

    def __init__(self, savedir):
        self.savedir = savedir
        self.epoch = 0
        os.makedirs(self.savedir, exist_ok=True)

    def on_epoch_end(self, model):
        savepath = os.path.join(self.savedir, "model_neg{}_epoch.gz".format(self.epoch))
        model.save(savepath)
        print(
            "Epoch saved: {}".format(self.epoch + 1),
            "Start next epoch ... ", sep="\n"
        )
        previous = os.path.join(self.savedir, "model_neg{}_epoch.gz".format(self.epoch - 1))
        if os.path.isfile(previous):
            print("Previous model deleted ")
            os.remove(previous)
        self.epoch += 1
        print("Model loss:", model.get_latest_training_loss())


def train():

    ### Initialize model ###
    print("Start training Word2Vec model")

    # cpu_count()/2 is a float in Python 3; workers must be an int
    workers = multiprocessing.cpu_count() // 2

    model = Word2Vec(
        DocIter(),
        size=300, alpha=0.03, min_alpha=0.00025, iter=20,
        min_count=10, hs=0, negative=10, workers=workers,
        window=10, callbacks=[EpochSaver("./checkpoints")],
        compute_loss=True,
    )

Output:

Losses from epochs (1 to 20):

Model loss: 745896.8125
Model loss: 1403872.0
Model loss: 2022238.875
Model loss: 2552509.0
Model loss: 3065454.0
Model loss: 3549122.0
Model loss: 4096209.75
Model loss: 4615430.0
Model loss: 5103492.5
Model loss: 5570137.5
Model loss: 5955891.0
Model loss: 6395258.0
Model loss: 6845765.0
Model loss: 7260698.5
Model loss: 7712688.0
Model loss: 8144109.5
Model loss: 8542560.0
Model loss: 8903244.0
Model loss: 9280568.0
Model loss: 9676936.0

What am I doing wrong?

The language is Arabic. DocIter yields a list of tokens for each document as input.
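
DocIter isn't shown above; it's just a restartable iterable over my corpus that yields one list of tokens per document, roughly like this sketch (the file path and tokenization here are placeholders, not the real code):

from gensim.utils import simple_preprocess

class DocIter:
    """Restartable corpus iterable: yields one token list per document.
    (Sketch only: the file path and tokenization are placeholders.)"""

    def __init__(self, path="corpus.txt"):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield simple_preprocess(line)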

Betteann answered 27/8, 2018 at 11:48 Comment(1)
Please comment if you are downvoting!Betteann

Up through gensim 3.6.0, the loss value reported may not be very sensible: the running tally is only reset on each call to train(), rather than at each internal epoch, so the numbers you see are cumulative across epochs. There are fixes forthcoming in this pull request:

https://github.com/RaRe-Technologies/gensim/pull/2135

In the meantime, the difference between the previous value and the latest may be more meaningful. Reading your numbers that way, the 1st epoch had a total loss of 745,896, while the last had (9,676,936 - 9,280,568 =) 396,368 – which may indicate the kind of progress hoped for.
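
For example, here is some quick arithmetic on the cumulative values you posted, turning them into per-epoch deltas (illustration only, using the numbers above):

# Cumulative tallies as printed by get_latest_training_loss() after each epoch
cumulative = [745896.8125, 1403872.0, 2022238.875, 2552509.0, 3065454.0,
              3549122.0, 4096209.75, 4615430.0, 5103492.5, 5570137.5,
              5955891.0, 6395258.0, 6845765.0, 7260698.5, 7712688.0,
              8144109.5, 8542560.0, 8903244.0, 9280568.0, 9676936.0]

# Per-epoch loss = difference between consecutive cumulative tallies
per_epoch = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
for i, loss in enumerate(per_epoch, start=1):
    print("Epoch {:2d} loss: {:.1f}".format(i, loss))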

Sharpeyed answered 29/8, 2018 at 0:40 Comment(7)
Thank you for your answer! But the loss is bigger at the last stage, and as far as I understand, I should only look at the difference and interpret that as progress, shouldn't I? And would I get a more appropriate loss if I called train() myself instead of just passing iter=smth to my model?Betteann
Calling train() multiple times, if you aren't already deeply familiar with the code's internal operations, often goes wrong... so I don't recommend that. (It's fragile & error-prone, & most online examples I see are wrong.) The model is generally trying to lower its loss after each training-example – but it may never get objectively very good at its internal word predictions, and doesn't need to for the word-vectors to still be useful for downstream tasks. (A model with lower loss doesn't necessarily give better word-vectors than one with higher!)Sharpeyed
And it's natural for the loss-through-a-full-epoch to bounce higher and lower for a while, then eventually stop improving, upon model "convergence". That means it's roughly as good as a model of that complexity can get, for a certain training corpus, and further epochs will just jitter the overall loss a little up and down, but no longer reliably drive it lower. So you shouldn't worry too much about the last epoch-to-epoch delta. Why is loss of interest to you?Sharpeyed
(Separately, changing the default alpha/min_alpha isn't something I'd usually tinker with, unless sure of the reasons why and able to verify the changes are improving the results on downstream tasks.)Sharpeyed
@DashaOrgunova, did you try using an early-stopping mechanism in the callback function? (See the sketch after these comments.)Laudatory
@Laudatory no, I did not!Betteann
@gojomo, I've bumped the gensim version, if you don't mind. (Wasn't able to edit the version number in-place.)Abagail
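
A minimal sketch of the early-stopping idea from the comments above, assuming you treat the per-epoch loss delta as the stopping signal (gensim has no built-in way to halt train() from a callback, so this raises an exception for the calling code to catch; the threshold value is only a placeholder):

from gensim.models.callbacks import CallbackAny2Vec

class StopTraining(Exception):
    """Raised to break out of Word2Vec.train() from the callback."""

class EarlyStopper(CallbackAny2Vec):
    """Stop once the per-epoch loss improvement drops below min_delta.
    Requires compute_loss=True; the reported loss is cumulative per train() call."""

    def __init__(self, min_delta=10000.0):
        self.min_delta = min_delta
        self.previous_cumulative = 0.0

    def on_epoch_end(self, model):
        cumulative = model.get_latest_training_loss()
        delta = cumulative - self.previous_cumulative
        self.previous_cumulative = cumulative
        if delta < self.min_delta:
            raise StopTraining("loss delta {:.1f} below threshold".format(delta))

You would then wrap the train() call in try/except StopTraining and keep the model as trained up to that point.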

As proposed by gojomo, you can calculate the difference in loss in the callback function:

from gensim.models.callbacks import CallbackAny2Vec
from gensim.models import Word2Vec

# init callback class
class callback(CallbackAny2Vec):
    """
    Callback to print loss after each epoch
    """
    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()
        if self.epoch == 0:
            print('Loss after epoch {}: {}'.format(self.epoch, loss))
        else:
            print('Loss after epoch {}: {}'.format(self.epoch, loss - self.loss_previous_step))
        self.epoch += 1
        self.loss_previous_step = loss

For the training of your model, add compute_loss=True and callbacks=[callback()] to the Word2Vec train() method:

# init word2vec class
w2v_model = Word2Vec(min_count=20,
                     window=12,
                     size=100,
                     workers=2)
# build vocab
w2v_model.build_vocab(sentences)

# train the w2v model
w2v_model.train(sentences,
                total_examples=w2v_model.corpus_count,
                epochs=10,
                report_delay=1,
                compute_loss=True,  # set compute_loss = True
                callbacks=[callback()])  # add the callback class

# save the word2vec model
w2v_model.save('word2vec.model')

This will output something like this:

Loss after epoch 0: 4448638.5

Loss after epoch 1: 3283735.5

Loss after epoch 2: 2826198.0

Loss after epoch 3: 2680974.0

Loss after epoch 4: 2601113.0

Loss after epoch 5: 2271333.0

Loss after epoch 6: 2052050.0

Loss after epoch 7: 2011768.0

Loss after epoch 8: 1927454.0

Loss after epoch 9: 1887798.0
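
The snippet above uses gensim 3.x argument names; on gensim 4.x the constructor's size was renamed to vector_size (and iter to epochs), so a roughly equivalent setup, reusing the sentences corpus and callback class from above, would look something like this (untested sketch):

from gensim.models import Word2Vec

# gensim 4.x: `size` -> `vector_size`
w2v_model = Word2Vec(min_count=20, window=12, vector_size=100, workers=2)
w2v_model.build_vocab(sentences)
w2v_model.train(sentences,
                total_examples=w2v_model.corpus_count,
                epochs=10,
                compute_loss=True,
                callbacks=[callback()])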

Cincinnati answered 18/12, 2019 at 20:52 Comment(0)
