How do I get a loss per epoch and not per batch?

In my understanding, an epoch is one pass over the whole dataset, repeated as often as you like, and the dataset in turn is processed in parts, so-called batches. After each train_on_batch a loss is calculated, the weights are updated, and the next batch should get better results. These losses are indicators of the quality and learning state of my two NNs.

In several sources, however, the loss is calculated (and printed) per epoch, so I am not sure whether I am doing this right.

At the moment my GAN looks like this:

for epoch in range(n_epochs):
  for batch in batches:

    fakes = generator.predict_on_batch(batch)

    dlc = discriminator.train_on_batch(batch, ..)
    dlf = discriminator.train_on_batch(fakes, ..)
    dis_loss_total = 0.5 * np.add(dlc, dlf)

    g_loss = gan.train_on_batch(batch, ..)

    # save losses to an array to work with later

These losses are per batch. How do I get them for an epoch? And as an aside: do I even need losses per epoch, and what for?

Affirmatory answered 5/1, 2019 at 16:23

There is no direct way to compute the loss for an epoch. Actually, the loss of an epoch is usually defined as the average of the batch losses in that epoch. So you can accumulate the loss values during an epoch and, at the end, divide the sum by the number of batches in the epoch:

epoch_loss = []
for epoch in range(n_epochs):
    acc_loss = 0.
    for batch in range(n_batches):
        # do the training 
        loss = model.train_on_batch(...)
        acc_loss += loss
    epoch_loss.append(acc_loss / n_batches)

As for the other question, one use of the epoch loss is as an indicator to stop the training (however, the validation loss is usually used for that, not the training loss).
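
For illustration, here is a minimal sketch of that idea in a manual training loop like the one above. The names x_val, y_val and the patience value are placeholders of mine, not anything from the question, and the model is assumed to be compiled with a single loss so that evaluate returns one scalar:

best_val_loss = float('inf')
patience = 5              # epochs to wait for an improvement before stopping
patience_left = patience

for epoch in range(n_epochs):
    for batch in range(n_batches):
        # do the training, as above
        loss = model.train_on_batch(...)

    # validation loss once per epoch; evaluate() averages over its own batches
    val_loss = model.evaluate(x_val, y_val, verbose=0)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_left = patience
    else:
        patience_left -= 1
        if patience_left == 0:
            break  # validation loss has stopped improving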

Bolding answered 5/1, 2019 at 16:45 Comment(12)
Sounds like a strange definition; wouldn't the loss of an epoch simply be the loss of the last batch of that epoch? – Sicklebill
@Sicklebill No, but maybe I am mistaken... where do you get that idea from? Actually, the loss you see in the progress bar of Keras is also the average of the batch losses in that epoch. – Bolding
Yeah, speaking about the training loss you are right, of course... It's just that the whole discussion strikes me as rather strange; after all, the validation loss (which is arguably the quantity to be monitored) is only computed per epoch... – Sicklebill
@Sicklebill No, I don't think that's the case either. You can't feed the model the entire validation data in one go most of the time, so you get the average of the batch losses in that case as well, I think. And this is what happens with the evaluate method in Keras, too: it takes a batch_size argument and at the end gives you the average of the batch losses. – Bolding
Yes, but those batches are fed after the end of the training epoch, not while the epoch is in progress... That's why the evaluate method on the validation set reports the same metric values as the ones printed in the progress bar... – Sicklebill
@Sicklebill Sure, but how does that make a difference? It is OK that the model's parameters are changing at the end of each batch (though I see your point that this is a bit hard to swallow at first glance). The training loss is just a sign of whether the model is progressing or not (i.e. is it learning?), and we should not expect anything further from it. – Bolding
Just thinking out loud: this way of computing the per-epoch training loss is not very meaningful (and hence not very useful) in practice... After all, we are interested in the "present" state of our network, so why exactly care how it fared K batches ago at the beginning of an epoch (where K may be in the hundreds or more)? – Sicklebill
@Sicklebill I think it would be a terrible mistake to evaluate the model based on its performance on just one single batch. Some batches may be harder to learn, some easier. Further, the distribution of samples in a batch might not be representative of the whole dataset (e.g. how could a batch of size 256 or 512 represent the distribution of images in ImageNet?). Limiting the definition of "present" to a single batch might be something like "overfitting", and it would not be a good indicator of the model's performance on the whole data; hence the average. – Bolding
I don't disagree; all I'm saying is: better to run an evaluate on the present state of your model... – Sicklebill
@Sicklebill On the whole training/validation data? After each batch? – Bolding
After each epoch, if the per-epoch loss is what you are after (we are still in the context of the OP, remember?), as is actually done for the validation metrics in Keras... – Sicklebill
@Sicklebill Aha, I see. I don't know... maybe that would be an extra cost with little benefit... you know, a kind of trade-off is happening here (accuracy vs. resources)... and as I said, I am not aware of any practical use for the training loss except as a general indicator of learning progress, so its exact value would not be of much use. But you are right, especially if the OP wants to do something very precise and accurate with the training loss. – Bolding

I'll expand on @Bolding's answer a bit. There is a balance to strike in how you report the loss over an epoch and how you use it to decide when training should stop.

  • If you only look at the loss of the most recent batch, it will be a very noisy estimate of your dataset loss, because maybe that batch happened to contain all the samples your model has trouble with, or all the samples that are trivial to succeed on.
  • If you look at the averaged loss over all batches in the epoch, you may get a skewed answer because, like you indicated, the model has (hopefully) been improving over the epoch, so the performance on the initial batches isn't meaningfully comparable to the performance on the later batches.

The only way to accurately report your epoch loss is to take your model out of training mode, i.e. fix all the model parameters, and run your model on the whole dataset. That will be an unbiased computation of your epoch loss. However, in general that's a terrible idea because if you have a complex model or a lot of training data, you will waste a lot of time doing this.
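
If you did want that unbiased number now and then, a minimal PyTorch sketch could look like this (it reuses the net, criterion and trainloader names from the example below; the function name is mine, and the criterion is assumed to use the default 'mean' reduction):

import torch

def full_dataset_loss(net, criterion, loader):
    # Average per-sample loss over the whole dataset with the parameters fixed.
    net.eval()                  # e.g. disables dropout, freezes batch-norm stats
    total, count = 0.0, 0
    with torch.no_grad():       # no gradients needed, saves time and memory
        for inputs, labels in loader:
            loss = criterion(net(inputs), labels)
            total += loss.item() * inputs.size(0)  # undo the mean reduction
            count += inputs.size(0)
    net.train()                 # back to training mode
    return total / count

Weighting each batch loss by the batch size keeps the average exact even when the last batch is smaller than the rest.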

So I think it's most common to balance these factors by reporting an averaged loss over N mini-batches, where N is large enough to smooth out the noise of individual batches but small enough that the model does not change too much between the first and the last of those N batches.

I know you're using Keras, but here is a PyTorch example that illustrates this concept clearly:

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

You can see they accumulate the loss over N=2000 batches, report the averaged loss over those 2000 batches, then zero out the running loss and keep going.

Christopher answered 4/6, 2020 at 16:56
