Training MSE loss larger than theoretical maximum?

I am training a keras model whose last layer is a single sigmoid unit:

output = Dense(units=1, activation='sigmoid')

I am training this model with some training data in which the expected output is always a number between 0.0 and 1.0. I am compiling the model with mean squared error:

model.compile(optimizer='adam', loss='mse')

Since both the expected output and the model's output are single floats between 0 and 1, I was expecting a loss between 0 and 1 as well, but when I start the training I get a loss of 3.3932, which is larger than 1.
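
To illustrate the bound I had in mind, here is a quick sketch with made-up numbers (not my real data): the worst possible squared error between two values in [0, 1] is (1 - 0)**2 = 1, so the mean of such errors can never exceed 1.

import numpy as np

# made-up targets and predictions, both within [0, 1]
y_true = np.array([0.0, 0.5, 1.0])
y_pred = np.array([1.0, 0.0, 0.0])
print(np.mean((y_true - y_pred) ** 2))  # 0.75, and at most 1.0 by construction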

Am I missing something?

Edit: I am adding an example to show the problem: https://drive.google.com/file/d/1fBBrgW-HlBYhG-BUARjTXn3SpWqrHHPK/view?usp=sharing (I cannot just paste the code because I need to attach the training data)

After running python stackoverflow.py, the summary of the model will be shown, as well as the training process. I also print the minimum and maximum values of y_true each step to verify that they are within the [0, 1] range. There is no need to wait for the training to finish, you will see that the loss during the first few epochs is much larger than 1.

Flocculant asked 30/8, 2020 at 8:28 Comment(5)
This is indeed strange. Can you share an MCVE? Because I tried with some dummy data and I do get MSE between 0 & 1. – Uvulitis
Can there be a bug in the code that "ensures" real and predicted values are between 0 and 1? – Valance
Thank you for your comments. I added an MCVE (see Edit). A priori the real values are between 0 and 1 (I print them) and the predicted values come from a sigmoid function, if I understand the code. – Flocculant
Sharing pickled data is unsafe, as it is arbitrary Python code being executed; or maybe I don't know enough to rule that out: davidhamann.de/2020/04/05/exploiting-python-pickle. Can you share the data in a safer format like .csv? – Valance
@Flocculant were you able to find an answer? Did you maybe post a bug report on GitHub? It could be a bug in Keras. – Rhizocarpous

First, let's demystify the mse loss. It is an ordinary callable function in tf.keras:

import tensorflow as tf
import numpy as np

mse = tf.keras.losses.mse
print(mse([1] * 3, [0] * 3))  # tf.Tensor(1, shape=(), dtype=int32)

Next, as the name "mean squared error" implies, it is a mean, so the size of the vectors passed to it does not change the value as long as the mean of the squared errors stays the same:

print(mse([1] * 10, [0] * 10)) # tf.Tensor(1, shape=(), dtype=int32)

For the mse to exceed 1, the average squared error must exceed 1:

print(mse(np.random.random((100,)), np.random.random((100,))))
# tf.Tensor(0.14863832582680103, shape=(), dtype=float64)

print(mse(10 * np.random.random((100,)), np.random.random((100,))))
# tf.Tensor(30.51209646429651, shape=(), dtype=float64)
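
Conversely, as a quick sanity check (a sketch along the same lines): when both arrays lie within [0, 1], every squared difference is at most (1 - 0)**2 = 1, so their mean cannot exceed 1:

print(mse(np.random.random((100,)), np.random.random((100,))) <= 1.0)  # tf.Tensor(True, shape=(), dtype=bool)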

Lastly, sigmoid indeed guarantees that the output is between 0 and 1:

sigmoid = tf.keras.activations.sigmoid
signal = 10 * np.random.random((100,))

output = sigmoid(signal)
print(f"Raw: {np.mean(signal):.2f}; Sigmoid: {np.mean(output):.2f}" )  # Raw: 5.35; Sigmoid: 0.92

What this implies is that in your code, not all values of y_true can be between 0 and 1: for the mean of the squared errors to exceed 1, at least one squared error must exceed 1, and since y_pred comes from a sigmoid and lies in (0, 1), the corresponding y_true must fall outside [0, 1].

You can verify this with np.min(y_true) and np.max(y_true); checking only np.mean(y_true) is not enough, since the mean can be in range while individual values are not.
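
For instance, a minimal sketch with hypothetical targets (reusing the numpy import from above):

y_check = np.array([-0.5, 0.5, 0.9])  # hypothetical targets; one value is negative
print(np.mean(y_check))               # 0.3 -> the mean looks fine
print(y_check.min(), y_check.max())   # -0.5 0.9 -> min reveals the out-of-range value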

Valance answered 31/8, 2020 at 12:40 Comment(1)
Thank you @ikamen. I have spent one too many hours testing everything that came to mind, including the possibility of y_true NOT being between 0 and 1. But as you can verify in my MCVE (see Edit), I print the min and max values of y_true and they do lie within the proper range. – Flocculant

I do not have an answer to the question asked, but I am getting nans in my own MSE loss, with inputs in the range [0, 1] and a sigmoid at the output, so the question seemed relevant.

Here are a few observations about sigmoid:

import tensorflow as tf
import numpy as np

x = tf.constant([-20, -1.0, 0.0, 1.0, 20], dtype=tf.float32)
x = tf.keras.activations.sigmoid(x)
x.numpy()

# array([2.0611537e-09, 2.6894143e-01, 5.0000000e-01, 7.3105860e-01,
#        1.0000000e+00], dtype=float32)

x = tf.constant([float('nan')] * 5, dtype=tf.float32)
x = tf.keras.activations.sigmoid(x)
x.numpy()

# array([nan, nan, nan, nan, nan], dtype=float32)

x = tf.constant([np.inf] * 5, dtype=tf.float32)
x = tf.keras.activations.sigmoid(x)
x.numpy()

# array([1., 1., 1., 1., 1.], dtype=float32)

So it is possible to get nans out of sigmoid, but only when the input is already nan; infinite inputs saturate to 1 (or 0 for -inf) instead. Just in case someone (me, in the near future) has this doubt again.
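
One more sketch along the same lines (not part of the original observations): a single nan anywhere in the predictions makes the whole MSE nan, because the mean of anything containing a nan is nan.

y_true = tf.zeros(5)
y_pred = tf.constant([0.5, 0.5, float('nan'), 0.5, 0.5])
tf.keras.losses.mse(y_true, y_pred).numpy()

# nan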

Rhizocarpous answered 11/5, 2021 at 9:36 Comment(0)
