What is the difference between Keras model.evaluate() and model.predict()?
I used Keras for biomedical image segmentation, to segment brain neurons. model.evaluate() gave me a Dice coefficient of 0.916. However, when I used model.predict() and then looped through the predicted images, computing the Dice coefficient myself, the Dice coefficient was 0.82. Why are these two values different?

Puerile answered 10/6, 2017 at 18:28 Comment(0)
26

The problem lies in the fact that every metric in Keras is evaluated in the following manner:

  1. For each batch, a metric value is computed.
  2. The current value of the metric after k batches is the mean of the metric values across those k batches.
  3. The final result is the mean of the metric values computed over all batches.

The most popular metrics (mse, categorical_crossentropy, mae, etc.) are defined as a mean of per-example losses, so averaging them over batches still yields the correct overall value. The Dice coefficient does not have this property: the mean of its per-batch values is not, in general, equal to its value computed on the whole dataset. Since model.evaluate() averages per-batch values, this is the direct cause of the discrepancy you observe.
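A minimal NumPy sketch (not the asker's model; the batches and `dice` helper are made up for illustration) showing why the mean of per-batch Dice values differs from the Dice coefficient computed over the pooled dataset:

```python
import numpy as np

def dice(y_true, y_pred, eps=1e-7):
    # Dice = 2 * |A ∩ B| / (|A| + |B|), with eps to avoid division by zero
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

# Two "batches" with very different foreground sizes
y_true = np.array([[1, 1, 1, 1], [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[1, 1, 1, 0], [0, 0, 1, 1]], dtype=float)

per_batch = [dice(t, p) for t, p in zip(y_true, y_pred)]
mean_of_batches = np.mean(per_batch)  # evaluate()-style averaging: ~0.762
global_dice = dice(y_true, y_pred)    # Dice over the pooled data: 0.8
```

Because the batch with four foreground pixels and the batch with one contribute equally to the mean, the two numbers disagree, exactly as the 0.916 vs 0.82 discrepancy in the question.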

Betimes answered 11/6, 2017 at 20:41 Comment(0)
70

The model.evaluate function predicts the output for the given input, then computes the metrics specified in model.compile based on y_true and y_pred, and returns the computed metric values as its output.

The model.predict function just returns y_pred.

So if you use model.predict and then compute the metrics yourself, the computed metric value should turn out to be the same as what model.evaluate returns.
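A NumPy-only sketch of this equivalence (the `y_pred` array stands in for model.predict() output, so no actual model is needed): for a metric that is a plain per-sample mean, such as MAE, averaging per-batch values over equal-sized batches gives the same number as computing the metric over all samples at once.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random((8, 1))
y_pred = rng.random((8, 1))  # stand-in for model.predict() output

# evaluate()-style: per-batch MAE, then the mean over (equal-sized) batches
batches = np.split(np.abs(y_true - y_pred), 4)
evaluate_style = np.mean([b.mean() for b in batches])

# predict()-then-compute: MAE over all samples at once
manual = np.mean(np.abs(y_true - y_pred))
```

For a metric like the Dice coefficient, which is not a per-sample mean, the two computations can diverge, as the other answer explains.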

For example, one would use model.predict instead of model.evaluate when evaluating RNN/LSTM-based models, where the output of one time step needs to be fed back as the input of the next.
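That feedback loop can be sketched abstractly like this (the `step` function is a hypothetical stand-in for calling model.predict() on a single timestep; it is not a real RNN cell):

```python
import numpy as np

def step(state, x):
    # Stand-in for model.predict() on one timestep of a recurrent model
    new_state = np.tanh(0.5 * state + 0.5 * x)
    return new_state, new_state  # (next hidden state, output)

state, x = 0.0, 1.0
outputs = []
for _ in range(3):
    state, y = step(state, x)
    outputs.append(y)
    x = y  # feed the prediction back in as the next input
```

model.evaluate() cannot express this loop, because it only scores fixed (input, target) pairs; autoregressive generation requires the raw predictions.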

Commissure answered 7/3, 2019 at 17:47 Comment(0)
22

The model.evaluate() function gives you the loss (and any metrics specified in model.compile), computed batch by batch over the data. The model.predict() function gives you the actual predictions for all samples in all batches. So even on the same data the outputs will differ, because a loss value is not the same thing as the predicted values. These are two different things.

Gait answered 10/6, 2017 at 19:57 Comment(2)
To be precise, model.evaluate is not a reliable way of measuring how accurately a classifier will work in the real world. People should have their own code to calculate this. – Cris
But then again, model.evaluate() not only gives the loss but also the accuracy metrics specified in model.compile, as @Commissure pointed out. model.metrics_names shows what evaluate() outputs. – Infold
13

model.predict() returns the final output of the model, i.e. the answer. model.evaluate() returns the loss (and any compiled metrics). The loss is what is used to train the model, via backpropagation; it is not the answer.

This video from ML Tokyo should help in understanding the difference between model.evaluate() and model.predict().

Einsteinium answered 30/4, 2020 at 21:50 Comment(0)
