Training only one output of a network in Keras

I have a network in Keras with multiple outputs; however, my training data only provides information for a single output at a time.

At the moment, my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is equivalent to setting the loss to zero for all outputs except the one I'm trying to train.
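
A minimal sketch of that update loop, assuming a compiled Keras model with one output unit per action; model, state, action and target_value are placeholder names, not from the question:

targets = model.predict(state)        # current predictions for every output
targets[0, action] = target_value     # overwrite only the output being trained
model.train_on_batch(state, targets)  # the other outputs keep their predicted
                                      # values, so their error is roughly zero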

Is there a better way? I've tried class weights, where I set a zero weight for all but the output I'm training, but it doesn't give me the results I expect.
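
If that was done with Keras' compile-time loss_weights argument, it presumably looked something like the hypothetical reconstruction below. Note that these weights are fixed when compile() is called, so they cannot select a different output per sample, which may explain the unexpected results:

# hypothetical reconstruction of the zero-weight attempt, assuming three outputs
model.compile(optimizer='rmsprop',
              loss=['mse', 'mse', 'mse'],
              loss_weights=[0.0, 0.0, 1.0])  # only the third output is trained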

I'm using the Theano backend.

Bessette asked 6/11, 2016 at 6:01 Comment(5)
That's an uncommon setting for supervised learning. Show some example data and explain a bit why you have this setting.Outside
I'm using it for Deep Q-Learning. The input is a state and each output is the score for an action. You pick an action and then update the network based on the result of that action. However, you only want to update one output, as you don't know the result of the other actions.Bessette
I see. This is handled differently. Look at these sources (I marked the line in the link). You just keep the current values for the other actions!Outside
I would like to implement a similar CNN with multiple outputs (multi-task learning). I will run the network on the input (images), get one of the outputs; then depending on the output, select one of the other outputs to run the network and obtain the final output. In training, I will update only one of the streams at a time. This is a very common problem, I think, but strangely, there is no example or documentation to describe a solution. @simeon: did you manage to solve your problem? If so, how? Thx.Ruiz
I actually did the other day and had forgotten about this post. I will put a more detailed response tonight. In short, in Keras you can make multiple models with the same layers, so that the weights are shared (off the top of my head, you need to use the functional API, the alternative to 'Sequential'). I basically made a model for each output, with the layers shared between them. It worked well.Bessette

Outputting multiple results and optimizing only one of them

Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:

Let's start with this model:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)

# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)

# final output, used for loss calculation and optimization
# (a softmax over a single unit would always output 1, so use sigmoid here)
result = Dense(1, activation='sigmoid', name='result')(useful_info)

Compile with multiple outputs, setting the loss to None for the extra outputs:

Pass None for any output that you don't want to use for loss calculation and optimization.

model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
              loss=['binary_crossentropy', None],
              metrics=['accuracy'])

Provide only the target outputs when training, skipping the extra outputs:

model.fit(my_inputs, {'result': train_labels}, epochs=.., batch_size=...)

# this also works:
#model.fit(my_inputs, [train_labels], epochs=.., batch_size=...)

One predict to get them all

With a single model, you can run predict just once to get all the outputs you need:

predicted_labels, useful_info = model.predict(new_x)
Eddington answered 16/5, 2019 at 10:53 Comment(3)
somehow this is not working in v2.3.0 as I am getting the error: ValueError: The two structures don't have the same sequence length. Input structure has length 1, while shallow structure has length 3.Aliquant
I get the following error when attempting to apply this to my network: "ValueError: Variable <tf.Variable 'name1/kernel:0' shape=(61440, 1200) dtype=float32> has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval." My tensorflow==1.14.0. My losses are [None, 'categorical_crossentropy'].Soar
@Aliquant yes, me too! Could you find a solution by any chance?Turbulent
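
If you run into the errors mentioned in the comments above, one workaround worth trying is to pass the loss as a dict that names only the output you want to optimize; in tf.keras, outputs missing from the loss dict do not contribute to training (the framework warns that the output is missing from the loss dictionary but proceeds). A hedged sketch, assuming the model from this answer:

# name only the output that should contribute to the loss
model.compile(optimizer='rmsprop',
              loss={'result': 'binary_crossentropy'},
              metrics=['accuracy'])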

In order to achieve this I ended up using the functional API. You basically create multiple models, using the same input and hidden layers but different output layers.

For example:

https://keras.io/getting-started/functional-api-guide/

from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
# single-unit outputs: use sigmoid, since a softmax over one unit always outputs 1
predictions_A = Dense(1, activation='sigmoid')(x)
predictions_B = Dense(1, activation='sigmoid')(x)

# This creates a model that includes
# the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
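
Because modelA and modelB reuse the same layer objects, a gradient step on either model also updates the shared hidden layers. An illustrative training step, where x_batch, yA_batch and yB_batch are assumed placeholder arrays:

modelA.train_on_batch(x_batch, yA_batch)  # updates the shared layers and A's head
modelB.train_on_batch(x_batch, yB_batch)  # updates the shared layers and B's head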
Bessette answered 21/8, 2017 at 2:35 Comment(2)
The problem here is that you have to run prediction twice to get both outputs.Eddington
@Eddington He can just create a third predictions = Concatenate()([predictions_A, predictions_B]) and set that to the output of a third model.Chandos
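
A sketch of that suggestion, reusing the tensors defined in the answer above (Concatenate comes from keras.layers; modelAB and x_batch are assumed names):

from keras.layers import Concatenate

predictions = Concatenate()([predictions_A, predictions_B])
modelAB = Model(inputs=inputs, outputs=predictions)
both = modelAB.predict(x_batch)  # shape (batch, 2): column 0 is A, column 1 is B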
