tf.nn.dropout does not impose any norm constraint. I believe what you're looking for is to "process the gradients before applying them" using tf.clip_by_norm.
For example, instead of simply:
# Create an optimizer and implicitly call compute_gradients() and apply_gradients().
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
You could:
# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, [weights1, weights2, ...])
# grads_and_vars is a list of tuples (gradient, variable).
# Do whatever you need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(tf.clip_by_norm(g, clip_norm=123.0, axes=[0]), v)
                         for g, v in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
train_op = optimizer.apply_gradients(capped_grads_and_vars)
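For completeness, here is a minimal end-to-end sketch of the same pattern in a TF1-style graph. The placeholder names (x, y_), the single layer, and all shapes and values below are assumptions chosen purely for illustration; they are not part of the answer above.

# Hedged sketch: a tiny single-layer model, assuming TF1 graph mode.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10])
y_ = tf.placeholder(tf.float32, shape=[None, 3])
weights1 = tf.Variable(tf.truncated_normal([10, 3], stddev=0.1))
biases1 = tf.Variable(tf.zeros([3]))
logits = tf.nn.xw_plus_b(x, weights1, biases1)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5)
grads_and_vars = optimizer.compute_gradients(loss, [weights1, biases1])
# Clip only the 2-D weight gradient per column (axes=[0]); leave the 1-D bias
# gradient alone (see the axes notes below).
capped_grads_and_vars = [
    (tf.clip_by_norm(g, clip_norm=1.0, axes=[0]) if v is weights1 else g, v)
    for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(capped_grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch_x = np.random.rand(32, 10).astype(np.float32)
    batch_y = np.eye(3, dtype=np.float32)[np.random.randint(0, 3, size=32)]
    sess.run(train_op, feed_dict={x: batch_x, y_: batch_y})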
I hope this helps. Final notes about tf.clip_by_norm's axes parameter:
- If you're calculating tf.nn.xw_plus_b(x, weights, biases), or equivalently matmul(x, weights) + biases, and the dimensions of x and weights are (batch, in_units) and (in_units, out_units) respectively, then you probably want to set axes == [0] (because in this usage each column contains all the incoming weights to a specific unit).
- Pay attention to the shapes/dimensions of the variables above and to whether/how exactly you want to clip_by_norm each of them! E.g. if some of [weights1, weights2, ...] are matrices and some aren't, and you call clip_by_norm() on grads_and_vars with the same axes value as in the list comprehension above, it does not mean the same thing for all the variables. If you're lucky, this results in a weird error like ValueError: Invalid reduction dimension 1 for input with 1 dimensions, but otherwise it's a very sneaky bug (see the sketch after this list).
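To make the axes point concrete, here is a small hedged sketch (TF1 graph mode assumed); the tensors and numbers are made up purely to illustrate tf.clip_by_norm's behavior and are not from the answer above.

# Hedged illustration of axes=[0] on a 2-D gradient vs. a 1-D gradient.
import tensorflow as tf

g2d = tf.constant([[3.0, 0.0],
                   [4.0, 0.0]])   # shape (in_units=2, out_units=2)
g1d = tf.constant([3.0, 4.0])     # e.g. a bias gradient, shape (2,)

# axes=[0]: each column of g2d is clipped independently to norm <= 1.0.
clipped_cols = tf.clip_by_norm(g2d, clip_norm=1.0, axes=[0])

# With no axes, the whole tensor is clipped by its global L2 norm (5.0 -> 1.0).
clipped_vec = tf.clip_by_norm(g1d, clip_norm=1.0)

with tf.Session() as sess:
    print(sess.run(clipped_cols))  # [[0.6, 0.0], [0.8, 0.0]]
    print(sess.run(clipped_vec))   # [0.6, 0.8]

# By contrast, tf.clip_by_norm(g1d, clip_norm=1.0, axes=[1]) fails with an
# error like "Invalid reduction dimension 1 for input with 1 dimensions",
# which is the error quoted above.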