Problem: a very long RNN net
N1 -- N2 -- ... -- N100
For an optimizer like AdamOptimizer, compute_gradients() returns the gradients for all trainable variables.
However, those gradients might explode at some step.
A method like the one in how-to-effectively-apply-gradient-clipping-in-tensor-flow can clip the large final gradients.
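For reference, what I mean by clipping the final gradients is roughly the following minimal sketch (the toy model, the clip norm of 5.0, and the variable names are just stand-ins for the real RNN):

```python
import tensorflow as tf

# Toy stand-in for the real RNN; `loss` is just a placeholder objective.
x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.random_normal([10, 10]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = optimizer.compute_gradients(loss)
# Clip each final gradient by norm before applying it.
clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```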
But how can I clip the intermediate ones?
One way might be to manually do the backprop from "N100 --> N99", clip the gradients, then "N99 --> N98", and so on (roughly as sketched below), but that's just too complicated.
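What I have in mind is something like this minimal sketch, where h99 and h100 are hypothetical stand-ins for the outputs of N99 and N100 (all names and the clip threshold are made up for illustration):

```python
import tensorflow as tf

# Hypothetical two-layer stand-in for two adjacent RNN steps.
x = tf.placeholder(tf.float32, [None, 10])
w99 = tf.Variable(tf.random_normal([10, 10]))
w100 = tf.Variable(tf.random_normal([10, 10]))
h99 = tf.tanh(tf.matmul(x, w99))      # stand-in for N99's output
h100 = tf.tanh(tf.matmul(h99, w100))  # stand-in for N100's output
loss = tf.reduce_mean(tf.square(h100))

# Backprop from the loss down to h99, clip there, then continue the backprop
# from h99 using the clipped value as the incoming gradient.
grad_h99 = tf.gradients(loss, h99)[0]
clipped_grad_h99 = tf.clip_by_norm(grad_h99, 5.0)
grad_w99 = tf.gradients(h99, w99, grad_ys=clipped_grad_h99)[0]
```

Doing this by hand for every pair of the 100 steps is exactly what I would like to avoid.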
So my question is: is there an easier way to clip these intermediate gradients? (Of course, strictly speaking, they are no longer gradients in the mathematical sense.)