What is the best way to implement weight constraints in TensorFlow?
C

5

36

Suppose we have weights

x = tf.Variable(np.random.random((5,10)))
cost = ...

And we use the GD optimizer:

upds = tf.train.GradientDescentOptimizer(lr).minimize(cost)
session.run(upds)

How can we enforce a constraint on the weights, for example non-negativity?

I tried clipping them:

upds = tf.train.GradientDescentOptimizer(lr).minimize(cost)
session.run(upds)
session.run(tf.assign(x, tf.clip_by_value(x, 0, np.infty)))

But this slows down my training by a factor of 50.

Does anybody know a good way to implement such constraints on the weights in TensorFlow?

P.S.: in the equivalent Theano algorithm, I had

T.clip(x, 0, np.infty)

and it ran smoothly.

Camarilla answered 13/11, 2015 at 13:56 Comment(2)
How about using tf.nn.relu(x) whenever you reference x and let the SGD handle the rest? - Aloise
I think a related GitHub issue is here. - Pixie
G
26

You can take the Lagrangian approach and simply add a penalty for features of the variable you don't want.

For example, to encourage theta to be non-negative, you could add the following term to the optimizer's objective function.

    added_loss = -tf.minimum( tf.reduce_min(theta),0)

If any theta are negative, then added_loss will be positive; otherwise it is zero. Scaling it to a meaningful value is left as an exercise for the reader: too small a scale will not exert enough pressure, and too large a scale may make training unstable.
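A minimal pure-Python sketch of two penalty shapes, for illustration only. `min_penalty` mirrors the expression above on a plain list; `hinge_penalty` is a hypothetical alternative (not from the answer) that sums the negative part of every entry, so each negative weight contributes instead of only the most negative one.

```python
def min_penalty(theta):
    """Same value as -min(min(theta), 0): driven only by the most negative entry."""
    return max(-min(theta), 0.0)


def hinge_penalty(theta):
    """Hypothetical alternative: sum of max(-t, 0) over all entries."""
    return sum(max(-t, 0.0) for t in theta)


theta = [0.5, -0.25, -1.0, 3.0]
print(min_penalty(theta))    # only the -1.0 entry matters
print(hinge_penalty(theta))  # both -0.25 and -1.0 contribute
```

Either value would still need to be multiplied by a scale factor before being added to the loss; which shape trains better is problem-dependent.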

Greenquist answered 25/5, 2016 at 2:23 Comment(1)
The problem is that you provide no smooth gradient to the objective function. If you use a ReLU function for the penalty instead, at least it provides a function that is C^0. - Massasoit
T
24

As of TensorFlow 1.4, tf.get_variable accepts a new constraint argument: a function that is applied to the variable after the optimizer's update. Here is an example that enforces a non-negativity constraint:

with tf.variable_scope("MyScope"):
  v1 = tf.get_variable("v1", …, constraint=lambda x: tf.clip_by_value(x, 0, np.inf))

constraint: An optional projection function to be applied to the variable after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected Tensor representing the value of the variable and return the Tensor for the projected value (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
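The "update, then project" behaviour described in the quoted documentation can be sketched without TensorFlow. This is a toy one-dimensional illustration under stated assumptions, not TensorFlow's implementation: `projected_gd`, `grad`, and `project` are hypothetical names, and the objective (x + 1)^2 is chosen only because its unconstrained minimum (x = -1) violates the constraint x >= 0.

```python
def projected_gd(grad, project, x0, lr=0.1, steps=100):
    """Gradient descent with a projection applied after every update."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # ordinary gradient step on the objective
        x = project(x)        # constraint applied to the updated value
    return x


def grad(x):
    # Gradient of the toy objective (x + 1)**2.
    return 2.0 * (x + 1.0)


def project(x):
    # Non-negativity projection, the scalar analogue of clipping at 0.
    return max(x, 0.0)


x_star = projected_gd(grad, project, x0=2.0)
print(x_star)  # ends on the boundary of the feasible set
```

In this sketch the gradient is taken of the objective alone; the projection acts on the value afterwards and is never differentiated.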

Turki answered 8/11, 2017 at 21:10 Comment(1)
Do we need to worry about the gradients at the clipping points (i.e. 0 and inf in your example)? - Valaree
B
16

By running

sess.run(tf.assign(x, tf.clip_by_value(x, 0, np.infty)))

you add new nodes to the graph on every call, which makes it slower and slower.

Instead, define a clip_op once when building the graph and run it each time after updating the weights:

# build the graph
x = tf.Variable(np.random.random((5,10)))
loss = ...
train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
clip_op = tf.assign(x, tf.clip_by_value(x, 0, np.inf))

# train
sess.run(train_op)
sess.run(clip_op)
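The graph-growth effect can be checked by counting operations. The sketch below assumes a TensorFlow 2.x install that exposes the TF1-style graph API as `tf.compat.v1` (an assumption; the answer itself targets TF 1.x), and the names `ops_after_build` and `ops_after_loop` are made up for illustration.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumed available)

graph = tf.Graph()
with graph.as_default():
    # Shift the initial values so that some entries start out negative.
    x = tf1.Variable(np.random.random((5, 10)) - 0.5)

    # Built once: the clip and assign nodes are added a single time.
    clip_op = tf1.assign(x, tf.clip_by_value(x, 0.0, np.inf))
    ops_after_build = len(graph.get_operations())

    # Pattern from the question: every call creates fresh nodes.
    for _ in range(3):
        tf1.assign(x, tf.clip_by_value(x, 0.0, np.inf))
    ops_after_loop = len(graph.get_operations())

    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        sess.run(clip_op)  # reuse the prebuilt op; no new nodes are added
        x_min = sess.run(x).min()

print(ops_after_build, ops_after_loop, x_min)
```

If the count keeps rising inside a training loop, ops are being constructed per step rather than reused.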
Bifacial answered 31/3, 2017 at 6:2 Comment(1)
What do you do if the weights you want to clip are not something you defined yourself but are part of, for example, tf.contrib.layers.fully_connected? - Burtonburty
O
3

I recently had this problem as well. I discovered that you can import keras, which has nice weight constraint functions, and use them directly as the kernel constraint in TensorFlow. Here is an example from my code. You can do similar things with the kernel regularizer.

from keras.constraints import non_neg

conv1 = tf.layers.conv2d(
    inputs=features['x'],
    filters=32,
    kernel_size=[5,5],
    strides = 2,
    padding='valid',
    activation=tf.nn.relu,
    kernel_regularizer=None,
    kernel_constraint=non_neg(),
    use_bias=False)
Obel answered 26/4, 2018 at 22:30 Comment(0)
E
1

There is a practical solution: write your cost function so that it puts a high cost on negative weights. I did this in a matrix factorization model in TensorFlow with Python, and it worked well enough. It may seem obvious, but nobody else had mentioned it, so here you go. EDIT: I just saw that Mark Borderding also gave another cost-based solution before I did.

And if "the best way" is wanted, as the OP asked, what then? Well "best" might actually be application-specific, in which case you'd need to try a few different ways with your dataset and consider your application requirements.

Here is working code for increasing the cost for unwanted negative solution variables:

cost = tf.reduce_sum(keep_loss) + Lambda * reg # Cost = sum of losses for training set, except missing data.        
if prefer_nonneg: # Optionally increase cost for negative values in rhat, if you want that.
    negs_indices = tf.where(rhat < tf.constant(0.0))
    neg_vals = tf.gather_nd(rhat, negs_indices)
    cost += 2. * tf.reduce_sum(tf.abs(neg_vals))  # 2 is a magic number (empirical parameter)         

You are free to use my code but please give me some credit if you choose to use it. Give a link to this answer on stackoverflow.com please.

This design would be considered a soft constraint, because you can still get negative weights, if you let it, depending on your cost definition.
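To see why a penalty is only a soft constraint, here is a toy pure-Python sketch (not the matrix factorization code above). The objective (x + 1)^2, the function name `penalized_gd`, and the weight `lam` are all assumptions made for illustration; the penalty term is lam * max(-x, 0).

```python
def penalized_gd(lam, x0=2.0, lr=0.1, steps=200):
    """Gradient descent on (x + 1)**2 + lam * max(-x, 0)."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * (x + 1.0)  # gradient of the unconstrained objective
        if x < 0.0:
            grad -= lam         # (sub)gradient of the penalty term
        x -= lr * grad
    return x


x_soft = penalized_gd(lam=1.0)
print(x_soft)  # close to -0.5: the penalty shifts the solution but it stays negative
```

With lam = 1.0 the stationary point for x < 0 satisfies 2(x + 1) = lam, so the result settles near -0.5 rather than at 0. A larger weight pushes it closer to the boundary, which is the scaling trade-off that penalty-based approaches have to tune.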

It seems that constraint= is also available in TF v1.4+ as a parameter to tf.get_variable(), where you can pass a function built on something like tf.clip_by_value. This seems like another soft constraint rather than a hard one, in my opinion, because it depends on your function working well. It might also be slow: the original poster tried the same clipping function, without the constraint= parameter, and reported that training slowed down considerably. I don't see any reason why one would be faster than the other, since both use the same clipping approach. So if you use the constraint= parameter, you may see a similar slowdown in the context of the original poster's application.

It would be nicer if TF also provided true hard constraints in the API and figured out how to implement them efficiently on the back end. I have seen this done in linear programming solvers for a long time: the application declares a constraint, and the back end makes it happen.

Edroi answered 25/4, 2018 at 15:35 Comment(0)
