At the moment, there is no global optimization method built into tensorflow. There is a window opened onto the scipy world via ScipyOptimizerInterface, but it (currently?) only wraps scipy's minimize, which is a local minimizer.
However, you can still treat tensorflow's execution result as any other function that can be fed to the optimizer of your choice. Say you want to experiment with scipy's basinhopping global optimizer. You could write:
import numpy as np
from scipy.optimize import basinhopping
import tensorflow as tf

# Build the graph: a placeholder for the 2-D point and the function to minimize
v = tf.placeholder(dtype=tf.float32, shape=(2,))
x = v[0]
y = v[1]
f1 = y - x*x
f2 = x - (y - 2)*(y - 2) + 1.1
sq = f1 * f1 + f2 * f2

starting_point = np.array([-1.3, 2.0], np.float32)

with tf.Session() as sess:
    # basinhopping only needs a plain callable: evaluate the graph at x
    o = basinhopping(lambda x: sess.run(sq, {v: x}), x0=starting_point, T=10, niter=1000)
    print(o.x)
# [0.76925635 0.63757862]
(I had to tweak basinhopping's temperature and number of iterations, as the default values would often not let the solution get out of the basin of the local minimum taken as the starting point here.)
What you lose by treating tensorflow as a black box is that the optimizer does not have access to the gradients that tensorflow computes automatically. In that sense it is suboptimal -- though you still benefit from GPU acceleration when evaluating your function.
EDIT
Since you can explicitly provide the gradients to the local minimizer used by basinhopping, you can feed in the result of tensorflow's gradients:
import numpy as np
from scipy.optimize import basinhopping
import tensorflow as tf

v = tf.placeholder(dtype=tf.float32, shape=(2,))
x = v[0]
y = v[1]
f1 = y - x*x
f2 = x - (y - 2)*(y - 2) + 1.1
sq = f1 * f1 + f2 * f2
# Let tensorflow compute the gradient of sq with respect to v
sq_grad = tf.gradients(sq, v)[0]

init_value = np.array([-1.3, 2.0], np.float32)

with tf.Session() as sess:
    def f(x):
        return sess.run(sq, {v: x})

    def g(x):
        return sess.run(sq_grad, {v: x})

    # Pass the gradient to the local minimizer through minimizer_kwargs
    o = basinhopping(f, x0=init_value, T=10.0, niter=1000, minimizer_kwargs={'jac': g})
    print(o.x)
# [0.79057982 0.62501636]
For some reason, this is much slower than without providing the gradient -- however, it could be that when gradients are provided, the local minimization algorithm is not the same, so the comparison may not make sense.
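If you want a fairer timing comparison, one option is to pin the local minimizer explicitly through minimizer_kwargs, so that the only difference between the two runs is whether the gradient is supplied. This is only a sketch, reusing f, g and init_value from above; 'L-BFGS-B' is just one possible method choice:

# Same local method in both cases; only the gradient differs
o_numeric = basinhopping(f, x0=init_value, T=10.0, niter=1000,
                         minimizer_kwargs={'method': 'L-BFGS-B'})
o_analytic = basinhopping(f, x0=init_value, T=10.0, niter=1000,
                          minimizer_kwargs={'method': 'L-BFGS-B', 'jac': g})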