How does scipy.minimize handle NaNs?

I'm using the SLSQP solver in scipy.minimize to solve a constrained optimization problem. The solver very often tries parameter values that violate the constraints, and when that happens my objective function returns NaN. This seems to cause problems: the approximated Jacobian is full of NaNs nearly every time it is recalculated, and more often than not the optimization terminates in exit mode 8: Positive directional derivative for linesearch. I suspect the NaNs in the approximated Jacobian are the source of this. My question, then, is: how does scipy.minimize handle NaNs? Are they benign, or should they be converted to a large (or even infinite) number? As far as I can tell, this is not covered anywhere in the SciPy documentation.
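For concreteness, here is a stripped-down toy version of the situation. The real objective and constraints are more complicated; these are made up purely to reproduce the failure mode:

import numpy as np
from scipy.optimize import minimize

# Toy objective: undefined (NaN) whenever the constraint
# x[0] + x[1] >= 1 is violated, since sqrt of a negative is NaN
def objective(x):
    return np.sqrt(x[0] + x[1] - 1.0) + (x[0] - 2.0) ** 2

constraint = {"type": "ineq", "fun": lambda x: x[0] + x[1] - 1.0}

# The starting point already violates the constraint, so the very
# first evaluation (and the finite-difference Jacobian) hits NaNs
res = minimize(objective, x0=[0.0, 0.0], method="SLSQP",
               constraints=[constraint])
print(res.status, res.message)  # for me this often ends in exit mode 8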

Eraeradiate answered 24/3, 2018 at 4:42 Comment(4)
You should work on the assumption that every NaN kills everything: no NaNs! The only place a NaN is valid is at API level, for bounds. For SLSQP there is also a difference in how bounds are handled (imho): bounds are strictly respected during iteration (no iterate violates them), while the same is not true for constraints (which can also express bounds). See the sketch after these comments.Consuetudinary
Hmm, so what is the prescription for NaNs then? Should they be converted to infs, or to some other large finite number?Eraeradiate
It depends very much on the specific minimizer and/or problem. For minimizers that estimate the Hessian and use second derivatives, returning high values like this is likely to confuse the minimizer. For derivative-free methods (like the Nelder-Mead simplex algorithm), however, returning large values when parameters go out of bounds can sometimes work.Kingsly
What did you do in the end?Syrupy
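To illustrate the first comment's point about bounds versus constraints, here is a sketch with a made-up objective; the same interval is passed once as bounds and once as inequality constraints. (Whether intermediate iterates actually leave the interval depends on the SciPy version.)

import numpy as np
from scipy.optimize import minimize

# Hypothetical objective, undefined (NaN) for x < 0
def f(x):
    return np.sqrt(x[0]) + (x[0] - 3.0) ** 2

x0 = [5.0]

# As a bound: SLSQP keeps its iterates inside [0, 10]
res_b = minimize(f, x0, method="SLSQP", bounds=[(0.0, 10.0)])

# As inequality constraints: intermediate iterates may step outside
# [0, 10], so f can still be evaluated where it returns NaN
cons = [{"type": "ineq", "fun": lambda x: x[0]},
        {"type": "ineq", "fun": lambda x: 10.0 - x[0]}]
res_c = minimize(f, x0, method="SLSQP", constraints=cons)

print(res_b.x, res_c.x)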

There are checks for NaNs in scipy, depending on which search algorithm you use; you'll have to check the source of each one. NaNs generally don't affect minimizers (unless you use a method that can't discriminate against them), but they really mess up maximization. In general, scipy ends up working with numpy arrays, and the best way to understand what happens is with the following simple example:

>>> import numpy as np
>>> x = [np.nan, np.nan, 1, 2, 3, np.nan]  # some random sequence of numbers and NaNs
>>> np.sort(x)
array([ 1.,  2.,  3., nan, nan, nan])

np.sort always places NaN last, as if it were the largest number, so you have to account for this explicitly in your search algorithm and reject such solutions from future iterations. As for interpreting +/- NaNs, see this if the backend implementation is in Fortran, which is sometimes the case.
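For example, a naive random search that explicitly drops the NaN evaluations before comparing candidates (the objective here is a made-up toy):

import numpy as np

rng = np.random.default_rng(0)

# Toy objective: defined for x >= 0, NaN elsewhere
def f(x):
    safe = np.sqrt(np.clip(x, 0.0, None)) + (x - 2.0) ** 2
    return np.where(x >= 0, safe, np.nan)

candidates = rng.uniform(-5.0, 5.0, size=1000)
values = f(candidates)

# np.nanargmin ignores the NaN entries, i.e. the rejected solutions
best = candidates[np.nanargmin(values)]
print(best, f(best))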

Schubert answered 17/8, 2020 at 14:4 Comment(0)

There is a very advanced minimization routine called Minuit, used in the particle physics community, that is similar to the routine you mention. Both use quasi-Newton methods to estimate the second derivative in order to try to "jump" to the minimum in the fewest iterations.

These methods typically do not handle boundary value problems; a whole different class of algorithms is devoted to minimizing functions subject to constraints.

That being said, it is possible in Minuit to set parameter boundaries, and the way this is achieved is pretty clever. Essentially, each parameter is mapped "internally" to:

p_int = arcsin(2*(p_ext-a)/(b-a)-1)

and

p_ext = a + ((b-a)/2)*(sin(p_int)+1)

where a and b are the lower and upper limits, respectively. See the Minuit manual here for more details.

I suspect you could do something similar, assuming you have simple box bounds on each of your parameters.
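A quick sketch of that mapping (the function names are mine, for illustration only):

import numpy as np

# Minuit-style transformation between a bounded "external" parameter
# p_ext in [a, b] and an unbounded "internal" one
def to_internal(p_ext, a, b):
    return np.arcsin(2.0 * (p_ext - a) / (b - a) - 1.0)

def to_external(p_int, a, b):
    return a + (b - a) / 2.0 * (np.sin(p_int) + 1.0)

a, b = 0.0, 10.0

# Any real internal value maps back into [a, b], so the objective is
# never evaluated out of bounds
for p in (-100.0, 0.0, 3.7):
    assert a <= to_external(p, a, b) <= b

# Round trip for a value inside the bounds
assert np.isclose(to_external(to_internal(2.5, a, b), a, b), 2.5)

The minimizer then optimizes the internal parameter freely; every trial point maps to a value inside [a, b].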

Kingsly answered 15/12, 2018 at 0:42 Comment(0)
