Why do NaN values make min and max sensitive to order? [duplicate]

> import numpy as np

> min(50, np.NaN)
50   
> min(np.NaN, 50)
nan

(Same behaviour occurs with max)

I know that I can avoid this behaviour by using numpy.nanmin. But what causes the change when the order is reversed? Is min sensitive to input order?

Shawntashawwal answered 29/6, 2020 at 11:11 Comment(3)

And, if you didn't know there was a NaN in your data set then min(my raw list) and min(my sorted list) could be different, I suppose? – Cherri 29/6, 2020 at 11:14

The behaviour likely arises because min assumes that (x < y) if and only if not (x >= y). That assumption does not hold for NaN. – Kolnos 29/6, 2020 at 11:16

Note that np.min is more consistent: np.min([2., np.nan]) and np.min([np.nan, 2.]) both return nan – Polymerization 29/6, 2020 at 11:25

Is min sensitive to input order?

Yes.

https://docs.python.org/3/library/functions.html#min

"If multiple items are minimal, the function returns the first one encountered."

The documentation does not specify exactly how "minimal" is defined for items that lack a consistent order, but it is likely that min loops over the elements and uses the < operator to decide whether each new element is smaller than the smallest item found so far.

To confirm this hypothesis we can read the source code (search for builtin_min and min_max in https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c ). It is slightly confusing because the implementations of min and max are combined and the variable names seem to assume a max function, but it is not too hard to follow.

It does indeed loop through the elements in order, performing each comparison with a call to PyObject_RichCompareBool with an "opid" of Py_LT, which is the C API equivalent of the Python < operator.

Ordering comparisons between NaN and any number return False. So in a list containing numbers and NaNs, a NaN in the first position is treated as the minimum, because no number is "less than" it. If the NaN is not in the first position, it is effectively skipped, because it is not "less than" any number.
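The loop described above can be sketched in pure Python. This is only a simplified illustration, not CPython's actual code: naive_min is a hypothetical name, and the sketch ignores the key and default arguments that the real built-in accepts.

```python
import math

def naive_min(items):
    """Simplified sketch of the built-in min loop (hypothetical helper)."""
    it = iter(items)
    try:
        smallest = next(it)  # the first element is the initial candidate
    except StopIteration:
        raise ValueError("naive_min() arg is an empty sequence") from None
    for item in it:
        # Only a strict "less than" replaces the current candidate.
        if item < smallest:
            smallest = item
    return smallest

nan = float("nan")
first = naive_min([nan, 50.0])   # nan stays: 50.0 < nan is False
second = naive_min([50.0, nan])  # 50.0 stays: nan < 50.0 is False
print(first, second)             # nan 50.0
```

Because the candidate is only ever replaced on a strict "less than", whichever value comes first wins whenever every comparison is False.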

Mentally answered 29/6, 2020 at 21:1 Comment(0)

Yes, nan breaks proper ordering, because <, >, <=, >= and == all evaluate to False when nan is involved. A lot of things with nan are inconsistent:

In [2]: 3.0 < float('nan')
Out[2]: False

In [3]: float('nan') < 3.0
Out[3]: False

In [4]: float('nan') == 3.0
Out[4]: False

min and max can only give you consistent results if you are working with a well-defined ordering, which numeric types do not have once nan can appear.
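If you need an order-independent result without NumPy, one option is to decide explicitly what to do with NaN before calling min. The sketch below shows two possible policies; min_ignoring_nan and min_propagating_nan are hypothetical helper names, both assume float-like inputs that math.isnan accepts, and returning nan for input with no non-NaN values is a design choice made here, not something the built-in does.

```python
import math

def min_ignoring_nan(values):
    """Drop NaNs first, then take the ordinary min (hypothetical helper)."""
    cleaned = [v for v in values if not math.isnan(v)]
    # Assumption: return nan when nothing is left, instead of raising.
    return min(cleaned, default=math.nan)

def min_propagating_nan(values):
    """Return nan if any element is NaN, else the ordinary min (hypothetical helper)."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return math.nan
    return min(values)

nan = float("nan")
print(min_ignoring_nan([nan, 50.0]), min_ignoring_nan([50.0, nan]))        # 50.0 50.0
print(min_propagating_nan([nan, 50.0]), min_propagating_nan([50.0, nan]))  # nan nan
```

Either policy gives the same answer regardless of where the NaN sits, because the NaN check no longer depends on the < operator.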

Capability answered 29/6, 2020 at 11:16 Comment(3)

This begs the question of exactly how min and max are defined in terms of comparison. We can infer that it's defined like C++ std::min, though, like a < b ? a : b;. See also What is the instruction that gives branchless FP min and max on x86? for more detail on exactly what happens with NaN for min, as opposed to C functions like fmin that reliably do NaN propagation, always giving NaN if either input was NaN. – Triphibious 29/6, 2020 at 22:51

@PeterCordes it is probably implemented in a "naive" fashion, but note that min works on arbitrary iterables of arbitrary objects, and most likely no provision is made for special-casing float objects, or even numeric objects in general. – Capability 29/6, 2020 at 22:56

It's implemented in the obvious manner: a loop using the C API equivalent of the Python < operator. Dealing with particular types is left up to the implementation of that operator. – Mentally 30/6, 2020 at 12:54
