To recap the problems:
NaN
This always returns False
for every comparison, so it stays where it is in the list:
>>> sorted([float('nan'), 0])
[nan, 0]
>>> sorted([0, float('nan')])
[0, nan]
-0.0
This is == to 0.0
, but has a different repr, a different json representation, and slightly different numerical properties. It has the same problem that the positive and negative zeros will stay in the same order they were in the original list:
>>> sorted([0.0, -0.0])
[0.0, -0.0]
>>> sorted([-0.0, 0.0])
[-0.0, 0.0]
Other solution?
@khachik's solution has inconsistent sorting behavior for NaN
and -inf
>>> key=lambda x: float('-inf') if math.isnan(x) else x
>>> sorted([float('nan'), float('-inf')], key=key)
[nan, -inf]
>>> sorted([float('-inf'), float('nan')], key=key)
[-inf, nan]
Solution: more complex key function.
So there are problems with signs and nans. We can just include those in a key function:
def stable_float_sort_key(x: float):
return math.copysign(1, x), math.isnan(x), x
That works on all the examples above:
>>> sorted([float('nan'), 0.0], key=stable_float_sort_key)
[0.0, nan]
>>> sorted([0.0, float('nan')], key=stable_float_sort_key)
[0.0, nan]
>>> sorted([float('nan'), float('-inf')], key=stable_float_sort_key)
[-inf, nan]
>>> sorted([float('-inf'), float('nan')], key=stable_float_sort_key)
[-inf, nan]
>>> sorted([0.0, -0.0], key=stable_float_sort_key)
[-0.0, 0.0]
>>> sorted([-0.0, 0.0], key=stable_float_sort_key)
[-0.0, 0.0]
Indeed, you can write a hypothesis test showing it's consistent across all floating point numbers:
import json
from hypothesis import given, settings
from hypothesis import strategies as st
@given(nums=st.lists(st.floats()), random=st.randoms())
@settings(max_examples=10000)
def test_stable_json_sorting(nums, random):
shuffled = list(nums)
random.shuffle(shuffled)
l1 = sorted(nums, key=stable_float_sort_key)
l2 = sorted(shuffled, key=stable_float_sort_key)
assert json.dumps(l1) == json.dumps(l2)
It does have some oddities however, since some NaN's are negative! For example:
>>> sorted([float('nan'), -0.0, 0.0, float('-nan')], key=stable_float_sort_key)
[-0.0, nan, 0.0, nan]
If that bothers you, you can fix this by switching the ordering:
def stable_float_sort_key(x: float):
return math.isnan(x), math.copysign(1, x), x
That sorts negative numbers first, followed by positive numbers, followed by NaNs.
Does any of this make sense?
Of course other answerers are correct that in some sense none of this makes sense. Comparison of NaNs is a conceptual error of some kind. However, even in cases where the question doesn't "make sense", you may want to have invariants like serializing sets of floating point numbers generated by the same code to exactly the same JSON representation, despite hash randomization (my use case). That's more of a formal property of python code, rather than something where there's a "right answer" per an IEEE standard.
nan
at least). Is it invalid in some other standard? – Nataline.sort()
, I only got to this Q&A once I had already figured it out :\ Thanks for recording it though! – Tempest