Is NaN a valid key value for associative containers?
Asked Answered
A

3

30

Consider the ordered and unordered associative containers in C++ keyed on double.

Is NaN a valid key type?

With ordered containers, I should say "no", because it does not respect the strict weak ordering.

With unordered containers, I have no idea.

Here's what happens in GCC 4.6.2:

#include <map>
#include <unordered_map>

#include <cmath>

#include <iostream>
#include <prettyprint.hpp>

int main()
{
  typedef std::map<double, int> map_type; // replace by "unorderd_map"

  map_type dm;
  double d = std::acos(5); // a good nan

  dm[d] = 2;
  dm[d] = 5;
  dm[d] = 7;

  std::cout << "dm[NaN] = " << dm[d] << ", dm = " << dm << std::endl;
}

For the ordered map, I get:

dm[NaN] = 7, dm = [(nan, 7)]

For the unordered map, I get:

dm[NaN] = 0, dm = [(nan, 0), (nan, 7), (nan, 5), (nan, 2)]

So in the ordered map, all NaNs are treated the same, which is what I expect, although it seemed like NaN would violate the requirements. For the unordered map, however, I can never retrieve an element again, and all NaNs are different. This is also not what I would expect.

Does the standard have to say anything on this matter?

Update: Thanks to the great answers below, note that the std::map will break if you insert anything else into it once a NaN is in it.

(I'd be very grateful for comments on how other languages handle floating point keys in associative containers.)

Absorber answered 11/11, 2011 at 16:14 Comment(3)
The Standard can't really have anything to say about NaNs specifically, because a NaN is an IEEE concept. In other words, its platform-dependant.Granvillegranvillebarker
@JohnDibling: That's a good point, although the standard recognizes that numerical types may have a "nan" value; as in <limits>.Absorber
> Consider the ordered and unordered associative containers in C++ keyed on double. > Is NaN a valid key type? Wrong question. The correct question is: which possible domain of the builtin less-than contains NaN?Delfinadelfine
B
18

They are both forbidden by the standard.

For the (ordered) associative containers, the definition of strict weak order (25.4/4) says:

If we define equiv(a, b) as !comp(a, b) && !comp(b, a), then the requirements are that comp and equiv both be transitive relations ... equiv(a, b) && equiv(b, c) implies equiv(a, c)

This fails for a = 0.0, b = NaN, c = 1.0, comp = std::less<double>()

For the unordered containers, 23.2.5/3 says that the equality predicate Pred "induces an equivalence relation on values of type Key". Equivalence relations are reflexive, and std::equal_to<double>()(NaN,NaN) is false, so equal_to<double>() is not an equivalence relation.

By the way, keying containers on a double is slightly scary in the same way that comparing doubles for equality is always slightly scary. You never know what you're going to get in the least significant bit.

Something I've always considered a little odd is that the standard expresses the requirements in terms of the key type, not in terms of the actual key values added to the container. I believe you could choose to read this as not guaranteeing that map<double, int> has defined behaviour at all if the implementation supports NaNs, regardless of whether you actually add a NaN to an instance or not. In practice, though, an implementation of std::map cannot somehow conjure a NaN out of its back pocket and try to compare it, it only ever compares key values passed to the instance. So it should be OK (if slightly scary) provided you avoid adding NaNs.

I'd be very grateful for comments on how other languages handle floating point keys in associative containers

A few quick experiments in Python (where set and dict are unordered associative containers which hold keys and values by reference) suggest that NaNs are treated as objects that are unequal in value even if they're "the same NaN", but the same nan object can be found again by identity. As far as I've seen, the containers don't seem to be disturbed by containing multiple nans, or a mixture of nans and other values:

>>> thing = set()
>>> nan = float('nan')
>>> nan
nan
>>> thing.add(nan)
>>> thing.add(nan)
>>> thing
set([nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan] = 2
>>> thing[nan]
2
>>> nan2 = float('nan')
>>> thing[nan2] = 3
>>> thing
{nan: 2, nan: 3}

>>> thing = set()
>>> thing.add(nan)
>>> thing.add(nan2)
>>> thing
set([nan, nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan2] = 2
>>> thing[0] = 3
>>> thing
{nan: 1, nan: 2, 0: 3}
>>> thing.keys()
[nan, nan, 0]
>>> thing.values()
[1, 2, 3]
>>> thing[0]
3
>>> thing[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
Booma answered 11/11, 2011 at 16:38 Comment(9)
The authoritarian tone of the answer evokes a certain "accept" reflex, but I shall wait a bit longer :-)Absorber
One thing I've left out is that if NaN is the only value that you ever add to your map, then less<double>() is a strict weak order on that value alone. It's not hard to write a function that works as a SWO on a domain of size 1, the only requirement is to return false. In fact I think it takes a NaN and two distinct normal values before it fails: with NaN and one value it should just behave as though NaN is equivalent to that value. But such maps probably aren't very interesting...Booma
@Steve: I think your two-element example behaves as if NaN is equal to the other value. (Neither is less than the other => they are eqivalent.) Which means you will never actually see the other element in the map :-)Grooms
@Nemo: agreed, in fact I corrected the comment but I guess we crossed in the mail.Booma
That Python example is just plain weird. I can't find any sensible mental model in which that behaviour makes sense.Absorber
@Kerrek: the mental model is that the equivalence relation used by Python in the hash table is something like x == y or x is y. So two different calls to float('nan') produce different keys, but any object is always the same key when seen again.Booma
"if NaN is the only value that you ever add to your map" Once you have that set/map with a NaN in it, you cannot add any other float value, because this is actually the same value WRT to <. So if you have a set/map with one element, and its key is NaN, you can try to insert as many other values as you want!Delfinadelfine
Actually, if you read [alg.sorting]: "If we define equiv(a, b)"... there is no explanation about where a and b are coming from... there are just there. This is the funny std style, and it's why nobody serious is taking the std too seriously. Then in [associative.reqmts]: "For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value." which obviously is insufficient...Delfinadelfine
The half-C++ half-math description used in the std is quite comical: "for all x(sic)" ROTFL! Anyway, the writing style is soooo heavy and barely readable, I suppose nobody is reading this as a "specification".Delfinadelfine
R
5

This is because

std::less<double>(NaN, NaN) == false

Like you said, the weak total ordering (required for std::map<>) is ok, the equality (or equivalence, extra requirement for any hash-based container) isn't ok to satisfy the key requirements for the hash (unordered) map

Irreflexivity   f(x, x) must be false. 
 Antisymmetry   f(x, y) implies !f(y, x) 
 Transitivity   f(x, y) and f(y, z) imply f(x, z). 
 Transitivity of equivalence     

         Equivalence (as defined above) is transitive: if x 
         is equivalent to y and y is equivalent to z, then x is 
         equivalent to z. (This     implies that equivalence does 
         in fact satisfy the mathematical definition of an equivalence 
         relation.)

Seeing that for std::map, equivalence is when !less(a,b) && !less(b,a), I'd say all constraints are met.

Replevin answered 11/11, 2011 at 16:16 Comment(9)
Yes, I appreciate that, but is this behaviour documented or commented on somehow in the standard? Also, I wouldn't say that "the ordering is OK", since NaN does not obey a SWO: it is not one of { less, greater, equal } when compared to anything.Absorber
@KerrekSB see SGI std::map and click through to SGI Strict Weak Ordering pageReplevin
The part of SWO that NaN values fail is the transitivity of equivalence. 0 < NaN and NaN < 0 are both false. 1 < NaN and NaN < 1 are both false. Therefore, 0 < 1 and 1 < 0 should both be false, but they aren't. Hence the map has undefined behavior as soon as you put a NaN in it. Arguably it even has UB earlier, since the standard says that the order must be a SWO on the key type, not just on the key values actually added to the map, but that's quite a fine point.Booma
Hmm... I see: Given that NaN is never less than anything, those conditions are vacuously satisfied. OK, that's fine. So for map, double is a perfectly legitimate key type.Absorber
@Kerrek: no, listen to me, not Billy and sehe! ;-)Booma
@SteveJessop: I am, I am -- well spotted on the transitivity!Absorber
@Steve: Yes, I screwed that one up. Kerrek: Your test will fail if you try to insert anything other than a NaN into the map at this point -- it'll be treated as equivalent to the NaN and therefore won't be inserted.Oligopsony
Remember, a strict weak order is actually a total order.Delfinadelfine
SGI does not define the C++ standard.Blooded
O
3

NaNs can be stored inside a map -- namely, they are copy constructable and less than comparable. std::less for doubles doesn't meet map's requirements for a strict weak ordering, so you've technically got undefined behavior here. However, the behavior is easily explained, even if not required by the standard. Map uses equivalence rather than equality to determine whether an item is a duplicate. Two NaNs compare equivalent, but not equal. However, that falls apart in some cases. For example, if you tried to insert something other than a NaN into that map, it would be treated as equivalent to the NaN and you'd get no insert. Try adding some real numbers in addition to the NaNs and you can see map breaks down too.

The hash behavior is expected but not defined for a hash table as well -- hash tables require their contents to be copy construcable and equality comparable. The hashes of multiple NaNs compare equal, so they're all going to go into the same bucket, but a hash table uses equality comparison rather than less than comparison (equality rather than equivalence). Therefore, none of the NaNs compare equal to each other and you get multiple inserts for that key. This is because NaN breaks the hash table's equality comparable requirement -- namely the invariant that std::equal_to(x, x) true.

Oligopsony answered 11/11, 2011 at 16:21 Comment(16)
Yeah, cheers... I can see all those things; I was just wondering if that's somehow an acknowledged corner of the standard; or if every compiler vendor should be saying, "don't use NaN values in our associative containers".Absorber
@KerrekSB: I think in map's case you've technically got undefined behavior because of the SWO issue mentioned in the other answer. I think the hash table's behavior is actually well defined. I don't think the standard says any of this explicitly; it just comes along for the ride in the requirements of hashable, equality comparable, less than comparable, and strict weak ordering.Oligopsony
@KerrekSB: Put another way, I can't think of any conforming implementation of map and unordered_map having behavior any different than you show in your test, despite the standard not going into that explicitly.Oligopsony
I was thinking that you'd have to provide a different predicate functor which inherits equal_to trivially, but is specialised to numeric types with NaN and compares those equal...Absorber
The problem with NaN in a hash table is that lookups always return "false". So you can insert the element, but you will never find it... Are you sure unordered_map does not require operator==(x,x) to be true?Grooms
@KerrekSB: That should work. Of course, you're kind of changing the semantics of NaN by doing that.Oligopsony
@Nemo: Yep, you're right. Then both types of containers are undefined.Oligopsony
@Grooms and KerrekSB: I've significantly updated this answer to correct it. :) Thanks for the counterexamples!Oligopsony
"Map uses equivalence rather than equality to determine whether an item is a duplicate." Std uses both terms actually. Anyway, it's just a matter of choosing a word, and using it consistently. It may be easier (less confusing) to talk about equality of keys and total ordering, than equivalence and strict weak ordering.Delfinadelfine
@curiousguy: As far as I am aware, the standard does not use both terms. Equality is specific; it means a == b. Equivilence is something else entirely; it's !(a < b || b < a). Can you find a portion of the standard violating this? (In C++03; I don't have a copy of 0x to use :( )Oligopsony
@BillyONeal "As far as I am aware, the standard does not use both terms" Even if the std doesn't say "equal", "equivalent" is not used consistently. In the explanation of set/multiset..., "is an associative container that supports unique keys (contains at most one of each key value)" These words (value, unique) imply equality: two values are the same iff they are equal, not equivalent.Delfinadelfine
@curiousguy: I don't think that makes any sense. As far as set and multiset are concerned, everything is done via equivalence, not equality -- after all, there's no way to make these containers respect operator== or any kind of equality functor. Unique as far as the container is concerned is equivalent, not equal.Oligopsony
@BillyONeal "there's no way to make these containers respect operator== or any kind of equality functor" Obviously. I never mentioned any of those. Equality of keys means mathematical equality, not a C++ function with 2 arguments. "Unique" means that only one key with mathematically equal value is allowed in the container. I see no problem here: equality is the same equivalence, as defined according to comparator. There is no other way to defined "equal keys".Delfinadelfine
@BillyONeal "Equality is specific; it means a == b." Two strings may compare equal with operator==, that doesn't mean they are equal objects, because equality of objects is not well defined (and arguably two objects are only equal if they have the same address). We say that these string objects have "the same value", where "value" means what is compared by operator==.Delfinadelfine
@curiousguy: No, we don't. Equality means std::equal, which means operator== if defined for the type in question. There are no equal objects in C++ because everything is a value type.Oligopsony
@BillyONeal "There are no equal objects in C++ because everything is a value type." I am not sure what you mean.Delfinadelfine

© 2022 - 2024 — McMap. All rights reserved.