Are IEEE floats valid key types for std::map and std::set?
Asked Answered
V

4

24

Background

The requirement for a comparator on the key type of an associative container (for example std::map) is that it imposes a strict weak order on the elements of the key type.

For a given comparator comp(x, y) we define equiv(x, y) = !comp(x, y) && !comp(y, x).
The requirements for comp(x, y) being a strict weak order are

  1. Irreflexibility (!comp(x, x) for all x)
  2. Transitivity of the ordering (if comp(a, b) and comp(b, c) then comp(a, c)).
  3. Transitivity of equivalence (if equiv(a, b) and equiv(b, c) then equiv(a, c))

std::less<float> (the default comparator) uses operator<, which does not create a strict weak order because of NaN. Because x < NaN and NaN < x are false for all x, NaN is equivalent to all floats under this comparator, this breaks condition #3: equiv(1.0, NaN) and equiv(NaN, 2.0) but not equiv(1.0, 2.0). For IEEE floats except NaN, it is a strict weak order (where each number has its own equivalence class except for 0 and -0).

The question

Does this mean that one is not allowed by the C++ standard to use IEEE floats (and (long) doubles) as a key type in an associative container because of the above issue, even if I make sure that NaN never gets inserted into the container? I'm not quite sure about the "elements of Key" wording in the standard—if it means all possible elements or just the elements that end up in the container.

Note: The question is not about issues wrt. truncation/rounding, I'll likely post a different question regarding that soon.

Update:

Sigh. I should've posed the question without specifying float, I just thought it was a nice example.

The question really is: Is it allowed to use a comparator that only imposes a strict weak order on the elements that get put into the container, not all possible instances of the key type? Please don't just answer "yes" or "no", I'd like some references to the standard / prior discussions about this / an answer from a commitee member or something.

Vacua answered 27/1, 2011 at 12:14 Comment(2)
I commented in my answer what I think would happen if you tried inserting a NaN into one, depending on whether or not the container was empty at the time.Craggie
Related: #8097317Reddy
S
9

I suspect the restrictions should be taken as referring to the relation's behavior on the values actually used as keys, not necessarily on all values of the type. Don't have time at the moment to go through the standard looking for "smoking gun" language that refers to actual container elements rather than all values of the type.

Similar case: what if a comparator (for a container of pointers or smart pointers) calls a virtual function, and somebody links a derived class of the type it compares, which overrides the virtual function in a way that makes the comparator not a strict weak order? Does the program become undefined even if nobody ever actually uses that derived class?

If in doubt, you can support NaN with a comparator that is a strict weak order:

bool operator()(double a, double b) {
    if ((a == a) && (b == b)) {
        return a < b;
    }
    if ((a != a) && (b != b)) return false;
    // We have one NaN and one non-NaN.
    // Let's say NaN is less than everything
    return (a != a)
}

The last two lines "optimize" to return (b == b);, although I'm not sure the comment optimizes with it.

I think Tomalak has convinced me the language does say the whole type needs to be ordered.

This makes little sense, since a map doesn't conjure values out of nowhere, it only uses values that it's given (and copies of them), but the question is about the rules, and them's the rules as far as I know. C++0x is the same. I wonder if there's a defect report, or any point submitting one.

It's also annoying in that on (very rare) systems where std::less is slow for pointers, you can't use < as the comparator in a map of pointers, even if you know that the pointers are all to elements of the same array. Shame.

Another option is to use the following class as the key type, so keys are checked for NaN only on entry to the map, not on every comparison as above.

struct SaneDouble {
    double value;
    SaneDouble (double d) : value(d) {
        if (d != d) throw std::logic_error();
    }
    static friend bool operator<(SaneDouble lhs, SaneDouble rhs) {
        return lhs.value < rhs.value;
    }
    // possibly a conversion to double
};

This raises another question - clearly someone could create a SaneDouble and then set its value to NaN (assuming the implementation lets them get one from somewhere without crashing). So are "elements of SaneDouble" strict-weak-ordered or not? Does my half-hearted attempt to create a class invariant in the constructor make my program undefined even if nobody actually breaks the invariant, simply because they could and therefore the results of doing so are "elements of SaneDouble"? Is it really the intention of the standard that the program's behavior is defined if and only if value is marked private? Does the standard actually define anywhere what "the elements" of a type are?

I wonder whether we should interpret "elements of Key" to mean that the comparator induces a strict weak order on some elements of Key. Presumably including the ones actually used. "I have doughnuts" doesn't mean I have every doughnut. It's a stretch, though.

Spinach answered 27/1, 2011 at 12:41 Comment(4)
The wording in the standard (2003, 23.1.2/2), at least to me, clearly indicates that all elements of the set of values of type Key must satisfy strict weak ordering.. whether this appears to make sense to us or not, I'm pretty sure that this is what it states.Louisiana
25.3/3 says "... comp has to induce a strict weak ordering on the values". Note that it does not say "on the type" or "on all possible values of the type". Technically, I think Steve's tenuous interpretation may be feasible.Nudd
@Oli: That text seems to refer to the values on which the comparator is supposed to be a strict weak order. In the case of the algorithms in 25.3, that's looking like the values of the iterators in the specified range(s). In the case of associative containers, it's explicitly stated in 23.1.2/2 that it's "elements of Key", whatever that means. I can see why they'd want to define it that way - otherwise all the functions on the containers need to say, "the argument must be a member of the set of values which is strict-weak-ordered by the comparator".Spinach
@SteveJessop actually "the argument must be a member of the set of values which is strict-weak-ordered by the comparator" wouldn't be the correct thing to say either, since in general there might be more than one maximal subset of all-possible-values-of-type-Key that is strict-weak-ordered by the comparator. It would have to say something like "the comparator must be a strict weak order on the values in use" or something like that. Still awkward.Stow
N
2

In short: this is fine (in the sense that your question is about).

If you prevent the values (i.e. NaN) that wouldn't satisfy the ordering requirements, then the behaviour is entirely defined.

Nudd answered 27/1, 2011 at 12:17 Comment(3)
In practice, yes. I believe the question was more about the theoretical, language-lawyer point of view. From the practical point of view, a pre-order suffices, too, as long as you never put two elements into the container that are not comparable. I don't believe language lawyers would treat that as legal code.Professional
@Christopher, @etarion: Steve Jessop's answer says what I'm trying to express better than I did. The requirement is almost certainly intended to refer to the ordering defined on the values in use.Nudd
In that case, he and Steve have probably given you the “correct” answer. Who cares what the standard would permit a compiler writer to do if their customers were, let's say, rather unhappy with the choice?Professional
M
1

Putting floats as keys of an associative container is sometimes a bad idea since the equality semantics are quite poor. But it depends on what you want to do. Bear in mind that NaN and infinities are usually not a problem. You can handle them with special comparator functions (I usually don't), and obviously the requirements of the standard are about the actual keys that will end up in the container, that you can see as a subset of the key type. A map implementation will never introduce key instances that you did not feed yourself into the map.

I used once this predicate for a map where I could disallow two very close values:

struct FuzzyComparer
{
    template <typename T>
    bool operator()(T x, T y) const
    {
        static const T oneMinusEps = (T)1. - 64 * std::numeric_limits<T>::epsilon();
        return x / y < oneMinusEps;
    }
};

This does not provide you with a good equivalence relation. This is only useful when you want to store discrete floating point values, and you are ready to tolerate some roundoff error in the computation which yields the key you want to retrieve. For the actual keys that will be inserted, it yields an equivalence relation.

You will not be able to come up with a good equivalence relation on floating point numbers which is compatible with the arithmetic operations, ie. with makes addition and multiplication associative.

You either have to throw away the "equivalence relation" part, which shouldn't be that a big deal in real world code (I doubt the transitivity of the eq. relation is used to an extent that would bother you in a typical map implementation) or throw away the compatibility with arithmetic operations. But what is the point of using floating point values as keys then ?

Mauri answered 27/1, 2011 at 12:19 Comment(19)
Nice comparator. People too often use a difference instead of a rapport, and it doesn't work for "extreme" values...Laguna
Does this provide a strict weak ordering?Nudd
@Oli: This is at least transitive, but the equivalence relation arising from it is not. This is adequate for most purposes. Remember that floating point numbers have poor mathematical properties: addition is not associative for instance.Mauri
"Fuzzy" compares never impose a strict weak order because it makes the equivalence classes ill-defined, so this is even worse.Vacua
@etarion: there are no "mathematically good" equivalence relations on floating point numbers which are useful in real code. Live with it.Mauri
I explicitely stated in the question that it was not about precision issues, but from a language standpoint - does the C++ standard allow this.Vacua
@etarion: the answer is yes then, but this is not compatible with arithmetic operations, so it is probably useless. See my edit.Mauri
"The answer is yes" is not helpful without explaining why - e.g. where does the standard state that, or which other semantics imply this. And yes, the transitivity of the equivalence relation is important (keep in mind that this is the only condition that is violated by NaN, now put a NaN in a set and see what happens).Vacua
See ideone.com/S44eC for a case where the violation of the transivity of equivalence bothers your comparer in a "typical implementation" (gcc). There is actually a case (which i'm looking at right now) is where the compatibility with arithmetic operations doesn't matter measurably, but values disappearing because of a bad equivalence relation does hurt badly. But anyway - this is a language question, not an implementation one :)Vacua
@etarion: Right, but I have assumptions about the floating point numbers that get inserted into the map: they are spaced by more than some bigger number (a few 10^3). Indeed, the associativity thing was considered when the decision to use a map<double> was discussed.Mauri
This does break the OP's 3rd condition if 3 values are x-delta, x and x+delta thus x is equivalent to both but x-delta and x+delta are far enough apart to not compare as equivalent.Craggie
@CashCow: indeed. If you want a proximity tree, there are ways of building one, but std::map with a fuzzy comparator isn't it.Spinach
@CashCow, @Vacua however, equivalence classes may occur well-defined for the given key set (actually contained in the map). If so, the comparator is correct :)Constrictive
This part: 'You either have to throw away the "equivalence relation" part, which shouldn't be that a big deal in real world code (I doubt the transitivity of the eq. relation is used to an extent that would bother you in a typical map implementation)' is not good advice. STL implementations in the real world are known to give wrong results, crash, corrupt memory, and endless-loop when given a comparator that violates the stated requirements.Stow
@DonHatch : operator< on doubles is a correct comparator, if you disallow NaN values. However, the induced equality (!(a < b) && !(b < a)) is useless, but well defined (NaN aside).Mauri
Ah I see what you are saying. I thought you meant it's okay to disregard the contract, and use the native operator< with a dataset that includes NaNs.... now I think that isn't what you're saying, although I'm still not sure I completely follow what you are saying. If you're saying floating point operator == is useless, I think that's an overly strong statement.Stow
@DonHatch: ` operator<` has enough guarantees to put doubles into maps. Just don't expect to be able to retrieve a number exactly -- you have to search for all the entries within some epsilon of your number (using eg. map::lower_bound). For instance, I've seen seemingly working code which uses map::find () with literal doubles break completely when called from a different thread. In short, map<double, T> is fine, just don't use find on it (also you should rather use multimap for the same reason, but my answer is 4 year old and this was something I did not realize back then).Mauri
@DonHatch: please note that I realize that my 4 year old answer says something different that I think now -- I will do something about it later.Mauri
Alexandre, I would like to see your example where map::find() with literal doubles breaks when called from a different thread. I have not seen this, and my current understanding is that that can't happen in a correct implementation (assuming NaNs are not used, of course).Stow
C
1

You can put floats and doubles as the key to a std::map or std::set for the purpose of them being correctly sorted.

The issue is when it comes to uniqueness, as you may well get duplicates occurring because of the way floats and doubles are compared.

Using a comparison based with epsilons (close values considered equal) is also not without danger as you could have genuine non-duplicates eliminated for being too close.

The "lookup" may not find an element that is there if you use simple "find", so you may wish to use some kind of epsilon lookup with upper_bound() on x-delta where x is what you are really looking for, and allowing a value that is less than x+delta.

Given all that, there is clearly no problem of using float or double as a key in std::multiset or std::multimap as long as you use upper_bound to search rather than equal_range.

With regards to NaNs, if the set or map is not empty they would be considered "equal" to any element already there and therefore would not insert. If you insert a NaN first, all subsequent insertions should fail.

They would be considered equal because x<NaN and NaN<x both evaluate false.

Of course by the same logic if it's a map and you call

myMap[ NaN ] = x;

it could justifiably modify any element.

(Same if you do find(NaN), which could return any iterator).

Therefore if you are going to have NaNs in any part of this calculation use a special comparator such as Steve Jessop's.

Craggie answered 27/1, 2011 at 12:43 Comment(1)
This answer misses the topic - the question is "is this allowed from a language standpoint".Vacua

© 2022 - 2024 — McMap. All rights reserved.