std::type_info::hash_code() uniqueness and the meaning of "should"
Asked Answered
T

1

11

Is it meant to be guaranteed that same std::type_info::hash_code() values imply same types?

Cplusplus.com seems to claim so:

This function returns the same value for any two type_info objects that compare equal, and different values for distinct types that do not. [Emphasis mine]

Cppreference seems to claim otherwise:

Returns an unspecified value, which is identical for objects, referring to the same type. No other guarantees are given, in particular, the value can change between invocations of the same program. [Emphasis mine]

The relevant standards paragraphs are:

§ p18.7.1 p7-8

size_t hash_code() const noexcept;

7 Returns: An unspecified value, except that within a single execution of the program, it shall return the same value for any two type_info objects which compare equal.

8 Remark: an implementation should return different values for two type_info objects which do not compare equal. [Emphasis mine]

What's the meaning of "should" supposed to be in the context above? If paragraph 8 is meant to be a requirement, then it seems impossible to fulfill unless the runtime does some kind of global uniquing over all symbol names in a program to ensure lack of hash collision, which seems to be a pretty big burden for the standard to foist upon implementations, especially for a function called hash_code(). (Itanium actually requires this, but it's explicitly an extra requirement above the standard.)

If "should" is not meant to be binding, then the remark seems to be a pointless one and a defect in the standard, since asking implementations to try to fulfill a difficult requirement that cannot be relied upon anyway provides no value and only invites confusion and fragmentation. Anyone know why it's there?

EDIT: Maybe "defect" was too strong a word, but at least it's a point of possible confusion that should be clarified, since it's apparently misled at least one reference site and transitively misled anyone relying upon it. Furthermore, it actually is possible to fulfill the requirement (as long as the number of types supported by the implementation is smaller than the range of size_t) if global uniquing is done at runtime, and it's unclear if the standard is trying to suggest this as the ideal implementation strategy or not.

Tetrahedron answered 27/4, 2013 at 14:25 Comment(0)
D
5

The meaning looks pretty clear to me: it's not an absolute requirement because it may be impossible to meet under some circumstances, but the implementation should attempt to produce unique values to the extent possible.

I'd note that the same is true of hash codes in general -- you try to produce values that are unique, but it's not always possible.

The standard contains a lot of information that's not enforceable. Quite a bit (but certainly not all) is in the form of explicit Notes, but that doesn't mean everything non-normative outside a Note is a defect.

Edit: in case anybody wants to know what the ISO says about how standards should be written, they have a page of guidelines.

Diatessaron answered 27/4, 2013 at 14:33 Comment(12)
OK, I think you're right, but it makes things confusing. For one thing, cplusplus.com is in error and anyone relying on it has buggy code.Tetrahedron
(So it should be clarified at least)Tetrahedron
@StephenLin: I agree that cplusplus.com is in error. It's not known for being particularly accurate. cppreference.com seems to be more reliable at least as a general rule.Diatessaron
The type_info::operator== seems only to be required to return true if the two compared types are equal. So what's about two types that are not equal? I'm asking this because you might want to check both, hash and using the == operator.Gi
@DyP: I believe the same basic idea applies: type_infos need to be equal for the same types, and should be different for different types, but it's not absolutely required.Diatessaron
@JerryCoffin the problem, as I'm pointing out in a discussion with MikeSeymour below, is that (at least in my reading) it seems like the standard is proscribing global uniquing of std::type_info<T> objects as the preferred implementation standard. I'm not sure if that's true or not; if they really mean to suggest that, then they ought to, and if not, they ought not to confuse the issue by clarifying basic computer science.Tetrahedron
@StephenLin: I don't see the standard proscribing anything -- just not requiring it. As far as requiring different values: I think what I said is accurate: different types should give different type_info values (and different hashes) but that's not absolutely required. An implementation could return exactly the same type_info for every type, and 1 for every hash-code and still conform.Diatessaron
@JerryCoffin I guess the point is if they're trying to clarify basic CS theory about what a hash value is, they shouldn't need to bother, and if they're trying to make a statement that global uniquing at runtime is desireable, they should say so explicitly. At this point, I don't know what they meant.Tetrahedron
@StephenLin: In that case, I think you need to get away from the computer and take a walk or something. Seriously -- I just don't see a lot of room for misunderstanding about what it means.Diatessaron
@JerryCoffin Well, apparently it was enough to mislead others. And no, it's not that clear because it's actual quite possibe to guarantee uniqueness with some upfront cost. It would actually be useful to know if this upfront cost was paid or not by the implementation so the user could take advantage of it, but right not it looks like the standard is asking for an upfront cost but allowing the user no way to know whether it has been paid (other than reading ABI docs)Tetrahedron
@JerryCoffin The standard seems to imply that a high-quality implementation should have paid the cost of uniquing, which cplusplus.com has taken as a fact; if it were truly impossible to do this then there wouldn't be any issue, but the reality is that it's possible to do this and it would be useful to explicitly know if it's done via some kind of implementation-provided flag or define, but right now there's no clarity about it.Tetrahedron
@StephenLin: Uniqueness is pretty easy to guarantee for a traditional compiler, but for an interpreter or incremental compiler seems much more difficult (if even possible). I'd guess about the best you could hope for would be changing this from unspecified to implementation defined, so it has to be documented -- but I'm not at all sure that would gain much.Diatessaron

© 2022 - 2024 — McMap. All rights reserved.