When creating a nan which contains a char-sequence, how do I get the char-sequence back? [duplicate]
E

3

7

When doing

const double d = std::nan ("Hello");

you get a NAN containing the string "Hello". How can one get this string back out of the variable d? Is there simply no standard-conforming way? This feature seems to make little sense without being able to get the string back.

Enlargement answered 16/8, 2018 at 7:41 Comment(19)
No, have a read of en.cppreference.com/w/cpp/numeric/math/nan – Sunwise
In my cursory understanding, this exists so you can generate specific NaN values that the implementation provides. Which strings you have to input to get which value is implementation-defined. Notably, you do not get a NaN "containing" the string "asdalsgkhalskdaha" by just calling that function - that would be impossible from an information theory perspective alone (8 double bytes can't store an arbitrary length string). – Euphroe
@RichardCritten I've read that page before posting the question, but couldn't (and still cannot) see how this implies an answer to the question. – Enlargement
'...you get a NAN containing the string "Hello"', no, you will not, as there can be no such thing. You can use std::nan to get different NaNs if the implementation allows, but there is no such thing as a "NaN containing a string". In fact, it is entirely likely you will get the exact same NaN value for almost any string you input. – Acidulate
@Acidulate Maybe, fair enough. I still think "Hello" should fit in there in "most cases". – Enlargement
Because of: "The call std::nan("string"), where string is neither an n-char-sequence nor an empty string, is equivalent to the call std::strtod("NAN", (char**)nullptr);." The string you pass, if it is not one of the implementation-defined NAN-strings, never gets passed through to the creation of the NAN. – Sunwise
@TobyBrull: the string you specify is not stored in the NaN (NaNs don't contain strings). The string is used to specify the kind of NaN you want to get. Even if you could store "Hello" in a double, it would result in the value 2.36440259523e-312 (assuming little endian) and not in a NaN. – Nymphet
@RudyVelthuis You could store "Hello" in the last five bytes of the double. The first three bytes could still be 0x7FF000. As I understand, this would still be a valid NAN, at least in IEEE. But you're right: I misunderstood the mechanism; the string is only used as some sort of label with implementation-defined meaning. – Enlargement
@TobyBrull: yes, that is what I did. You would get a double with the value I gave, and not a NaN. I think it is clear, by now, that the string is not stored in the double (and it would not make sense either). – Nymphet
@RudyVelthuis Works for me! I guess, you have to be careful with endianness: coliru.stacked-crooked.com/a/38249f3da11a1439 . Surely it makes sense to store a string in that space. – Enlargement
@TobyBrull: are you kidding? It doesn't make sense to store a string in a double. Why on earth would you want to do that? – Nymphet
@RudyVelthuis Oh, I thought it was clear that the purpose is for a space efficient implementation of what might otherwise be implemented as std::variant<double, ...>. So, a NAN could also communicate why it's nan: because of an error in a computation, because it represents missing data, because it represents data that has yet to be computed, or whatever other necessity arises. Storing an integer in that space would also work, of course. – Enlargement
No, a NaN does not communicate why it is NaN, except by the type of NaN. There is no text in a NaN. Even if you manage to put text there, no one will expect it there. And there is not a lot of space either. So no, it does not make sense to put (ASCII) text in a double. Doubles are not meant to contain text. – Nymphet
@Zinki: It is not true there is no such thing as a NaN containing a string. The format commonly used for double has 51 bits available for a NaN payload, which is enough to encode the OP’s string “Hello”. – Poul
@RichardCritten: Re “The string you pass, if it is not one of the implementation-defined NAN-strings, never gets passed through to the creation of the NAN”: OP’s string “Hello” is an n-char-sequence as defined by the C standard (and inherited by C++). – Poul
@MichaelVeksler: That question focuses on putting data into a NaN. This question asks how to get it out, so it is not a duplicate. – Poul
@RudyVelthuis: That question focuses on putting data into a NaN. This question asks how to get it out, so it is not a duplicate. – Poul
@RudyVelthuis: 2.36440259523e-312 is the value you get if you make the high-order bits zero. OP said to make them 0x7ff000, which encodes a NaN. On a little-endian machine, I created a union initialized with the unsigned characters 'H', 'e', 'l', 'l', 'o', 0x00, 0xf0, 0x7f, and its double member prints as a NaN, but printing the first five bytes yields “Hello”. If you got 2.36e-312, you did something different. – Poul
@RudyVelthuis: Yes, a NaN can communicate why it is a NaN. The IEEE-754 committee has considered various purposes for which people might use the payload of a NaN, including conveying information about its origin. – Poul
P
6

The C++ standard says an implementation may display the data encoded in a NaN when formatting it with fprintf, with its relatives such as printf, and with C++ features that inherit from fprintf, such as output stream formatters. This is the only explicit provision in the C++ standard for getting information about the data in a NaN. (I am including statements in the C standard, which the C++ standard incorporates by reference.) How the encoded data is displayed is implementation-defined, and an implementation may omit it entirely.

You can, of course, examine the data encoded in a NaN by examining the bytes that represent it. However, how the characters passed to the nan function are processed is implementation-defined. An implementation may choose to do nothing with them, it may include them literally in the bytes of the NaN (if they fit), or it may encode or interpret them, such as expecting a hexadecimal numeral in the string, which will be encoded into the bits of the NaN. The IEEE-754 basic 64-bit binary floating-point format commonly used for double has 51 bits available for the payload of a quiet NaN, which is enough to fit six eight-bit characters, so the string “Hello” could be encoded in a NaN.

Here is a breakdown of what the standard says about the nan function:

  • C++ inherits the nan function from C and leaves it to C to specify what it does.
  • C says that nan("n-char-sequence") is equivalent to strtod("NAN(n-char-sequence)", (char**)NULL).
  • C says an n-char-sequence is a sequence of digit and nondigit characters. The digit characters are 0-9, and the nondigit characters are _, A-Z, and a-z. So the string "Hello" is an n-char-sequence.
  • C says, about strtod given “NAN(n-char-sequence)”, that the meaning of the n-char-sequence is implementation-defined.

So, an implementation may encode the bytes you give it in the nan argument.

What the C standard (and C++ by inheritance) says about formatting a NaN is:

  • “A double argument representing a NaN is converted in one of the styles [-]nan or [-]nan(n-char-sequence) — which style, and the meaning of any n-char-sequence, is implementation-defined.”
Poul answered 16/8, 2018 at 17:38 Comment(0)
E
1

This totally misunderstands what std::nan does. See the documentation.

The call std::nan("n-char-sequence"), where n-char-sequence is a sequence of digits, Latin letters, and underscores, is equivalent to the call std::strtod("NAN(n-char-sequence)", (char**)nullptr);.

Basically, "Not a Number" can be represented by a number of different quiet NaNs, each a representation of an invalid floating-point value. std::nan provides a means to generate such a quiet NaN. It is needed because regular C++ literals do not provide that ability.
So it doesn't "wrap" any kind of string. The provided string should represent a number; otherwise you will receive the regular representation of NaN. Using "hello" is pointless, and the string value can't be recovered from your d.

If you want to report errors with more detail, you should use C++ exceptions, for example std::invalid_argument.
I do not have an example where a custom quiet NaN would be useful; maybe someone else does.

Elver answered 16/8, 2018 at 8:10 Comment(1)
This quotes the wrong part of the documentation. OP’s string, “Hello”, is an n-char-sequence: a sequence of digits, Latin letters, and underscores. The quoted statement about when the string is not an n-char-sequence is irrelevant. Per the C++ and C standards, the string is processed in an implementation-defined way. The typical double format has 51 bits available for inclusion in a quiet NaN, so you can encode six eight-bit characters in it. – Poul
D
1

To understand what the string parameter to the std::nan function is really intended to be used for, let's consider a different example:

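#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    const double d = std::nan("123");
    std::uint64_t u;
    // Copy the raw bytes of d into an integer so the bits can be inspected.
    assert(sizeof(u) == sizeof(d));
    std::memcpy(&u, &d, sizeof(d));
    std::cout << "d = " << d << ", hex = " << std::hex << u << std::endl;

    // IEEE-754 binary64 layout: 1 sign bit, 11 exponent bits, 52 significand bits.
    unsigned int sign = u >> 63;
    unsigned int exponent = (u >> 52) & 0x7ff;
    std::uint64_t significand = u & 0xfffffffffffffULL;
    // std::hex is still in effect, so these fields print in hexadecimal too.
    std::cout << "sign: " << sign;
    std::cout << ", exp: " << exponent;
    std::cout << ", signif: " << significand << std::endl;
}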

The first line printed by this program is

d = nan, hex = 7ff800000000007b

showing that the number in d is indeed a NaN, with a hexadecimal representation of 7ff800000000007b.

The rest of the program extracts the sign, exponent, and significand portions of the number, assuming IEEE-754 double-precision format, with 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fractional significand. The second line printed is

sign: 0, exp: 7ff, signif: 800000000007b

indicating that those components are 0, 0x7ff, and 0x800000000007b, respectively.

Since the exponent is all 1's, this is a special number.

Since the significand is nonzero, this is a NaN. (If the significand were zero, this would be an infinity.)

And then, since there are 2^52-1 different ways for the significand to be nonzero, we can interpret the significand value as the "payload" of the NaN. So, with that thought in mind, what could the value 0x800000000007b mean?

Let's ignore the leading 8 for the moment. The remaining part is hexadecimal 7b, or in decimal... 123.

That's right, when you say std::nan("123"), you're setting the payload of the NaN. (In most implementations, I believe you can also do so directly in hexadecimal: std::nan("0x7b"). Or probably std::nan("0173"), for that matter.)

And then the leading 8 (which is the high-order bit of the 52-bit field) says that this is a "quiet" as opposed to a "signaling" NaN. Quiet NaNs are usually what you want, so evidently std::nan() sets this for you automatically, whether you ask for it or not. (I'm not sure whether it's possible to use std::nan() to construct a signaling NaN.)

See What is the difference between quiet NaN and signaling NaN? for more on that point.

Calling std::nan("Hello"), on the other hand, doesn't mean much — the payload is always intended to be numeric. When I tried it, std::nan("Hello") was, not too surprisingly, essentially equivalent to std::nan("0").

(Since there are 51 or 52 payload bits available, as pointed out in a comment, you could theoretically jam several actual characters in there. One possible way would be std::nan("0x48656c6c6f"), although besides going "around the barn", it's kinda big-endian.)

These "NaN payloads" are, at least in my experience, a relatively unknown and little-used aspect of the IEEE 754 standard. What were they intended to be used for? One example is that, since there are quite a few bits available, it is possible that, upon encountering certain kinds of exceptional floating-point conditions, a CPU could actually insert the current Program Counter value as the payload of the resulting NaN, to aid in later debugging.

Anyway, in answer to this question's title, the only way I know of to get a NAN's embedded "payload" back is to use IEEE-754-specific bit manipulations, as I've demonstrated here. AFAIK, there are no Standard or portable or machine-independent facilities for doing so.

Dunkirk answered 5/5 at 15:55 Comment(0)