What is the result of `strtod("3ex", &end)` supposed to be? What about `sscanf`?
In my experiments this expression

double d = strtod("3ex", &end);

initializes d with 3.0 and places the end pointer at the 'e' character in the input string. This is exactly how I would expect it to behave. The 'e' character might look like the beginning of the exponent part, but since the actual exponent value (required by 6.4.4.2) is missing, that 'e' should be treated as a completely independent character.

However, when I do

double d;
char c;
sscanf("3ex", "%lf%c", &d, &c);

I notice that sscanf consumes both '3' and 'e' for the %lf format specifier. Variable d receives the value 3.0. Variable c ends up with 'x' in it. This looks strange to me for two reasons.

Firstly, since the language specification refers to strtod when describing the behavior of the %f format specifier, I intuitively expected %lf to treat the input the same way strtod does (i.e. choose the same position as the termination point). However, I know that historically scanf was supposed to return no more than one character back to the input stream. That limits any look-ahead scanf can perform to one character. And the example above requires at least a two-character look-ahead. So, let's say I accept the fact that %lf consumed both '3' and 'e' from the input stream.

But then we run into the second issue. Now sscanf has to convert that "3e" to type double. "3e" is not a valid representation of a floating-point constant (again, according to 6.4.4.2 the exponent value is not optional). I would expect sscanf to treat this input as erroneous: terminate during %lf conversion, return 0 and leave d and c unchanged. However, the above sscanf completes successfully (returning 2).
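To tell a failed conversion apart from a successful one, the variables can be pre-loaded with sentinel values before calling sscanf. The following is a minimal sketch of such a probe; the struct probe_result and the function probe_lf_c are hypothetical names introduced here, and the sentinels -1.0 and '?' are arbitrary choices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for everything one sscanf call can tell us. */
struct probe_result {
    int ret;    /* return value of sscanf */
    double d;   /* stays -1.0 if %lf stored nothing */
    char c;     /* stays '?' if %c stored nothing */
};

/* Run "%lf%c" over input, starting from sentinel values so that an
   untouched variable is distinguishable from a converted one. */
static struct probe_result probe_lf_c(const char *input)
{
    struct probe_result r = { 0, -1.0, '?' };
    assert(input != NULL);
    r.ret = sscanf(input, "%lf%c", &r.d, &r.c);
    return r;
}
```

Calling probe_lf_c("3ex") and printing the three fields shows directly whether the library treated "3e" as a matching failure (ret 0, sentinels intact) or as a successful conversion (ret 2). The result for that particular input depends on the library, which is the point of the question.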

This behavior is consistent between the GCC and MSVC implementations of the standard library.

So, my question is, where exactly in the C language standard document does it allow sscanf to behave as described above, referring to the above two points: consuming more than strtod does and successfully converting such sequences as "3e"?

By looking at my experiment results I can probably "reverse engineer" sscanf's behavior: consume as much as "looks right", never stepping back, and then just pass the consumed sequence to strtod. That way the 'e' gets consumed by %lf and is then simply ignored by strtod. But where exactly is all that in the language specification?
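That reverse-engineered model can be sketched in C as follows. This only illustrates the behavior observed in the experiment, not what the standard requires or how any real library is implemented. The helpers greedy_decimal_prefix and emulate_lf are hypothetical, leading whitespace is not skipped, and only the decimal form is handled (no hex, INF or NAN).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of s that is, or is a
   prefix of, a decimal floating-point input. It never steps back. */
static size_t greedy_decimal_prefix(const char *s)
{
    size_t i = 0;
    size_t digits = 0;

    assert(s != NULL);
    if (s[i] == '+' || s[i] == '-') i++;
    while (isdigit((unsigned char)s[i])) { i++; digits++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; digits++; }
    }
    if (digits > 0 && (s[i] == 'e' || s[i] == 'E')) {
        i++;                              /* taken even if no digits follow */
        if (s[i] == '+' || s[i] == '-') i++;
        while (isdigit((unsigned char)s[i])) i++;
    }
    return i;
}

/* Hypothetical emulation of the observed %lf behavior: consume the greedy
   prefix, then let strtod convert whatever part of it is valid.
   Returns 1 on success and 0 if nothing could be converted. */
static int emulate_lf(const char *input, double *out, size_t *consumed)
{
    char buf[64];
    char *end;
    size_t n = greedy_decimal_prefix(input);

    if (n == 0 || n >= sizeof buf) return 0;
    memcpy(buf, input, n);
    buf[n] = '\0';
    *out = strtod(buf, &end);
    if (end == buf) return 0;             /* e.g. a lone "+" or "." */
    *consumed = n;
    return 1;
}
```

For the input "3ex" this sketch consumes two characters ("3e") and stores 3.0, while a direct strtod call on the same string stops after one character. That matches the difference described in the question.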

Pontone answered 13/10, 2014 at 6:51 Comment(9)
Perhaps the reason for the difference (though not a very good excuse) lies in the fact that sscanf is in stdio and strtod is in stdlib. – Sizeable
Not really sure I understand: why does the result of sscanf seem strange to you? What exactly did you expect? Could you please give a bit more detail? – Polygraph
@HighPredator: OP probably means that variable c should attain the value 'e' and not the value 'x'. Or perhaps it should not attain any value at all, and function sscanf should return 1 instead of 2 (so it accurately emulates the behavior of strtod). – Sizeable
@barakmanos, that may indeed be the case. Let's wait for OP's answer. – Polygraph
@HighPredator: I actually described the two issues that I have with it in my question. I always intuitively expected sscanf format requirements and behavior to be in sync with strto... format requirements and behavior. The language standard actually states that, but apparently I saw more in it than there really was. For example, I expected sscanf to stop at exactly the same point where strto... would stop. Now I kinda "see" that the standard probably does not require that and allows sscanf to consume more. – Pontone
But I still don't see where the standard allows successful conversion in cases when sscanf decided to "consume more" and the result of that consumption does not match the syntax requirements. – Pontone
While the behaviour you observed seems a little odd, there's no requirement that sscanf and strtod should exhibit similar (or equivalent) behaviour. *scanf() needs to scan left to right, but strtod() may "look ahead" and decide where to put endptr. – Lenard
@Blue Moon: Yes, but the language specification defines the behavior of the f format specifier by simply referring to strtod. If there's a difference between the f specifier and strtod, the standard should describe it somewhere. My question is: where? Which specific wording? – Pontone
An interesting case of duplicate -- not so much the question, but the answer: Difference between scanf() and strtol() / strtod() in parsing numbers. Basically, ...scanf() is defined to take the longest possible sequence that is, or is a prefix of, a matching input, while strto...() takes the longest valid sequence. (The difference being a result of streams supporting only one character of guaranteed put-back, i.e. ...scanf() cannot step back as much as strto...() can.) – Cracy

I found the following description on die.net:

The strtod(), strtof(), and strtold() functions convert the initial portion of the string pointed to by nptr to double, float, and long double representation, respectively.

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

A decimal number consists of a nonempty sequence of decimal digits possibly containing a radix character (decimal point, locale-dependent, usually '.'), optionally followed by a decimal exponent. A decimal exponent consists of an 'E' or 'e', followed by an optional plus or minus sign, followed by a nonempty sequence of decimal digits, and indicates multiplication by a power of 10.

A hexadecimal number consists of a "0x" or "0X" followed by a nonempty sequence of hexadecimal digits possibly containing a radix character, optionally followed by a binary exponent. A binary exponent consists of a 'P' or 'p', followed by an optional plus or minus sign, followed by a nonempty sequence of decimal digits, and indicates multiplication by a power of 2. At least one of radix character and binary exponent must be present.

An infinity is either "INF" or "INFINITY", disregarding case.

A NAN is "NAN" (disregarding case) optionally followed by '(', a sequence of characters, followed by ')'. The character string specifies in an implementation-dependent way the type of NAN.
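The forms listed in that description can be checked with a small wrapper around strtod that reports how many characters were consumed. This is a minimal sketch; parse_prefix is a hypothetical helper name, and it assumes a C99-conforming strtod running in the "C" locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert the initial portion of s with strtod and
   report in *consumed how many characters were used, including any
   leading white space that was skipped. */
static double parse_prefix(const char *s, size_t *consumed)
{
    char *end;
    double value;

    assert(s != NULL && consumed != NULL);
    value = strtod(s, &end);
    *consumed = (size_t)(end - s);
    return value;
}
```

With such a library, parse_prefix("0x1.8p1z", &n) should yield 3.0 with n equal to 7 (hexadecimal mantissa 1.5 times 2 to the power 1), and parse_prefix("3e", &n) should yield 3.0 with n equal to 1, because an 'e' without digits after it is not a decimal exponent.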

Then I performed an experiment: I compiled the code below with gcc

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert stmt with strtod and report how many characters were consumed. */
void core(const char *stmt){
    char *tail;
    double d = strtod(stmt, &tail);
    /* tail - stmt has type ptrdiff_t, which is printed with %td */
    printf("cover %s to %.2f with length=%td.\n", stmt, d, tail - stmt);
}

int main(void){
    core("3.0x");
    core("3e");
    core("3ex");
    core("3e0x");

    return 0;
}

and got the result:

cover 3.0x to 3.00 with length=3.
cover 3e to 3.00 with length=1.
cover 3ex to 3.00 with length=1.
cover 3e0x to 3.00 with length=3.

So it seems that strtod accepts the 'e' as part of the number only when at least one digit follows it.

For sscanf, I performed another experiment, again compiled with gcc:

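#include <stdio.h>

char head[1024];

/* Scan a hex number followed by a word and report what each conversion got. */
void core(const char *stmt){
    unsigned int i = 0;   /* %x requires a pointer to unsigned int */
    head[0] = '\0';       /* reset in case the %s conversion is never reached */
    sscanf(stmt, "%x%1023s", &i, head);
    printf("sscanf %s catch %u with '%s'.\n", stmt, i, head);
}

int main(void){
    core("0");
    core("0x0g");
    core("0x1g");
    core("0xg");

    return 0;
}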

and got the output below:

sscanf 0 catch 0 with ''.
sscanf 0x0g catch 0 with 'g'.
sscanf 0x1g catch 1 with 'g'.
sscanf 0xg catch 0 with 'g'.

It seems that sscanf tries to consume as many characters as it can and does not roll back, as long as what it has read so far could still be the beginning of a valid input (even if the sequence later turns out to be incomplete and therefore invalid).
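For comparison, the same hexadecimal inputs can be passed to strtoul, which is allowed to step back and report exactly where the valid number ends. The following is a minimal sketch; parse_hex_prefix is a hypothetical helper name, and it assumes a standard-conforming strtoul.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a base-16 number with strtoul and
   report in *consumed how many characters were used. */
static unsigned long parse_hex_prefix(const char *s, size_t *consumed)
{
    char *end;
    unsigned long value;

    assert(s != NULL && consumed != NULL);
    value = strtoul(s, &end, 16);
    *consumed = (size_t)(end - s);
    return value;
}
```

For "0xg" a conforming strtoul returns 0 and consumes only the leading '0', leaving the end pointer at 'x'. In the sscanf experiment above, the string left for %s was "g", which means %x had consumed both '0' and 'x'.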

Dimeter answered 15/10, 2014 at 9:9 Comment(0)
