Note: I completely reworked the question to more properly reflect what I am setting the bounty for. Please excuse any inconsistencies with already-given answers this might have created. I did not want to create a new question, as previous answers to this one might be helpful.
I am working on implementing a C standard library, and am confused about one specific corner of the standard.
The standard defines the number formats accepted by the scanf
function family (%d, %i, %u, %o, %x) in terms of the definitions for strtol
, strtoul
, and strtod
.
The standard also says that fscanf()
will only put back a maximum of one character into the input stream, and that therefore some sequences accepted by strtol
, strtoul
and strtod
are unacceptable to fscanf
(ISO/IEC 9899:1999, footnote 251).
I tried to find some values that would exhibit such differences. It turns out that the hexadecimal prefix "0x", followed by a character that is not a hexadecimal digit, is one such case where the two function families differ.
Funny enough, it became apparent that no two available C libraries seem to agree on the output. (See test program and example output at the end of this question.)
What I would like to hear is what would be considered standard-compliant behaviour in parsing "0xz"?. Ideally citing the relevant parts from the standard to make the point.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main()
{
int i, count, rc;
unsigned u;
char * endptr = NULL;
char culprit[] = "0xz";
/* File I/O to assert fscanf == sscanf */
FILE * fh = fopen( "testfile", "w+" );
fprintf( fh, "%s", culprit );
rewind( fh );
/* fscanf base 16 */
u = -1; count = -1;
rc = fscanf( fh, "%x%n", &u, &count );
printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, u, count );
rewind( fh );
/* strtoul base 16 */
u = strtoul( culprit, &endptr, 16 );
printf( "strtoul: result %2d, consumed %d\n", u, endptr - culprit );
puts( "" );
/* fscanf base 0 */
i = -1; count = -1;
rc = fscanf( fh, "%i%n", &i, &count );
printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, i, count );
rewind( fh );
/* strtol base 0 */
i = strtol( culprit, &endptr, 0 );
printf( "strtoul: result %2d, consumed %d\n", i, endptr - culprit );
fclose( fh );
return 0;
}
/* newlib 1.14
fscanf: Returned 1, result 0, consumed 1
strtoul: result 0, consumed 0
fscanf: Returned 1, result 0, consumed 1
strtoul: result 0, consumed 0
*/
/* glibc-2.8
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 1
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 1
*/
/* Microsoft MSVC
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 0
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 0
*/
/* IBM AIX
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 1
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 1
*/
strto*
functions have defined behaviour when the subject string generates a value that is too large for the appropriate type. However, withscanf()
, the behaviour on receipt of a value that is too large is undefined. Thus, inputting12345678901234567890
tostrtol()
will yield an error indication (assumingsizeof(long) <= 8
), but anything could happen withscanf()
et al. – Ass