Does printf("%x",1) invoke undefined behavior?
I

6

38

According to the C standard (6.5.2.2 paragraph 6)

If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:

  • one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
  • both types are pointers to qualified or unqualified versions of a character type or void.

Thus, in general, there is nothing wrong with passing an int to a variadic function that expects an unsigned int (or vice versa) as long as the value passed fits in both types. However, the specification for printf reads (7.19.6.1 paragraph 9):

If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

No exception is made for signed/unsigned mismatch.

Does this mean that printf("%x", 1) invokes undefined behavior?
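For concreteness, here is a minimal sketch of the two calls being compared. It uses snprintf so the result can be inspected; the function names are hypothetical and not part of the original question. The first call passes an int to %x (the case in doubt), the second passes an unsigned int (uncontroversial).

```c
#include <assert.h>
#include <stdio.h>

/* The call the question asks about: the argument 1 has type int,
 * while %x is specified to take an unsigned int. */
int format_hex_from_int(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%x", 1);
}

/* The uncontroversial variant: 1u has type unsigned int exactly. */
int format_hex_from_unsigned(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%x", 1u);
}
```

On mainstream implementations both functions write the same text into the buffer; the question is only whether the standard guarantees that for the first one.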

Impetuosity answered 12/1, 2011 at 0:6 Comment(13)
People interested in this question might (or might not) be interested in this related question: #4587462Swathe
How can a function with arguments be "defined with a type that does not include a prototype"? Is that related to K&R-style stuff?Cesspool
And what about printf("%d",(char)1);. The description of printf doesn't say that it's the argument after integer promotions which must be the correct type, it says the argument itself must be. Should we conclude that it's an exception to that part of 6.5.2.2/6 as well?Eldest
Btw, I think your quote is insufficient to illustrate the problem, since it is undefined behavior to call printf if it hasn't been prototyped, and your quote concerns calls made where there is no prototype. The same argument promotions are applied to the arguments of varargs, though, according to 6.5.2.2/7, although that doesn't say anything about signed/unsigned compatibility. So maybe you're absolutely right, and signed/unsigned compatibility is only stated to apply to calls made with no prototype, not to varargs calls in general, let alone printf in particular.Eldest
@aschepler: it doesn't say "defined with no prototype". It says "the expression that denotes the called function" doesn't include a prototype. For example if you declare void foo();, then do foo(1), "the expression that denotes the called function" is foo, and its type does not include a prototype. The definition of foo will introduce a prototype, perhaps in a different translation unit, but foo doesn't have one at the call point.Eldest
If I am right, I think it's a defect in the standard and probably should be fixed. This strict interpretation would render huge volumes of code incorrect and require equally huge volumes of ugly and meaningless casts...Impetuosity
@Steve: Yes, that's how I read the first quoted sentence too. But it's the last sentence before the bullet point that is particularly confusing me.Cesspool
@Steve Jessop: I think that it's the only reasonable interpretation(!) to assume that the conversions mandated in the specification for a function call expression are applied before the types for the arguments to a function are determined.Dredi
When the function call expression is a call of a function with a ,... prototype, the only part of 6.5.2.2/6 that is relevant is the description of default argument promotions. The mismatched arguments exceptions are not applicable. (In any case, the function must be defined with a matching prototype and the ... parameters don't have a known type.) The corresponding requirements for accessing vargs are in 7.15.1.1 which describes the use of the va_arg macro. Here you are allowed to use va_arg to access (e.g.) an int as an unsigned int providing the value is in the correct range.Dredi
@Charles: OK, so varargs in general is alright, and R.'s objection amounts to saying that "printf" should perhaps specify that it reads its arguments using the varargs macros, or that the arguments must be such that they could be read using the varargs macros, rather than inaccurately restating the conditions under which it can go wrong.Eldest
The main practical thing that's unclear to me is if there are other consequences of the condition stated for printf, i.e. if it's intended to mean that the value passed must be a valid value for the specified type (prior to default promotions and possible signedness mismatch in the resulting type). Of course that goes more with the other question.Impetuosity
On the surface, unsigned short x = 1; printf("%hu\n", x); would also appear to be UB due to the unsigned / signed mismatch introduced by integer promotions, even though most people reading it would probably not expect it.Dancette
@dbush: I don't think so, because %hu expects an argument whose type after default promotions is the type of unsigned short with default promotions applied, which (assuming int wider than short) is not an unsigned type.Impetuosity
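The va_arg point raised in the comments above can be sketched with a user-written variadic function. This is only an illustration under the reading of 7.15.1.1 given there; sum_as_unsigned is a hypothetical helper, and nothing here says how printf itself fetches its arguments.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic arguments, reading each
 * one as unsigned int regardless of whether the caller passed int or
 * unsigned int. */
unsigned int sum_as_unsigned(int count, ...)
{
    va_list ap;
    unsigned int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* Reading an int argument as unsigned int is permitted for
         * va_arg when the value is representable in both types. */
        total += va_arg(ap, unsigned int);
    }
    va_end(ap);
    return total;
}
```

A call such as sum_as_unsigned(3, 1, 2u, 3) mixes int and unsigned int arguments, all non-negative, and is well defined for a function written this way.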
I
17

I believe it is technically undefined, because the "correct type" for %x is specified as unsigned int - and as you point out, there is no exception for signed/unsigned mismatch here.

The rules for printf are for a more specific case and thus override the rules for the general case (for another example of the specific overriding the general, it's allowable in general to pass NULL to a function expecting a const char * argument, but it's undefined behaviour to pass NULL to strlen()).

I say "technically", because I believe an implementation would need to be intentionally perverse to cause a problem for this case, given the other restrictions in the standard.

Insist answered 12/1, 2011 at 0:54 Comment(5)
I think this interpretation implies that the standard intends the printf family of functions to have their arguments passed in a different way than other variadic functions, which would make no sense.Unruffled
@Chris Lutz: This interpretation implies nothing about the intent of the standard, it merely puts forward a line of argument about the effect of the actual normative wording of the standard.Insist
"it's undefined behaviour to pass NULL to strlen())." But that isn't making a special case; it's undefined behaviour to dereference a null pointer, and strlen dereferences the pointer it is given. The act of passing null to strlen() isn't undefined, although it results in an UB-causing action later with certainty.Episcopal
@Karl: actually it is the act of passing NULL to strlen that's undefined. This is because standard library functions are defined formally by their behavior and not by a C implementation. See 7.1.4/1: "If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined."Impetuosity
@caf: In the last few years, implementations have become increasingly intentionally perverse. I don't think programming in C will be safe unless or until someone writes a standard which establishes helpful normative behaviors and requires perverse compilers to document departures from the norm.Acree
M
7

No, because %x formats an unsigned int, and the type of the constant expression 1 is int, while the value of it is expressible as an unsigned int. The operation is not UB.

Mauriac answered 12/1, 2011 at 0:16 Comment(10)
It formats both. :) The variadic argument spec overrides the printf spec, and the former allows for the use of int where unsigned int is expected.Mauriac
Actually, "%x" takes an "unsigned int", not an "int", argument. R. is wondering if the various details he quotes from the standard mean that this is, technically speaking, undefined behavior.Swathe
@Jonathan - I agree with your reading of the standard. If that was in your answer, I would upvote you.Unruffled
6.5.2.2 defines the behavior in general for variadic functions, but 7.19.6.1 turns around and says that unless the type matches the format specifier, the behavior is undefined. It seems like this paragraph should be omitted or fixed to mention the exception for signed/unsigned mismatch if that's the intent.Impetuosity
@R.. - I'm assuming that by "If any argument is not the correct type" they mean "If any argument is not the correct type based on the previously outlined rules for type punning."Unruffled
Edited to clarify my justification and to correct the type %x strictly expects.Mauriac
Default argument promotion will not normally cause int arguments to be converted to unsigned int, so the fact that 1 must be expressible as an unsigned int is irrelevant. If printf were guaranteed to use the va_arg macro then you would expect the exception in 7.15.1.1 to hold, but this is not a requirement. The type of the argument after default argument promotion is still int, not unsigned int and (as others have said) 7.19.6.1 clearly states: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."Dredi
True for most types, but for a signed integer type and its corresponding unsigned integer type where the value is representable by both, 6.5.2.2 allows it. (From a practical perspective, this is always true anyway, but from a standards perspective it appears to be explicitly defined.)Mauriac
@JonathanGrynspan 6.5.2.2 does not say anything of the sort. In fact, C11 6.5.2.2/7 explicitly says "The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter".Gladysglagolitic
FWIW my view is that this code should be legal but the standard does not define it, and I consider the standard defective here. A blatant inconsistency can be seen by looking at the l modifier specification, which clearly defines that "%lx" may correspond to arguments 1L and -1L !Gladysglagolitic
V
5

It is undefined behavior, for the same reason that re-interpreting a pointer to an integer type as a pointer to the corresponding type of opposite signedness is. Unfortunately, this isn't allowed in either direction, because a valid representation in one type may be a trap representation in the other.

The only way I see that re-interpreting signed as unsigned could produce a trap representation is the perverse case of a sign representation where the unsigned type just masks out the sign bit. Unfortunately such a thing is allowed by 6.2.6.2 of the standard. On such an architecture, all negative values of the signed type may be trap representations of the unsigned type.

Your example is even stranger, since 1 is not allowed to be a trap representation for the unsigned type. So to make it a "real" example, you'd have to ask your question with -1.

I don't think there is still any architecture for which people write C compilers that has these features, so life would definitely become easier if a newer version of the standard abolished this nasty case.

Valdivia answered 12/1, 2011 at 8:20 Comment(4)
I'm not convinced this is allowed by the standard. As far as I know, values representable in both signed and unsigned versions of the type are required to have the same representation. Note that the aliasing rules in "Representation of Types" explicitly allow access as a sign-mismatched type.Impetuosity
@R.. Just look it up in the standard. It explicitly states that the number of value bits of the signed type is less than or equal to that of the unsigned type. In particular, a negative signed value is also allowed to be a trap representation of the unsigned type. And you are probably right about the aliasing rules. So this needs a defect report.Valdivia
I agree with what you just said. However, that doesn't contradict a requirement that positive values of the signed type must agree in representation with the same values for the unsigned type - a requirement which I believe is intended to be there and implied by other conditions even if not explicitly stated.Impetuosity
@R.. In fact it is explicitly stated that positive values as long as they fit in both types must have the same representation. I had already corrected my answer accordingly.Valdivia
E
2

TL;DR it is not UB.

As n. 'pronouns' m. pointed out in this answer, the C standard says that all non-negative values of a signed integer type have the exact same representation as in the corresponding unsigned type, and can therefore be used interchangeably as long as the value is in the range of both types.

From the C99 standard 6.2.5 Types - Paragraph 9 and Footnote 31:

9 The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. 31)

31) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

The exact same text is in the C11 Standard in 6.2.5 Types - Paragraph 9 and Footnote 41.
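The representation guarantee quoted above can be checked with a small sketch. same_representation is a hypothetical helper; it copies the bytes of a non-negative int into an unsigned int and compares the result with the ordinary converted value. It illustrates 6.2.5/9 only, and does not by itself settle how printf is specified.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the bytes of a non-negative int, copied unchanged into
 * an unsigned int, denote the same value. */
int same_representation(int value)
{
    unsigned int reinterpreted;

    assert(value >= 0);
    /* int and unsigned int occupy the same amount of storage,
     * so the byte-for-byte copy is exact. */
    memcpy(&reinterpreted, &value, sizeof reinterpreted);
    return reinterpreted == (unsigned int)value;
}
```

Per the quoted paragraph, this should return 1 for every non-negative argument.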

Escribe answered 17/9, 2020 at 11:6 Comment(0)
C
0

I believe it's undefined. Functions with a variable-length argument list don't apply an implicit conversion when accepting arguments, so 1 won't be converted to unsigned int when being passed to printf(), causing undefined behavior.

Cynthiacynthie answered 15/3, 2016 at 3:29 Comment(1)
@Gladysglagolitic My bad. I meant "implicit conversion"Cynthiacynthie
A
0

The authors of the Standard do not generally try to explicitly mandate behavior in every imaginable corner case, especially when there is an obvious correct behavior which is shared by 100% of all implementations, and there is no reason to expect any implementation to do anything else. Despite the Standard's explicit requirement that signed and unsigned types have matching memory representations for values that fit in both, it would be theoretically possible for an implementation to pass them to variadic functions differently. The Standard doesn't forbid such behavior, but I see no evidence of the authors intentionally permitting it. Most likely, they simply didn't consider such a possibility since no implementation had ever worked that way (and, so far as I know, none has since).

It would probably be reasonable for a sanitizing implementation to squawk if code uses %x on a signed value, though a quality sanitizing implementation should also provide an option to silently accept such code. There's no reason for sane implementations to do anything other than either process the passed value as unsigned or squawk if it's used in a diagnostic/sanitizing mode. While the Standard might not forbid an implementation from regarding as unreachable any code that uses %x on a signed value, anyone who thinks implementations should avail themselves of such freedom should be recognized as a moron.

Programmers who are targeting exclusively sane non-diagnostic implementations shouldn't need to worry about adding casts when outputting things like "uint8_t" values, but those whose code might be fed to moronic implementations might want to add such casts to prevent compilers from the "optimizations" such implementations might impose.
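A minimal sketch of the kind of cast meant here, assuming the platform provides uint8_t; format_byte_hex is a hypothetical name. A uint8_t argument is promoted to int before reaching a variadic function, so the cast is what makes the argument's type exactly unsigned int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: formats one byte as two hex digits. */
int format_byte_hex(char *buf, size_t size, uint8_t byte)
{
    assert(buf != NULL && size > 0);
    /* Without the cast, `byte` would arrive as int after the default
     * argument promotions; with it, the type matches %x exactly. */
    return snprintf(buf, size, "%02x", (unsigned int)byte);
}
```

Whether the cast is worth the noise is exactly the judgment call discussed above.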

Acree answered 16/6, 2017 at 20:41 Comment(3)
This answer reads like it was written without consideration of anything that's already been written/discussed on the topic. I'm not the downvoter but I'm not surprised someone did. For normal variadic functions written in C (as opposed to abstract ones just specified by the standard), the behavior is well-defined when passing a signed 1 to a function expecting an unsigned argument. The question is very specific to printf and family, which are not specified in terms of va_arg.Impetuosity
@R..: Perhaps I should adjust my answer to make it more printf-centric, but the main point is that the authors of the Standard would have had no reason to expect that implementations would do anything with positive signed values other than treat them the same as the corresponding unsigned values, and thus saw no reason to explicitly mandate such behavior for printf. If the authors of the C Standard wanted to avoid breaking existing code (which is their claim), they would have intended that Undefined Behavior be taken as an invitation for implementers to exercise reasonable judgement...Acree
...about how something should be processed, based upon a variety of factors. The only real question is whether all implementations should be relied upon to be the product of reasonable judgments.Acree
