What exactly is an "invalid conversion specification"?
K

3

5

As per C11, chapter §7.21.6.1, P9

If a conversion specification is invalid, the behavior is undefined.282) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

So far, my understanding was that, for

  char str [] = "Sourav";
  • A statement like printf("%S", str); falls under the first sentence: there is no CS such as %S (uppercase).
  • A statement like printf("%d", str); falls under the second sentence (a mismatch between the CS and the argument type; the %d itself is not an "invalid" CS).

until advised otherwise by a recent comment thread.

Is my understanding wrong? Can the second statement also be categorized as using an "invalid" (note: not merely "wrong") conversion specification?


Update: The answer and the comment thread have been deleted; here's a snapshot for <10K users.

Keratin answered 9/8, 2017 at 10:41 Comment(11)
What exactly is not clear here? - Beforehand
@tilz0R as I said..."until advised otherwise". If you can see a dupe targeting to this discussion, please mark it. I am just confirming my understanding here, as spec words can be tricky at times. - Keratin
the second case should throw up a warning or error - Pelson
@ChrisTurner Well, it should, but here, I'm targeting the "categorization" more, please notice LL tag. :) - Keratin
I'm no expert, but I think your understanding is correct. - Disloyal
I think the operating word here is "corresponding". The first sentence says that if in the string exists a specifier which is not valid. If not, then each specifier is matched with its corresponding argument. Then the second statement says about type matching. So I think, your interpretation is correct. - Phyllome
@AjayBrahmakshatriya Thank you (not for agreeing to my view, but for explaining in your own words, appreciate the effort!!) - Keratin
I think the second example does not lie in the first category because "corresponding" is not used. - Phyllome
@AjayBrahmakshatriya I copied your words in my rejoinder. :) - Keratin
Interesting. A missing aspect of the question is that this post focuses on *printf() - without saying that. I could see a different answer for *scanf(). E.g. does the format need to be "correct/matching" if the specifier is not reached as part of the scan? Hmmmm. (I think it does) - Isthmian
"A statement like printf("%S", str); belong to the first sentence, there exist no CS as %S (UPPERCASE)" deserves qualification. A compliant compiler may add "%S" as a valid conversion specification, and it is then not invalid, but valid - for that compiler. - Isthmian
U
9

The "validity" of a conversion specification is determined by the standard paragraphs above the one you quoted:

7.21.6.1 - p4 to p8

Each conversion specification is introduced by the character %. After the %, the following appear in sequence: ...

The flag characters and their meanings are: ...

The conversion specifiers and their meanings are: ...

This means that any conversion specification composed from the elements in the above lists is valid; all others are not, in the eyes of the standard. That's why the paragraph you quoted mentions two causes of UB. One is a specification that does not follow the grammar, and the other is a mismatch between the specification and the argument type.
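To make the grammar-based reading concrete, here is a minimal sketch of a checker for a single fprintf conversion specification. The function name is_valid_printf_spec is hypothetical (it is not part of any library), and it only checks the shape described in p4 against the C11 lists of flags, length modifiers and conversion specifiers. It does not check the further constraints on which flag or length modifier may be combined with which specifier, and it knows nothing about implementation extensions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `spec` (a string starting at '%'
   and holding exactly one conversion specification) has the shape
   described in C11 7.21.6.1 p4. It never looks at any argument. */
bool is_valid_printf_spec(const char *spec)
{
    if (spec == NULL || *spec != '%')
        return false;
    const char *p = spec + 1;

    /* "%%" must be exactly that: nothing between the two '%'. */
    if (*p == '%')
        return p[1] == '\0';

    /* Zero or more flags. */
    while (*p != '\0' && strchr("-+ #0", *p) != NULL)
        p++;

    /* Optional minimum field width: '*' or a decimal integer. */
    if (*p == '*')
        p++;
    else
        while (*p >= '0' && *p <= '9')
            p++;

    /* Optional precision: '.' followed by '*' or an optional integer. */
    if (*p == '.') {
        p++;
        if (*p == '*')
            p++;
        else
            while (*p >= '0' && *p <= '9')
                p++;
    }

    /* Optional length modifier: hh h l ll j z t L. */
    if ((p[0] == 'h' && p[1] == 'h') || (p[0] == 'l' && p[1] == 'l'))
        p += 2;
    else if (*p != '\0' && strchr("hljztL", *p) != NULL)
        p++;

    /* Exactly one C11 conversion specifier, then end of string. */
    return *p != '\0'
        && strchr("diouxXfFeEgGaAcspn", *p) != NULL
        && p[1] == '\0';
}
```

With this sketch, "%d" is accepted and "%S" is rejected, and in neither case does the answer depend on what argument is later passed.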

The comment you linked to seems to use "invalid" colloquially. That is, both uses of the conversion specifications are "invalid" in the sense that they lead to UB, but only the first is "invalid" from a language-lawyer standpoint.

Uncalledfor answered 9/8, 2017 at 10:54 Comment(0)
K
1

To support my understanding (and probably to justify that understanding in the first place), let me add my two cents.

First, let's look at footnote 282, as mentioned in the quote. It says,

See ‘‘future library directions’’ (7.31.11).

and in §7.31.11

Lowercase letters may be added to the conversion specifiers and length modifiers in fprintf and fscanf. Other characters may be used in extensions.

This mentions nothing about the relation between a CS and its argument (if any). So, the "validity" of a CS does not depend on the supplied argument.

That said, a couple more points:

  • Point 1 :: Please note the mention of the phrase "conversion specification", not conversion specifier, in the quote. As per chapter §7.21.6.1/P4,

    Each conversion specification is introduced by the character %. After the %, the following appear in sequence:

    • Zero or more flags [...]

    • An optional minimum field width [...]

    • An optional precision [...]

    • An optional length modifier [...]

    • A conversion specifier character [...]

    and we have definitive lists for all the elements mentioned in

    • P5, field width and precision
    • P6, flags
    • P7, length modifier
    • P8, conversion specifier

    Thereby, there is (or should be) no relation with the supplied argument to identify the "validity" of the conversion specification.

    To complement this understanding, borrowing words from the comment by Ajay Brahmakshatriya

    "I think the operating word here is "corresponding". The first sentence says that if in the string exists a specifier which is not valid. If not, then each specifier is matched with its corresponding argument. Then the second statement says about type matching.....I think the second example does not lie in the first category because "corresponding" is not used"

  • Point 2 :: On the other hand, the spec is explicit about a "mismatch" between the CS and the type of the corresponding supplied argument. So, that is a different case altogether.
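The second category is a property of the argument rather than of the format string, which can be illustrated with a small C11 _Generic sketch. The macro ARG_MATCHES_PERCENT_D and the two wrapper functions are hypothetical names introduced only for this illustration. The macro inspects the type before the default argument promotions, so a char or short argument (which would be promoted to int in a variadic call) is reported as a mismatch here; treat it as a rough model, not a full type checker.

```c
/* Hypothetical compile-time check: is the argument of type int,
   which is what a plain %d conversion specification expects? */
#define ARG_MATCHES_PERCENT_D(x) _Generic((x), int: 1, default: 0)

/* The array from the question. */
char str[] = "Sourav";

/* An int argument matches %d. */
int int_matches_d(void)
{
    return ARG_MATCHES_PERCENT_D(42);
}

/* str is not an int, so pairing it with the (perfectly valid) %d
   is a type mismatch, not an invalid conversion specification. */
int str_matches_d(void)
{
    return ARG_MATCHES_PERCENT_D(str);
}
```

Note that the format string never appears in this check: %d stays a valid conversion specification no matter what is passed for it.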

When both cases are combined, it's hard to tell which condition causes the UB, but it is certainly UB, for more than one reason.

Example:

   printf("%D", str);

using str from the question.

Keratin answered 9/8, 2017 at 11:27 Comment(1)
OK, a downvote to the question and my answer at the same time. I guess it shows disagreement. May I please ask the downvoter to chime in instead of staying silent, so that I (and probably others, too) can gain some more knowledge / insight? Thank you. :) - Keratin
S
1

The footnote 282 points to Future library directions C11 7.31.11p1:

Lowercase letters may be added to the conversion specifiers and length modifiers in fprintf and fscanf. Other characters may be used in extensions.

So it, too, hints that "invalid" conversion specifications are those that are not built from the listed elements. Of the unlisted ones, lowercase letters might be used by a future C version, and extensions are free to use other characters.


And while non-normative, the C11 Appendix J.2. contains the following:

  • An invalid conversion specification is found in the format for one of the formatted input/output functions, or the strftime or wcsftime function (7.21.6.1, 7.21.6.2, 7.27.3.5, 7.29.2.1, 7.29.2.2, 7.29.5.1).

That is, an invalid conversion specification for *printf is grouped here with an invalid conversion specification for strftime, which does not take variable arguments, so the invalidity cannot arise from a mismatch between the conversion specification and a corresponding argument.

This can be contrasted with

  • There are insufficient arguments for the format in a call to one of the formatted input/output functions, or an argument does not have an appropriate type (7.21.6.1, 7.21.6.2, 7.29.2.1, 7.29.2.2).

which discusses the mismatch between the arguments and the conversion specifiers, without mentioning the word invalid.
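As a small illustration of the strftime point, the sketch below formats a date with a format string whose conversion specifications have no per-specification arguments at all. The helper name format_date is hypothetical; strftime and struct tm are the standard <time.h> facilities, and the expected output assumes the default "C" locale.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: writes "YYYY-MM-DD" into buf and returns the
   number of characters written (0 if buf is too small). */
size_t format_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm tm = {0};          /* all other fields left at zero */
    tm.tm_year = year - 1900;    /* tm_year counts from 1900 */
    tm.tm_mon  = month - 1;      /* tm_mon is 0-based */
    tm.tm_mday = day;

    /* Every conversion specification reads from the single struct tm;
       there is no "corresponding argument" whose type could mismatch. */
    return strftime(buf, size, "%Y-%m-%d", &tm);
}
```

Calling format_date(buf, sizeof buf, 2017, 9, 8) fills buf with "2017-09-08". Whether a strftime format is valid can therefore only be a question about the format string itself, which is consistent with reading "invalid" the same way for *printf.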

Spumescent answered 9/8, 2017 at 12:23 Comment(0)
