Why does GCC emit a warning when using trigraphs, but not when using digraphs?

Code:

#include <stdio.h>

int main(void)
{
  ??< puts("Hello Folks!"); ??>
}

The above program, when compiled with GCC 4.8.1 with -Wall and -std=c11, gives the following warnings:

source_file.c: In function ‘main’:
source_file.c:8:5: warning: trigraph ??< converted to { [-Wtrigraphs]
     ??< puts("Hello Folks!"); ??>
 ^
source_file.c:8:30: warning: trigraph ??> converted to } [-Wtrigraphs]

But when I change the body of main to:

<% puts("Hello Folks!"); %>

no warnings are emitted.

So why does the compiler warn me when using trigraphs, but not when using digraphs?

Volteface answered 11/5, 2015 at 11:58 Comment(5)
possible duplicate of Why are trigraphs generating errors in modern C++ compilers? - Merril
@ShafikYaghmour I think the answers there contain all the information that could be given in an answer to this question, even if a new version (or different frontend?) of gcc downgraded its handling of trigraphs to a warning. - Merril
@ShafikYaghmour The linked question still says gcc generates warnings and the errors are from Turbo C. So I don't think anything has changed since. - Dorren
@BlueMoon I just realized the reason why the behavior appears different is that now almost everyone is using -std=xxx which means that gcc will automatically turn on trigraphs. So perhaps I agree this is a duplicate. - Subsellium
Trigraphs and digraphs date from the days when many keyboards lacked the appropriate keys. Today they are obsolete and should not be used. - Disentail
S
5

This gcc document on pre-processing gives a pretty good rationale for a warning (emphasis mine):

Trigraphs are not popular and many compilers implement them incorrectly. Portable code should not rely on trigraphs being either converted or ignored. With -Wtrigraphs GCC will warn you when a trigraph may change the meaning of your program if it were converted.

and this gcc document on tokenization explains that digraphs, unlike trigraphs, have no negative side effects (emphasis mine):

There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs,

Subsellium answered 11/5, 2015 at 12:11 Comment(3)
This doesn't answer why digraphs don't throw a warning (or it would imply that they are more popular) - Impose
@Impose that is the implication but I added another document to make that explicit - Subsellium
All three answers tell the same thing but I like yours as it includes short, correct quotes. The tick goes to you! :) - Volteface
K
6

Because trigraphs have the undesirable effect of silently changing code. The same source file can be valid both with and without trigraph replacement, yet produce different code. This is especially problematic in string literals, like "<em>What??</em>", where the sequence ??< would be replaced by {.

Language design and language evolution should strive to avoid silent changes, so having the compiler warn about trigraphs is a good thing.

Contrast this with digraphs, which were new tokens that do not lead to silent changes.
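
A minimal sketch of the digraph side of this contrast, assuming a C99-or-later compiler such as GCC or Clang. The names STR, literal_is_untouched and stringified_digraph are illustrative and not part of the answer. Digraphs are recognised only as tokens, so the characters <% inside a string literal stay exactly as written, and the # operator keeps the digraph spelling rather than rewriting it:

```c
#include <string.h>

/* Hypothetical helper macro: turns its argument into a string literal. */
#define STR(x) #x

/* Returns 1 if "<% %>" still holds its five original characters,
   i.e. the digraph-looking text inside the literal was not rewritten. */
int literal_is_untouched(void)
{
    const char *text = "<% %>";
    return strlen(text) == 5 && text[0] == '<' && text[1] == '%';
}

/* Returns the spelling of the <% token as seen by the # operator.
   The token acts like { in the grammar but keeps its own spelling. */
const char *stringified_digraph(void)
{
    return STR(<%);
}
```

Nothing here depends on whether digraph support is "on" changing the text of a literal, which is the sense in which digraphs do not cause silent changes.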

Keijo answered 11/5, 2015 at 12:13 Comment(0)
T
4

Maybe because digraphs have no negative side effects, unlike trigraphs, as stated in the gcc documentation:

Punctuators are all the usual bits of punctuation which are meaningful to C and C++. All but three of the punctuation characters in ASCII are C punctuators. The exceptions are ‘@’, ‘$’, and ‘`’. In addition, all the two- and three-character operators are punctuators. There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, but does not cover as much ground. The digraphs and their corresponding normal punctuators are:

 Digraph:        <%  %>  <:  :>  %:  %:%:
 Punctuator:      {   }   [   ]   #    ##
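
To make the table concrete, here is a small sketch that spells every brace, bracket and preprocessor operator with the digraphs listed above. It assumes GCC or Clang in a C99-or-later mode; the names PASTE, sum and sum_of_first_three are made up for illustration. Per the question, GCC accepts code like this without any trigraph-style warning.

```c
%:include <stddef.h>   /* %: is the digraph spelling of # */

/* %:%: is the digraph spelling of ##, so PASTE(a, b) joins two tokens. */
%:define PASTE(a, b) a %:%: b

/* Sums the first count elements of values. */
int sum(const int values<::>, size_t count)
<%
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values<:i:>;
    return total;
%>

/* Builds the identifier sum with PASTE and calls it on a fixed array. */
int sum_of_first_three(void)
<%
    int data<:3:> = <% 1, 2, 3 %>;
    return PASTE(su, m)(data, 3);
%>
```

Calling sum_of_first_three() adds 1, 2 and 3, exactly as if the usual punctuators had been used.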
Top answered 11/5, 2015 at 12:12 Comment(0)
S
3

Trigraphs are nasty because they use character sequences which could legally appear within valid code. A common case which used to cause compiler errors on code for classic Macintosh:

unsigned int signature = '????';  /* Should be value 0x3F3F3F3F */

Trigraph processing would turn that into:

unsigned int signature = '??^;  /* Should be value 0x3F3F3F3F */

which would of course not compile. In some slightly rarer cases, it would be possible for such processing to yield code which would compile, but with different meaning from what was intended, e.g.

char *template = "????/1234";

which would get turned into

char *template = "??S4"; // ??/ becomes \, and \123 becomes S

Not the string literal that was intended, but still perfectly legitimate nonetheless.
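
The two rewrites above can be reproduced with a small simulation of the replacement step. This is only a sketch under stated assumptions: replace_trigraphs is a hypothetical helper written for this illustration, not anything GCC exposes, and it models just the left-to-right substitution of the nine standard trigraphs, not the later handling of escape sequences. The source deliberately contains no literal trigraph, so it compiles the same way whether or not trigraphs are enabled.

```c
#include <stddef.h>
#include <string.h>

/* Copies src to dst, replacing each trigraph the way the first
   translation phase would. dst needs room for strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    /* Third character of each of the nine trigraphs, and the character
       it stands for at the same index. */
    static const char third[]       = "=(/)'<!>-";
    static const char replacement[] = "#[\\]^{|}~";

    while (*src != '\0') {
        const char *hit = NULL;

        /* A trigraph is two question marks plus one listed character. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);

        if (hit != NULL) {
            *dst++ = replacement[hit - third];
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

When writing test inputs for it in C, spelling each question mark as \? keeps the test file itself free of trigraphs. Fed the text '????', the function produces '??^, and fed "????/1234" it produces "??\1234", after which a compiler would read \123 as the octal escape for S.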

By contrast, digraphs are relatively benign because outside of some possible weird corner cases involving macros, no code containing processable digraphs would have a legitimate meaning in the absence of such processing.

Stroller answered 1/7, 2015 at 20:4 Comment(0)
