Purpose of Trigraph sequences in C++?
I

9

141

According to C++'03 Standard 2.3/1:

Before any other processing takes place, each occurrence of one of the following sequences of three characters (“trigraph sequences”) is replaced by the single character indicated in Table 1.

----------------------------------------------------------------------------
| trigraph | replacement | trigraph | replacement | trigraph | replacement |
----------------------------------------------------------------------------
| ??=      | #           | ??(      | [           | ??<      | {           |
| ??/      | \           | ??)      | ]           | ??>      | }           |
| ??'      | ^           | ??!      | |           | ??-      | ~           |
----------------------------------------------------------------------------

In practice, this means that the code printf( "What??!\n" ); prints What| because ??! is a trigraph sequence that is replaced with the | character.
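To make the replacement rule concrete, here is a minimal sketch that emulates the table above on an ordinary string at run time. It is not how any particular compiler implements the step, and the names trigraph_replacement and replace_trigraphs are made up for illustration. The source deliberately contains no real trigraphs, so it should behave the same whether or not the compiler has trigraph support enabled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the third character of a candidate trigraph,
// returns the replacement character from Table 1, or '\0' if there is none.
char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '(':  return '[';
        case '<':  return '{';
        case '/':  return '\\';
        case ')':  return ']';
        case '>':  return '}';
        case '\'': return '^';
        case '!':  return '|';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: emulates the trigraph step on a piece of source text.
// Two question marks followed by a table character collapse to one character;
// everything else is copied unchanged.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            const char r = trigraph_replacement(src[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += src[i];
        ++i;
    }
    return out;
}
```

Calling replace_trigraphs on the seven characters What??! (written in source as "What?\?!" to keep the literal itself trigraph-free) gives What|, which matches the behaviour described in the question.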

My question is: what is the purpose of trigraphs? Is there any practical advantage to using them?

UPD: The answers mention that some European keyboards don't have all the punctuation characters. Does that mean non-US programmers have to use trigraphs in everyday life?

UPD2: Visual Studio 2010 has trigraph support turned off by default.

Inexplicable answered 5/8, 2009 at 17:15 Comment(9)
Some of the punctuation is harder to reach on European keyboards (to the point that some programmers use the US layout to type faster). I haven't seen one where the punctuation is entirely missing - maybe for Slavic languages?Electrodynamometer
It may happen that some terminals and/or virtualization layers don't let you easily access some characters. In my experience the main offender is the tilde.Ress
typing this on my DE-deadkeys keyboard, # is a key next to return, \ is "AltGr"+"ß" (next to 0), ^ is "^"+"^" (because of deadkeys; next to 1), [ is "AltGr"+"8", ] is "AltGr"+"9", | is "AltGr"+"<", { is "AltGr"+"7", } is "AltGr"+"0", and ~ is "~"+"~" (because of deadkeys, just above #). so no really big deal. my fingers are like typing these combinations on their own :-DBarrack
I thought, that it is normal to have two keyboard layouts and switch them according to the work I'm doing on the computer. It's the common way in central Europe region. It's pretty creepy to use these trigraphs. I'd vote for removing this from the standard.Lunneta
Because "the source character set of C source programs is contained within the 7-bit ASCII character set but is a superset of the ISO 646-1983 Invariant Code Set........"Centroid
@Lunneta You have your wish!Kimmy
FYI, trigraphs will be removed in C++17, despite the protestations of IBM. Adapting the code that uses trigraphs will be simple though, because trigraphs are really easy to parse.Luigi
We're discussing C++'03 Standard here. Publishing new standard doesn't mean that all systems in the world will support it instantly. And the question was about the initial purpose of this feature.Inexplicable
In my opinion, I'd say trigraphs still exist mainly to write obfuscated code :-)Impractical
B
104

This question (about the closely related digraphs) has the answer.

It boils down to the fact that the ISO 646 character set doesn't have all the characters of the C syntax, so there are some systems with keyboards and displays that can't deal with the characters (though I imagine that these are quite rare nowadays).

In general, you don't need to use them, but you need to know about them for exactly the problem you ran into. Trigraphs are the reason the '?' character has an escape sequence:

'\?'

So a couple ways you can avoid your example problem are:

 printf( "What?\?!\n" ); 

 printf( "What?" "?!\n" ); 

But you have to remember when you're typing the two '?' characters that you might be starting a trigraph (and it's certainly never something I'm thinking about).
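As a quick sanity check, the sketch below wraps the two workarounds in functions (the function names are invented for the example) so their results can be compared. Neither literal contains two adjacent question marks in the source text, so the result should not depend on whether the compiler processes trigraphs.

```cpp
#include <string>

// Both functions are meant to produce the seven characters W h a t ? ? !

std::string escaped_question_mark() {
    // The escape sequence \? stands for a plain question mark.
    return "What?\?!";
}

std::string split_literal() {
    // Adjacent string literals are concatenated, but only after the
    // trigraph step has already run on the raw source characters.
    return "What?" "?!";
}
```

Both functions return the same string, with a question mark at positions 4 and 5 and an exclamation mark at position 6.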

In practice, trigraphs and digraphs are something I don't worry about at all on a day-to-day basis. But you should be aware of them because once every couple of years you'll run into a bug related to them (and you'll spend the rest of the day cursing their existence). It would be nice if compilers could be configured to warn (or error) when they come across a trigraph or digraph, so I could know I've got something I should knowingly deal with.

And just for completeness, digraphs are much less dangerous since they get processed as tokens, so a digraph inside a string literal won't get interpreted as a digraph.
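A small sketch of that difference, using the digraph spellings <: :> <% %> for [ ] { } (the function names are just for illustration). Outside a string literal the digraphs act as the corresponding punctuation tokens; inside a literal they stay as the two characters that were typed.

```cpp
#include <string>

// Digraph spellings: <: is [, :> is ], <% is {, %> is }

// The body and the array syntax below are written entirely with digraphs.
int second_element()
<%
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:1:>;
%>

// Inside a string literal nothing is substituted.
std::string literal_with_digraph()
<%
    return "<:";
%>
```

Here second_element() returns 20, while literal_with_digraph() returns a two-character string, not "[".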

For a nice education on various fun with punctuation in C/C++ programs (including a trigraph bug that would definitely have me pulling my hair out), take a look at Herb Sutter's GOTW #86 article.


Addendum:

It looks like GCC will not process (and will warn about) trigraphs by default. Some other compilers have options to turn off trigraph support (IBM's for example). Microsoft started supporting a warning (C4837) in VS2008 that must be explicitly enabled (using -Wall or something).

Brack answered 5/8, 2009 at 17:23 Comment(10)
Compatibility with C is the only reason? Is that possible to meet them in modern C++ programs?Inexplicable
Yes, C++ supports trigraphs and digraphs as well.Brack
As I recall, at least one compiler I've used (g++ ?) requires an explicit command line option before trigraph and or digraph is translated, otherwise a warning is given but no substitution.Frangipani
Visual C++ gives no warnings. Trigraphs are standard, there is no reason for warnings.Inexplicable
@Michael, I mean, is there a reason for someone to use trigraphs in their code nowadays? Or would I meet trigraphs only in old programs?Inexplicable
Compilers often give warnings for things that are standards conforming but often cause unintended results. For example, it's OK by the standard to declare an unused variable or have unreachable code, but my compilers often warn me about those things.Brack
@Jla3ep - I personally have never had a need for trigraphs, but unfortunately compilers will process code with them, so you need to be aware of them (to avoid accidental use). Also, if you get code from somewhere else you may run into their intentional use, but that would be extremely unusual. I think I've run into intentionally used trigraphs once in 20+ years (it was some code for an IBM mainframe).Brack
@MichaelBurr, Well you shouldn't have a problem if you run with the "no trigraph" flag....Centroid
It really only gets on my nerves when trigraphs are expanded in comments to do surprising things.Flexuosity
Why can my C compiler use trigraphs without using the -trigraph option for the compiler ?Teem
B
32

Kids today! :-)

Yes, foreign equipment, such as an IBM 3270 terminal. The 3270 has, if I remember, no curly braces! If you wanted to write C on an IBM mini / mainframe, you had to use the wretched trigraphs for every block boundary. Fortunately, I only had to write software in C to emulate some IBM minicomputer facilities, not actually write C software on the System/36.

Look next to the "P" key:

keyboard

Hmmm. Hard to tell. There is an extra button next to "carriage return", and I might have it backwards: maybe it was the "[" / "]" pair that was missing. At any rate, this keyboard would cause you grief if you had to write C.

Also, these terminals display EBCDIC, IBM's "native" mainframe character set, not ASCII (thanks, Pavel Minaev, for the reminder).

On the other hand, like the GNU C guide says: "You don't need this brain damage." The gcc compiler leaves this "feature" disabled by default.

Butler answered 5/8, 2009 at 17:37 Comment(3)
There's a reset button on the keyboard. That's awesome! Strange that caught my attention first though.Apocrine
Whoever wants to use C++17 on an EBCDIC machine, should be jailed for necrophilia.Amphibiotic
Unless a platform has no characters at all other than those in ISO646, could not everything that can be done with trigraphs, be done by requiring that every implementation define either a backslash or else any character that isn't in the C character set as a "meta" character, replace all references to backslash in the Standard with "meta", and adding backslash/meta escapes for any members of the C character set that aren't in ISO-646?Koblick
R
22

From The C++ Programming Language Special Edition, page 829

The ASCII special characters [, ], {, }, |, and \ occupy character set positions designated as alphabetic by ISO. In most European national ISO-646 character sets, these positions are occupied by letters not found in the English alphabet.

A set of trigraphs is provided to allow national characters to be expressed in a portable way using a truly standard minimal character set. This can be useful for interchange of programs, but it doesn't make it easier for people to read programs. Naturally, the long-term solution to this problem is for C++ programmers to get equipment that supports both their native language and C++ well. Unfortunately, this appears to be infeasible for some, and the introduction of new equipment can be a frustratingly slow process.
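To illustrate what "occupied by letters" means in practice, here is a rough sketch that shows how a line of C++ would look on a device using the Swedish national variant of ISO 646. The mapping of the six code positions ([ \ ] { | } displayed as Ä Ö Å ä ö å) is my understanding of that variant, and the function name render_iso646_se is invented for the example. The sketch also assumes the source file and the output are UTF-8.

```cpp
#include <string>

// Hypothetical helper: renders 7-bit source text the way a Swedish
// ISO 646 device would display it. Only the six code positions that
// matter for C and C++ syntax are mapped; all other characters pass through.
std::string render_iso646_se(const std::string& src) {
    std::string out;
    for (char c : src) {
        switch (c) {
            case '[':  out += "Ä"; break;
            case '\\': out += "Ö"; break;
            case ']':  out += "Å"; break;
            case '{':  out += "ä"; break;
            case '|':  out += "ö"; break;
            case '}':  out += "å"; break;
            default:   out += c;   break;
        }
    }
    return out;
}
```

For example, render_iso646_se("int a[3] = {0};") yields "int aÄ3Å = ä0å;". The bytes are the same, but the program text is much harder to read, and a keyboard built for that character set may have no way to produce the brackets and braces as such.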

Recognizee answered 5/8, 2009 at 17:19 Comment(4)
"The introduction of new equipment can be a frustratingly slow process". Especially compared to the quick and painless process of standardizing programming language features.Illustrator
If this is a kludge for keyboard layouts, then it's funny that there is no trigraph e.g. for typing `, which is missing from the Italian and several other keyboard layoutsVocation
@Vocation Standard C (at least) does not use ` (as well as $, @, and ~) and so does not require that it be supported at all. From what I understand it’s less of a kludge for keyboard layouts than it is for character sets: ISO 646 gleefully removed characters from ASCII for “customization”, but then K&R went and used some of these, so that when the time came for ISO to standardize C, they had themselves a sticky situation. Thus the standard’s reluctance to include any punctuation except that strictly required for syntax (which is in any case all of it except the four characters above).Giacometti
... In any case ` as it is defined (as opposed to how it has been used historically) is a bit of an oddity: it’s meant to represent (not a quotation mark but) a free-standing acute accent on a typewriter, so that you could write `<BS>a (<BS> being an actual backspace, not rubout!) and get à. Similarly for _, intended for underlines. (Fun fact: even today, inside the troff | less pipe that man uses, bold a and underlined a are represented as a<BS>a and _<BS>a respectively.)Giacometti
R
15

They are for use on systems that lack some of the characters in C++'s basic character set. Needless to say, such systems are exceedingly rare.

Ravenravening answered 5/8, 2009 at 17:18 Comment(4)
Does that mean that I'll never use them in real life?Inexplicable
What country do you live in? Not all keyboards for all languages have the necessary keys.Alvord
Yes, but you might need to be aware of their existence in case one causes an unexpected result when encountered in, say, a string literal.Ravenravening
@David Thornley: Most modern systems support all of the basic characters of C++ even if they are not in the conventional place or require a modifier sequence to type. Trigraphs only needed to be maintained in the source code on systems where the character cannot actually be represented in the system character set. I still maintain that such systems are exceedingly rare.Ravenravening
I
9

Trigraphs have been proposed for removal in C++0x. That said, there still seems to be a strong argument in support of them - see C++ committee paper N2910 which discusses this. Apparently, EBCDIC is one major stronghold where they are needed.

Isochroous answered 5/8, 2009 at 17:39 Comment(2)
Yes, that "foreign language"! :-)Butler
They don't really say much except "results from an internal survey of customer feedback", but ah well. I am surprised that EBCDIC is still in widespread use though (and that these systems expect to use C++0x compilers)Electrodynamometer
E
5

I've seen trigraphs used in the early '90s to help convert PL/I programs from a mainframe to be run/compiled/debugged on a PC.

They were dabbling with editing PL/I on the PC using a PL/I to C compiler and they wanted the code to work when moved back to the mainframe which did not support curly braces. I suggested that they could use macros like

#define BEGIN {
#define END }

or as a friendlier PL/I alternative

#define BEGIN ??<
#define END ??>

and if they really wanted to get fancy they could try

#ifdef MAINFRAME
    #define BEGIN ??<
    #define END ??>
#else
    #define BEGIN {
    #define END }
#endif

and then the program would look like it was written in Pascal. They just looked at me funny and wouldn't speak to me for the rest of the day. I don't think I blame them. :)
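For the curious, this is roughly what code written against such macros might look like. It is only a sketch: it uses the plain-brace definitions (spelled with the real #define directive) so that it compiles without trigraph support, and the function sum_to is invented for the example.

```cpp
// Plain-brace variant of the macros; on the mainframe side the
// replacement text would be the brace trigraphs instead.
#define BEGIN {
#define END }

// Sums the integers 1..n, with every block delimited by BEGIN and END.
int sum_to(int n)
BEGIN
    int total = 0;
    for (int i = 1; i <= n; ++i)
    BEGIN
        total += i;
    END
    return total;
END
```

After preprocessing this is ordinary C++, so sum_to(4) returns 10.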

What killed the effort was not the trigraphs; it was the I/O system differences between the platforms. Opening files on the PC was so different from the mainframe that it would have introduced way too many kludges to keep the same code running on both.

Eventuate answered 5/8, 2009 at 17:32 Comment(1)
PL/1 = IBM's version of C (more or less). See my comment: IBM terminals have no '{' / '}' keys :-( Kind of hard to write C [++] on one of these, otherwise.Butler
C
3

Primarily because the C standard introduced them back in 1989, when there were issues with the presence of the characters that trigraphs map to on some machines. By the time the C++ standard was published in 1998, the need for trigraphs was not great. They are a wart on C; they are just as much a wart on C++. There was a need for them - especially outside the English-speaking world - which is why they were added to C.

Currier answered 5/8, 2009 at 17:24 Comment(1)
I've always suspected that IBM didn't speak English :-)Butler
K
2

Some European keyboards don't (didn't?) have all the punctuation characters that US keyboards had, because they needed the keys for their unusual alphabetic characters. So for example (making this up), the Swedish keyboard would have A-ring where the curly brace was.

To accommodate those users, trigraphs are a way to enter punctuation using only the most common ASCII characters.

Kinesthesia answered 5/8, 2009 at 17:20 Comment(1)
Trigraphs aren't really about data entry (they make code pretty unreadable), they are more about systems that don't actually have the required characters. If a system can record and display the character - even if a trigraph like key sequence needs to be typed - it would be much easier not to retain the trigraph sequence in the source.Ravenravening
S
2

They are there mostly for historical reasons. Nowadays, most modern keyboards for most languages allow access to all those characters, but this used to be a problem once with some European keyboards. This is why trigraphs were invented.

If you don't know what they're for, you shouldn't use them.

It's still good to be aware of them, though, since you might accidentally and unintentionally use one in your code.

Scabbard answered 5/8, 2009 at 17:22 Comment(0)
