Why are there digraphs in C and C++?

I learned today that there are digraphs in C99 and C++. The following is a valid program:

%:include <stdio.h>

%:ifndef BUFSIZE
 %:define BUFSIZE  512
%:endif

void copy(char d<::>, const char s<::>, int len)
<%
    while (len-- > 0)
    <%
        d<:len:> = s<:len:>;
    %>
%>

My question is: why do they exist?

Arathorn answered 11/1, 2009 at 6:7 Comment(4)
Verify my translation? %: is #, and <% %> is { }, and <: :> is [ ]. Is this correct? – Innocent
The real answer: because IBM was loud and insisted on forcing it on everyone. – Violaceous
Voting to reopen. That question is more specific than this one (it is only about "and" and "or"). This one is posed in a more useful form and has more upvotes. Edit: it should be a duplicate of #1235082 instead. – Mooring
The real answer: so you can write obfuscated code :-) – Tails

Digraphs were created for programmers whose keyboards provided only a national variant of ISO 646, which lacks characters such as #, [, ], {, and }.

http://en.wikipedia.org/wiki/C_trigraph

Limann answered 11/1, 2009 at 6:14 Comment(4)
Non-ASCII keyboards were not a problem. Sure it looked odd, but... main(int argc,char *argvÄÅ) ä printf("HelloÖn"); å – Terrell
@Terrell That it "looked odd" is exactly why the Scandinavians asked for digraphs, and there was resistance on the C standards committee (X3J11, of which I was a member) to adding them. – Aerugo
@Pryftan The question is about digraphs, not trigraphs. As for "if wikipedia says this", you should read the Wikipedia article to see what it says rather than speculating. (The Wikipedia article is not wrong, but this answer is.) – Aerugo
As a Scandinavian I just changed to ASCII glyphs and wrote funny-looking (to me) emails instead. – Terrell

I believe their existence can be traced back to the possibility that somewhere, somebody was using a compiler and operating system whose character set was so archaic that it didn't necessarily have all the characters C or C++ needs to express the whole language.

Also, it makes for good entries in the IOCCC.

Sapper answered 11/1, 2009 at 6:10 Comment(5)
Not necessarily the compiler, Greg. Some of the mainframe EBCDIC character sets don't have consistent characters for the square brackets, which rather stuffs up array processing. This is a limitation of the editor and/or terminal emulator more than the compiler itself. – Spender
I didn't really mean it was only the compiler. I edited to clarify. – Sapper
No, it has nothing to do with EBCDIC. These sequences were for the sake of Scandinavians who used some of the ASCII characters as language characters (so the symbols were different on the keycaps and in output). – Aerugo
The mention of IOCCC added significant value to this answer for me. – Reverent
@Pryftan The question is about digraphs, not trigraphs. I was on X3J11 and was directly involved in the discussions. My statement is correct. – Aerugo

I think it's because some of the keyboards on this planet might not have keys like '#' and '{'.

Orton answered 11/1, 2009 at 6:11 Comment(0)

The digraphs and trigraphs in C/C++ come from the days of the six-bit character sets used by the CDC 6000 series (60 bits), the Univac 1108 (36 bits), and the DECsystem-10 and -20 (36 bits), each of which used a proprietary 64-character set not compatible with ASA X3.4-1963 (now known as ANSI X3.4-1963, "7-bit American National Standard Code for Information Interchange"). The latest revision is ANSI X3.4-1986.

Since these systems were incapable of representing all 96 graphical code points, many were omitted. In addition, X3.4 was coordinated with other national standards institutes (GBR, GER, ITA, etc.), and there were code points in X3.4 designated as national replacement characters - the most obvious example being # for the British pound symbol (obvious because the name of the # character is "pound sign", from its conventional usage in US commerce prior to the evolution of Twitter) - and '{' and '}' were also designated as national replacement characters.

Thus digraphs were introduced to provide a mechanism for those computer systems incapable of representing the characters, and also for data terminal equipment which assigned national replacement characters to the conflicting code points. Di/trigraphs have become an archaic artifact of computing history (a subject not much taught in computer science these days).

An exhaustive paper on this subject can be found here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.678&rep=rep1&type=pdf

Masculine answered 11/11, 2016 at 14:28 Comment(1)
The DEC-10 and 20 generally used standard ASCII for all text files, including source code. The 6-bit character set was typically only used in metadata like file and directory names. – Hornbeck

They were created as a simpler alternative to trigraphs according to the article on Wikipedia.

That is, for the five trigraphs ??(, ??), ??<, ??>, and ??=, the corresponding digraphs <:, :>, <%, %>, and %: were supplied. This happened in 1994.

Otisotitis answered 24/3, 2023 at 16:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.