Is the character set of a char literal guaranteed to be ASCII?

Asked 30/10, 2012 at 15:0 Answered 30/10, 2012 at 15:4

Coming from a discussion started here, does the standard specify values for characters? So, is '0' guaranteed to be 48? That's what ASCII would tell us, but is it guaranteed? If not, have you seen any compiler where '0' isn't 48?

Affable answered 30/10, 2012 at 15:0 Comment(5)

One word: EBCDIC. – Pudens 30/10, 2012 at 15:1

I'm curious about why you're asking this. Obviously you can use '0' for an int value too. – Brambling 30/10, 2012 at 15:7

@NikosChantziaras I'm asking about it because of the discussion in the linked answer, and why would I use '0' for an int when I can use 48? – Affable 30/10, 2012 at 15:8

One link: Extended Binary Coded Decimal Interchange Code. – Vadnee 30/10, 2012 at 15:11

Are the character digits ['0'..'9'] required to have contiguous numeric values? – Diversify 26/8, 2016 at 3:32

No. There's no requirement for either the source or execution character sets to use an encoding with an ASCII subset. I haven't seen any non-ASCII implementations, but I know someone who knows someone who has. (It is required that '0' through '9' have contiguous integer values, but that's a duplicate question somewhere else on SO.)

The encoding used for the source character set controls how the bytes of your source code are interpreted into the characters used in the C++ language. The standard describes the members of the execution character set as having values. It is the encoding that maps these characters to their corresponding values that determines the integer value of '0'.

Although at least all of the members of the basic source character set plus some control characters and a null character with value zero must be present (with appropriate values) in the execution character set, there is no requirement for the encoding to be ASCII or to use ASCII values for any particular subset of characters (other than the null character).
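To make the distinction concrete, here is a minimal sketch of what can and cannot be relied on. The names `zero_is_ascii` and `value_of_zero` are made up for illustration; only the null-character check reflects a guarantee.

```cpp
// Guaranteed: the null character has value zero in the execution character set.
static_assert('\0' == 0, "null character must have value 0");

// NOT guaranteed: true on ASCII-compatible implementations, false on
// EBCDIC ones, so this is a flag to inspect rather than an assertion.
constexpr bool zero_is_ascii = ('0' == 48);

// Returns whatever integer value this implementation assigns to '0'
// (48 under ASCII, 0xF0 under EBCDIC).
constexpr int value_of_zero() { return '0'; }
```

Code that needs the digit zero should write '0' and let the compiler supply the value instead of hard-coding 48.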

Declamatory answered 30/10, 2012 at 15:1 Comment(3)

I had to process a datafile containing an alternate character set once (I don't think it was even EBCDIC). But I did so using an ASCII compiler. – Arielariela 30/10, 2012 at 15:4

The paragraph in question is 2.2/3 "the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous." – Heroworship 30/10, 2012 at 15:9

The source and execution character sets are sets of characters, and there are specific requirements on what, at a minimum, has to be in those character sets. It's the encoding of those characters that is not specified (other than the constraint on '0' through '9'). That's an important distinction that, if overlooked, muddles discussions about characters. Not that it's a problem here ... – Ashram 30/10, 2012 at 15:33

No, the Standard is very careful not to specify what the source character encoding is.

C and C++ compilers run on EBCDIC computers too, you know, where '0' != 0x30.

However, I believe it is required that '1' == '0' + 1.
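Because the digits must be contiguous and in order, digit conversion can be written without assuming ASCII. A small sketch (the helper names `is_decimal_digit` and `digit_value` are hypothetical, not standard functions):

```cpp
// True if c is one of '0'..'9'. The range test is portable because the
// standard requires the ten decimal digits to be contiguous and ascending.
constexpr bool is_decimal_digit(char c) { return c >= '0' && c <= '9'; }

// Converts a digit character to its numeric value, or returns -1 if c is
// not a decimal digit. Note the use of '0' rather than a literal 48.
constexpr int digit_value(char c) {
    return is_decimal_digit(c) ? c - '0' : -1;
}
```

For example, `digit_value('5')` yields 5 whether '0' is 0x30 or 0xF0.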

Arielariela answered 30/10, 2012 at 15:1 Comment(2)

Yes '1' == '0' + 1 is required (§2.3/3). – Pudens 30/10, 2012 at 15:5

This also implies that '5' - '0' == 5, which is a good method to convert from character digits to numbers. – Babita 30/10, 2012 at 15:59

It's 0xF0 in EBCDIC. I've never used an EBCDIC compiler, but I'm told that they were all the rage at IBM for a while.

There's no requirement in the C++ standard that the source or execution encodings are ASCII-based. It is guaranteed that '0' == '1' - 1 (and in general that the digits are contiguous and in order). It is not guaranteed that the letters are contiguous, and indeed in EBCDIC 'J' != 'I' + 1 and 'S' != 'R' + 1.
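Since letters carry no contiguity guarantee, `c - 'A'` is not a portable way to get a letter's position in the alphabet (with the EBCDIC gap noted above, 'J' - 'A' would not be 9). One possible workaround, sketched here with the made-up names `kUpper` and `upper_index`, is to search an explicit table:

```cpp
// The alphabet spelled out explicitly, so positions come from the table
// and not from the numeric values of the characters.
constexpr char kUpper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Returns the zero-based position of an uppercase letter ('A' -> 0,
// 'Z' -> 25), or -1 if c is not in the table. Requires C++14 or later
// for the loop inside a constexpr function.
constexpr int upper_index(char c) {
    for (int i = 0; i < 26; ++i) {
        if (kUpper[i] == c) return i;
    }
    return -1;
}
```

This trades a constant-time subtraction for a short linear search, which is the price of not assuming anything about the encoding.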

Pulchia answered 30/10, 2012 at 15:2 Comment(3)

You can easily make GCC compile an EBCDIC- (or anything-) encoded source file by passing a suitable compiler option. – Knuth 30/10, 2012 at 15:25

@KerrekSB: There's -fexec-charset which affects the encoding of string constants in the binary, as well as character handling function (such as isdigit), but what option changes the source encoding? – Arielariela 30/10, 2012 at 20:16

There's also -finput-charset. – Knuth 30/10, 2012 at 22:37

According to the C++11 standard draft N3225:

The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

In short, the character set is not required to map to the ASCII table, even though I've never heard of an implementation that does otherwise.

Heartily answered 30/10, 2012 at 15:4 Comment(0)
