Is the character set of a char literal guaranteed to be ASCII?

Coming from a discussion started here, does the standard specify values for characters? So, is '0' guaranteed to be 48? That's what ASCII would tell us, but is it guaranteed? If not, have you seen any compiler where '0' isn't 48?

Affable answered 30/10, 2012 at 15:0 Comment(5)
One word: EBCDIC. - Pudens
I'm curious about why you're asking this. Obviously you can use '0' for an int value too. - Brambling
@NikosChantziaras I'm asking because of the discussion in the linked answer, and why would I use '0' for an int when I can use 48? - Affable
One link: Extended Binary Coded Decimal Interchange Code. - Vadnee
Are the character digits ['0'..'9'] required to have contiguous numeric values? - Diversify

No. There's no requirement for either the source or the execution character set to use an encoding with an ASCII subset. I haven't seen any non-ASCII implementations, but I know someone who knows someone who has. (It is required that '0' through '9' have contiguous integer values, but that's a duplicate question somewhere else on SO.)

The encoding used for the source character set controls how the bytes of your source code are interpreted as the characters of the C++ language. The standard describes the members of the execution character set as having values; it is the encoding that maps these characters to their corresponding values that determines the integer value of '0'.

Although at least all of the members of the basic source character set plus some control characters and a null character with value zero must be present (with appropriate values) in the execution character set, there is no requirement for the encoding to be ASCII or to use ASCII values for any particular subset of characters (other than the null character).

Declamatory answered 30/10, 2012 at 15:1 Comment(3)
I had to process a datafile containing an alternate character set once (I don't think it was even EBCDIC). But I did so using an ASCII compiler. - Arielariela
The paragraph in question is 2.2/3: "the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous." - Heroworship
The source and execution character sets are sets of characters, and there are specific requirements on what, at a minimum, has to be in those character sets. It's the encoding of those characters that is not specified (other than the constraint on '0' through '9'). That's an important distinction that, if overlooked, muddles discussions about characters. Not that it's a problem here... - Ashram

No, the Standard is very careful not to specify what the source character encoding is.

C and C++ compilers run on EBCDIC computers too, you know, where '0' != 0x30.

However, I believe it is required that '1' == '0' + 1.

Arielariela answered 30/10, 2012 at 15:1 Comment(2)
Yes, '1' == '0' + 1 is required (§2.3/3). - Pudens
This also implies that '5' - '0' == 5, which is a good method for converting character digits to numbers. - Babita

It's 0xF0 in EBCDIC. I've never used an EBCDIC compiler, but I'm told that they were all the rage at IBM for a while.

There's no requirement in the C++ standard that the source or execution encodings are ASCII-based. It is guaranteed that '0' == '1' - 1 (and in general that the digits are contiguous and in order). It is not guaranteed that the letters are contiguous, and indeed in EBCDIC 'J' != 'I' + 1 and 'S' != 'R' + 1.

Pulchia answered 30/10, 2012 at 15:2 Comment(3)
You can easily make GCC compile an EBCDIC- (or anything-) encoded source file by passing a suitable compiler option. - Knuth
@KerrekSB: There's -fexec-charset, which affects the encoding of string constants in the binary as well as character handling functions (such as isdigit), but what option changes the source encoding? - Arielariela
There's also -finput-charset. - Knuth

According to the C++11 draft standard N3225:

The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files

In short, the character set is not required to map onto the ASCII table, even though I've never heard of an implementation that does otherwise.

Heartily answered 30/10, 2012 at 15:4 Comment(0)
