Coming from a discussion started here, does the standard specify values for characters? So, is '0' guaranteed to be 48? That's what ASCII would tell us, but is it guaranteed? If not, have you seen any compiler where '0' isn't 48?
No. There's no requirement for either the source or execution character sets to use an encoding with an ASCII subset. I haven't seen any non-ASCII implementations, but I know someone who knows someone who has. (It is required that '0' through '9' have contiguous integer values, but that's a duplicate question somewhere else on SO.)
The encoding used for the source character set controls how the bytes of your source code are interpreted into the characters used in the C++ language. The standard describes the members of the execution character set as having values. It is the encoding that maps these characters to their corresponding values that determines the integer value of '0'.
Although at least all of the members of the basic source character set plus some control characters and a null character with value zero must be present (with appropriate values) in the execution character set, there is no requirement for the encoding to be ASCII or to use ASCII values for any particular subset of characters (other than the null character).
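As a minimal sketch of the point above: the only character value you can check portably is the null character, while the value of '0' has to be asked of the implementation. The helper names `zero_digit_value` and `zero_is_ascii` are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// The null character is the one member of the execution character set
// whose value is fixed: it must be zero.
static_assert('\0' == 0, "the null character has value zero");

// Hypothetical helper: the value this implementation assigns to '0'.
// With an ASCII-based execution character set this is 48; with EBCDIC it is not.
constexpr int zero_digit_value() { return static_cast<int>('0'); }

// Hypothetical helper: true only if '0' happens to have its ASCII value here.
constexpr bool zero_is_ascii() { return zero_digit_value() == 48; }
```

Code that genuinely depends on ASCII could put `zero_is_ascii()` in a `static_assert` so the assumption fails loudly at compile time instead of silently misbehaving.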
No, the Standard is very careful not to specify what the source character encoding is.
C and C++ compilers run on EBCDIC computers too, you know, where '0' != 0x30.
However, I believe it is required that '1' == '0' + 1.
'1' == '0' + 1 is required (§2.3/3). – Pudens
'5' - '0' == 5, which is a good method to convert from character digits to numbers. – Babita
It's 0xF0 in EBCDIC. I've never used an EBCDIC compiler, but I'm told that they were all the rage at IBM for a while.
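The digit guarantee mentioned above can be sketched as a pair of conversion helpers that work whether '0' is 0x30 or 0xF0, because they rely only on the digits being contiguous and in order. The names `digit_value` and `digit_char` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: numeric value of a decimal digit character,
// or -1 if c is not a digit. Relies only on '0'..'9' being contiguous.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Hypothetical helper: the digit character for n, where 0 <= n <= 9.
inline char digit_char(int n) {
    assert(n >= 0 && n <= 9);
    return static_cast<char>('0' + n);
}

// These hold on every conforming implementation, ASCII or not.
static_assert('1' == '0' + 1, "digits are contiguous");
static_assert(digit_value('5') == 5, "'5' - '0' == 5");
```

Writing `c - '0'` rather than `c - 48` keeps the code independent of the execution character set and also makes the intent clearer to the reader.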
There's no requirement in the C++ standard that the source or execution encodings are ASCII-based. It is guaranteed that '0' == '1' - 1 (and in general that the digits are contiguous and in order). It is not guaranteed that the letters are contiguous, and indeed in EBCDIC 'J' != 'I' + 1 and 'S' != 'R' + 1.
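Since the letters are not guaranteed to be contiguous, `c - 'A'` is not a portable way to get a letter's position in the alphabet. One possible workaround, sketched here with a hypothetical helper named `upper_letter_index`, is to search a string literal that lists the letters in order.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: zero-based position of an uppercase letter in the
// alphabet, or -1 if c is not one. It makes no assumption about the letters
// having contiguous values, so it also works with EBCDIC.
inline int upper_letter_index(char c) {
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (c == '\0') {
        return -1;  // std::strchr would otherwise match the terminator
    }
    const char* p = std::strchr(letters, c);
    return p ? static_cast<int>(p - letters) : -1;
}
```

The lookup is a linear scan over 26 characters, which trades a little speed for not depending on the encoding.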
-fexec-charset which affects the encoding of string constants in the binary, as well as character handling functions (such as isdigit), but what option changes the source encoding? – Arielariela
-finput-charset. – Knuth
According to the C++11 draft standard N3225:
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
In short, the character set is not required to be mapped to the ASCII table, even though I've never heard of an implementation that does otherwise.
'0' for an int value too. – Brambling
'0' for an int when I can use 48? – Affable