Are the character digits ['0'..'9'] required to have contiguous numeric values?

Must a C++ implementation give the characters '0' to '9' contiguous numeric values, i.e. such that, for some offset n:

'0' -> 0+n
'1' -> 1+n
 m  -> m+n
'9' -> 9+n

I cannot find this requirement in the specification of isdigit ([classification], 22.3.3.1 Character classification)*, nor in the locale specification (but maybe I did not look hard enough).

In 2.3 Character sets, we find that

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

But it doesn't mention any ordering (again, maybe I did not look hard enough).


*: Interesting footnote there:

When used in a loop, it is faster to cache the ctype<> facet and use it directly [instead of isdigit() et al, end comment], or use the vector form of ctype<>::is.
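A minimal sketch of what the footnote describes, assuming C++11: fetch the `std::ctype<char>` facet once with `std::use_facet` and call its `is` member directly, either per character or through the vector form that fills in one mask per character. The helper names `count_digits_cached` and `count_digits_vector` are hypothetical, and no speedup is claimed here beyond what the footnote itself says.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

// Per-character form: the facet is looked up once, outside the loop,
// instead of once per call as std::isdigit(c, loc) does.
std::size_t count_digits_cached(const std::string& s, const std::locale& loc) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
    std::size_t n = 0;
    for (char c : s)
        if (ct.is(std::ctype_base::digit, c)) ++n;
    return n;
}

// Vector form: classify the whole buffer in one call, then test the masks.
std::size_t count_digits_vector(const std::string& s, const std::locale& loc) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
    std::vector<std::ctype_base::mask> masks(s.size());
    ct.is(s.data(), s.data() + s.size(), masks.data());  // one mask per input char
    std::size_t n = 0;
    for (std::ctype_base::mask m : masks)
        if (m & std::ctype_base::digit) ++n;
    return n;
}
```

For example, `count_digits_vector("a1b22c333", std::locale::classic())` classifies all nine characters in one call and counts the six digits.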

Peril asked 23/2, 2012 at 16:20
Why the vote to close? The close reason reads: "This question is not a good fit to our Q&A format. We expect answers to generally involve facts, references, or specific expertise; this question will likely solicit opinion, debate, arguments, polling, or extended discussion." But I have facts, references, and specific expertise, and the answer will probably not solicit opinion, debate, argument, or polling, but rather a reference into the standard, so no extended discussion either. Is someone high on mod powers? – Peril
It's not in the locale stuff, because that has to deal with other digits too. (E.g. ;) ) – Aspectual

Indeed, I had not looked hard enough: in 2.3 Character sets, paragraph 3:

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

And this is the above list of decimal digits:

0 1 2 3 4 5 6 7 8 9

Therefore, an implementation must use a character set in which the decimal digits have contiguous values. Optimizations that rely on this property are safe; however, optimizations that rely on the contiguity of other characters (e.g. the letters 'a'..'z') are not portable with respect to the standard (see also the header <cctype>). If you do rely on such a property, make sure to assert it.
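A minimal sketch of both points, assuming C++14 (needed for the loop inside a `constexpr` function): `parse_decimal` relies only on the guaranteed contiguity of the digits, while `letter_index` computes `c - 'a'` and is therefore guarded by a `static_assert`. Both helper names are hypothetical, and the parser deliberately skips overflow checking.

```cpp
#include <string>

// Hypothetical helper: parse a non-negative decimal string, or return -1.
// Relies only on the guaranteed contiguity of '0'..'9'. No overflow check.
long parse_decimal(const std::string& s) {
    if (s.empty()) return -1;
    long value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return -1;  // range test is valid: digits are contiguous
        value = value * 10 + (c - '0');     // c - '0' is 0..9 on every conforming implementation
    }
    return value;
}

// Letters carry no such guarantee (EBCDIC splits 'a'..'z' into three runs),
// so verify the assumption at compile time before doing arithmetic on letters.
constexpr bool lowercase_contiguous() {
    const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 1; i < 26; ++i)
        if (letters[i] != letters[i - 1] + 1) return false;
    return true;
}
static_assert(lowercase_contiguous(), "this code assumes 'a'..'z' are contiguous");

// Hypothetical helper: only meaningful because of the static_assert above.
int letter_index(char c) { return c - 'a'; }
```

On a character set with non-contiguous letters, the `static_assert` fails at compile time rather than letting `letter_index` silently return wrong values.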

Peril answered 23/2, 2012 at 16:22
Thanks @cHao for the hint. Astonishing. – Peril
As it happens, both ASCII (and its derivatives) and EBCDIC assign contiguous values to the decimal digits. ASCII makes the lowercase letters contiguous, as well as the uppercase letters; EBCDIC does not. That's probably why C and C++ require consecutive digits, but not consecutive letters. The vast majority of C++ implementations use ASCII or one of its derivatives (Latin-1, Windows-1252, Unicode, etc.); the vast majority of the rest use EBCDIC. – Joaniejoann
@CodingMastero: I usually wait a few days to encourage more answers. Maybe someone will provide some historical background besides the references :) – Peril
It's you who asked and answered, too. So what more do you need? – Chouinard
@CodingMastero: True, but often enough, some answerers provide additional information and insight. I didn't want to discourage anyone from posting. However, the waiting period is over and I have accepted the answer. – Peril
If ISO C has the same guarantee, could you mention that in this answer? It came up when I googled "C digits contiguous". Update: it does; see "Why does subtracting '0' in C result in the number that the char is representing?" – Struble
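On the EBCDIC point raised in the comments: if code must also work where letters are not contiguous, one option (a sketch, not the only approach) is to avoid arithmetic on letter codes entirely and look the character up in an explicit alphabet. `portable_letter_index` is a hypothetical helper; it uses only `std::strchr` from `<cstring>`.

```cpp
#include <cstring>

// Hypothetical helper: index of a lowercase letter in 'a'..'z', or -1.
// Works on ASCII and EBCDIC alike because it never assumes the letter
// codes are consecutive; the alphabet string defines the order instead.
int portable_letter_index(char c) {
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    if (c == '\0') return -1;  // strchr would otherwise match the terminator
    const char* p = std::strchr(alphabet, c);
    return p ? static_cast<int>(p - alphabet) : -1;
}
```

The cost is a linear scan over 26 characters per lookup; a precomputed table indexed by `unsigned char` would avoid that if it ever mattered.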
