Why does the C parser not allow spaces between the digits of an integer literal?

3

0

I expect it would break some other parse but this is not immediately springing to mind...

E.g. If I say this:

#include <stdio.h>
int main()
{
    int i = 10 000;
    printf("%d\n", i);
    return 0;
}

My compiler says this:

ctest.c:5: error: expected ‘,’ or ‘;’ before numeric constant

I'm thinking that it would be a feature, i.e. it's easier to 'eye parse' large integers with thousands separators. I'm using a recent GCC and assuming this is the case for other compilers/parsers/lexers; whichever it is that objects.

Resiniferous answered 8/10, 2011 at 10:51 Comment(2)
The parser does not allow this because the standard says so.Diamagnetic
In Ada you can use underscores as in 10_000Heartbreaking
6

The language doesn't allow this (an integer literal is a single token; intervening whitespace splits it into two tokens), but there's typically little to no cost in writing the initializer as an expression calculated from literals:

int i = 10 * 1000; /* ten thousand */
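
As a rough sketch of why this costs nothing: `10 * 1000` is an integer constant expression, so a compiler has to be able to evaluate it at translation time anyway. The names `TEN_THOUSAND`, `samples`, `sample_count` and `ten_thousand` below are made up for illustration and are not part of the original answer.

```c
#include <assert.h>

/* 10 * 1000 is an integer constant expression, so it is accepted
   wherever C requires a compile-time constant. */
enum { TEN_THOUSAND = 10 * 1000 };

static int samples[10 * 1000];   /* file-scope array size must be constant */

/* Hypothetical helper: number of elements in the array above. */
static int sample_count(void)
{
    return (int)(sizeof samples / sizeof samples[0]);
}

/* Hypothetical helper wrapping the initializer from the answer. */
static int ten_thousand(void)
{
    int i = 10 * 1000; /* ten thousand */
    assert(i == TEN_THOUSAND);
    return i;
}
```

Because the enumerator and the array bound compile at all, the product must have been folded to 10000 before any code was generated.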
Engracia answered 8/10, 2011 at 10:57 Comment(5)
Does this compile to the same code? Or will the calculation of the multiplication be in the binary too?Efrem
@Nodody: That's implementation dependent. On gcc the constant 10000 is used in the generated assembler in both cases, even at -O0. This implies that gcc doesn't even consider this an optimization.Engracia
@CharlesBailey Your solution for 10 * 1000 is good but what about 41196312? I'm not a fan of 41 * 1000 * 1000 + 196 * 1000 + 312. Would it have been considered adding needless complexity to the parser? Perhaps, also, sentiment against magic numbers in code?Resiniferous
@user710408: You're not going to get lost counting the digits in 41196312 because it's not a stream of identical zeros, which was the use case that I was addressing. You could, I suppose, use token pasting if you wanted to break things up, e.g. #define MY_CONST (41 ## 196 ## 312) and int i = MY_CONST;. Incidentally, what is the origin of 41196312? I can't immediately see anything familiar about it.Engracia
@CharlesBailey First, thanks for your responses Charles, much appreciated. 41 196 312 is the genomic coordinate of a well known gene on human chromosome 7. This seems to have degenerated into a debate about whether it's helpful to group the digits in large integers. I'm surprised since I would have thought that was a given, it's certainly common practice. Since the only answer that appears forthcoming is because the standard says so I'll accept it!Resiniferous
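
The token-pasting idea from the comments above can be sketched as follows. `MY_CONST` is taken from the comment; the function-like `GROUP3` macro and the two wrapper functions are hypothetical additions for illustration only.

```c
#include <assert.h>

/* ## pastes adjacent preprocessing tokens, so the three groups become
   the single token 41196312 before the compiler proper sees them. */
#define MY_CONST (41 ## 196 ## 312)

/* Hypothetical function-like variant (not from the comment). */
#define GROUP3(a, b, c) a ## b ## c

static long my_const(void)
{
    assert(MY_CONST == 41196312L);
    return MY_CONST;
}

static long grouped(void)
{
    return GROUP3(41, 196, 312);
}
```

One caveat with this approach: if the first group starts with a 0, the pasted result is an octal constant, so it only behaves as expected when the leading group has no leading zero.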
2

Well, it would conflict with the ability to overload the space operator that is proposed for C++.

(Yes I'm aware that's not a serious proposal).

EDIT: That was just a silly thing above, my real comment is the part below

For me the ability to put whitespace in numbers would both enhance and reduce readability. If you need to read the numbers themselves, this would probably make it easier to do so. But assuming you're not that interested in the actual value of a number when reading it, I think it would also make the code slower to read, because you'd have to take one extra step parsing the code in your head rather than just looking to the end of the token.

I don't believe there is any technical reason this couldn't be done, it's more that people see a number as a single indivisible token so it should be written like that. I have to say that I don't exactly know why but it would irritate me to see numbers broken up like that in code. I can't say there is a good reason for that, but it's still true :) It's probably just what I'm used to.

Daub answered 8/10, 2011 at 11:9 Comment(5)
That proposal was a C++ proposal; this question is about C. The potential conflict you suggest wouldn't be relevant.Engracia
Ah ok. Yes. It wasn't a serious proposal anyway; it was an April Fools' joke or something. Plus it was just a silly aside to my main point anyway.Daub
If C implemented this as some magic token concatenation, C++ would have a problem with an operator overloading mechanism. In 10 012 the second literal would be octal requiring some very special magic to have the intended effect.Engracia
But C++ space operator overloading wasn't/can't be serious, can it?! So this wouldn't preclude having space separators in integers?Resiniferous
@JohnB If you are not interested in the value of the number then there's no advantage in breaking it up, clearly. If you are interested in the magnitude of the number, e.g. 100s of millions or 10s of millions, then I find it very helpful.Resiniferous
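
The octal point raised in the comments above is easy to check in isolation: a leading 0 makes an integer constant octal, so the second group of a hypothetical `10 012` would not mean twelve on its own. The function name below is invented for this sketch.

```c
#include <assert.h>

/* A leading 0 makes an integer constant octal: 012 is 1*8 + 2. */
static int second_group(void)
{
    int octal = 012;
    assert(octal != 12);
    return octal;
}
```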
0

You could use scientific notation to group the digits. This is safe for values up to 10 digits in length; beyond that, the number is not guaranteed to match the expected value.

int i = 10e3;

This helps with the example you gave in a comment, 41196312, where the other answer would have had you write 41 * 1000 * 1000 + 196 * 1000 + 312. Using scientific notation you could do:

int i = 41e6 + 196e3 + 312;
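
A small sketch of what actually happens here, under the assumption of a C11 compiler (for `_Generic`): the constants are of type `double`, the sum is computed in floating point, and the result is converted to an integer on assignment. The macro `IS_DOUBLE` and the function names are made up for illustration, and `long` is used instead of `int` only so the value fits even where `int` is 16 bits.

```c
#include <assert.h>
#include <float.h>

/* 1 if the expression has type double, 0 otherwise (C11 _Generic). */
#define IS_DOUBLE(x) _Generic((x), double: 1, default: 0)

static int constant_10e3_is_double(void)  { return IS_DOUBLE(10e3); }
static int constant_10000_is_double(void) { return IS_DOUBLE(10000); }

static long coordinate(void)
{
    /* The C standard requires DBL_DIG to be at least 10, which is
       where the "10 digits" limit comes from. */
    assert(DBL_DIG >= 10);

    /* Computed in double, then converted to long. */
    long i = 41e6 + 196e3 + 312;
    return i;
}
```

With IEEE 754 doubles each of these terms and their sum is exactly representable, so the conversion yields the intended integer; values with more digits than `DBL_DIG` carry no such guarantee.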

I use this type of solution where I deal with engineering notation, so various frequencies, resistances and voltages can be much easier to parse when formatted the way we would say them aloud. "We test it to a 10 meg resistance", so in the code I will put 10e6.

Puppy answered 5/1 at 16:26 Comment(2)
No, you cannot, as those are float constants: "A floating constant has a significand part that may be followed by an exponent part and a suffix that specifies its type." "I use this type of solution where i deal with engineering notation" That's unfortunate, because if you actually use that in places where an exact integral value is expected, it can be wrong. Instead of an integer value of 10,000,000, you might get 9,999,998 or even 10,000,001.Isolate
@AndrewHenle It is safe in C for an integer value with no more than 10 digits. Updated my answer to include that restriction.Puppy
