Can we write comments within variable names?

5

149
int main()
{
     i/*nt*/a = 10;
     return 0;
}

If I have the above code and I want to count the tokens, will it be 14 or 13 tokens?

Is it valid to write a comment within a variable name? You can assume that int i, int a, and int ia are all defined globally.

Usurer answered 27/8, 2020 at 4:14 Comment(9)
In pre-ANSI "traditional" C, at least as implemented by GNU cpp -traditional, it would expand to ia = 10;.Anikaanil
what an interesting question - why did it never occur to me before?Hoar
@javadba: Because sensible people wouldn't think of doing such a thing?Goulet
If you really want to do that, you can shift to Fortran. Whitespace outside strings is removed in the first parse stage.Wagtail
I was about to edit the title to ".... within variable names..." but then realized that you may have meant indeed "between". (I wanted to edit it because the answer to the original title is "Why, obviously!" The important part is "no whitespace".) Would the title "Does a comment (without surrounding whitespace) separate tokens in C?" express your actual question?Ionium
@Peter-ReinstateMonica yes "within variable names" sounds more accurate, I was not sure how to frame the question initially. Thank youUsurer
Good, logical question. This might be asked in an interview to check whether you understand how comments interact with tokens such as int.Janeth
Are comments inside variable names possible in other programming languages than C?Samarium
Nobody with a healthy mind would do such a thing btw.Stupid
207

Comments are removed during phase 3 of program translation (see note 1): each comment is replaced by one space character. So the comment /*nt*/ is definitely not a token.

If none of int, main, i, a or return are defined as preprocessing macros, parsing the program produces 14 tokens (not 13):

int main ( ) { i a = 10 ; return 0 ; }

Unless i is defined as a type name with a typedef declaration, this is a syntax error, because i a does not match any rule in the C grammar.

So you cannot write comments inside variable names: the comment splits the identifier into 2 separate tokens. This is true for any preprocessing token and any C language token (see note 2).

Note however that you can insert comments in unusual places such as between unary operators and their operand or between the # and the preprocessing directive and its arguments:

/**/#/**/include/**/<stdio.h>/**///////////////////////
/**/#/**/define/**/STAT/**/(/**/a/**/)/**/-/**/1/**////
/**/#/**/ifdef/**/STAT/**//////////////////////////////
/**/int/**/main/**/(/**/)/**/{/**//////////////////////
/**/int/**/a/**/=/**/+/**/1/**/;/**////////////////////
/**/printf/**/(/**/"Hello "/**/"world!\n"/**/)/**/;/**/
/**/return/**/STAT/**/;/**/////////////////////////////
/**/}/**///////////////////////////////////////////////
/**/#/**/endif/**//////////////////////////////////////

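Note that the macro definition above does not define a function-like macro. Because the comment, which acts as white space, separates STAT from the opening parenthesis, it defines an object-like macro STAT that expands to ( a ) - 1.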

Variable names, like any other token, can be split by escaped newlines. An escaped newline is a \ immediately followed by a newline. These sequences are removed from the source code during phase 2 of program translation. Their main purpose is to break long macro definitions across multiple lines.

Below is a code fragment (see note 3) that produces the same 14 tokens:

\
i\
nt\
 ma\
in()
{\
i/\
*nt\
*/a \
= 10;
r\
et\
urn\
 0;}

Notice how the code colorizer missed the sliced and diced keywords and comment :)


1) This behavior was specified in ANSI-C aka C89. Some ancient compilers had subtly different behavior resulting in token pasting, but such peculiarities are of historical interest only.

2) You can almost insert a comment inside a string constant by taking advantage of the fact that adjacent string constants are concatenated in phase 6 of program translation: printf("Hello "/* my name is Luca */"world!\n");

3) This Christmas Tree presentation style is not meant to be used in real programs; it illustrates how to abuse C's input handling capabilities. More elaborate tricks have won The International Obfuscated C Code Contest.

Erna answered 27/8, 2020 at 6:5 Comment(0)
66

From a lexical point of view, a comment is the same as whitespace.

Section 6.4p3 of the C standard regarding lexical elements states:

... Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. ...

More specifically, a comment is translated into a single space. This is specified in section 5.1.1.2p3:

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.

To illustrate this, if you pass your code through the preprocessor, you will get:

  int main()
  {
       i a = 10;
       return 0;

  }

So comments, like whitespace, serve to separate tokens.

This means that the code will contain 14 tokens, not 13.

Romo answered 27/8, 2020 at 4:22 Comment(0)
25

The result will be as if you had written:

i a = 10;

NOT:

ia = 10;
Warrior answered 27/8, 2020 at 4:19 Comment(0)
12

See translation (a.k.a. compiling) Phase 3, step 2: "Each comment is replaced by one space character".

So, conceptually, i/*nt*/a becomes i a at that point.

Cystoid answered 27/8, 2020 at 4:25 Comment(1)
Comments are not for extended discussion; this conversation has been moved to chat.Detent
1

Just check what form your piece of code

    int main()
    {
        int i/*nt*/a = 10;
        return 0;
    }

will have after preprocessing. Add the -E flag to your compiler invocation, for example gcc -E myscript.c, and you will get this result:

e.sharaborin@landau:~$ gcc -E myscript.c
# 1 "myscript.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "myscript.c"
int main()
{
    int i a = 10;
    return 0;
}

The preprocessed output contains int i a = 10;, which is a syntax error, so you can conclude that the comment splits the name into two tokens.

Foetus answered 22/9, 2020 at 6:45 Comment(0)
