Why doesn't a+++++b work?

Asked 17/3, 2011 at 15:25 Answered 22/11, 2023 at 22:40

#include <stdio.h>

int main ()
{
   int a = 5,b = 2;
   printf("%d",a+++++b);
   return 0;
}

This code gives the following error:

error: lvalue required as increment operand

But if I add spaces and write it as a++ + ++b, then it works fine.

#include <stdio.h>

int main ()
{
   int a = 5,b = 2;
   printf("%d",a++ + ++b);
   return 0;
}

What does the error mean in the first example?

Sharice answered 17/3, 2011 at 15:25 Comment(2)

It is surprising that after all this time no one had noticed that the exact expression you are asking about is used as an example in the C99 and C11 standards. The standard gives a good explanation as well. I have included that in my answer. – Bornite 25/7, 2014 at 15:1

@ShafikYaghmour — That's 'Example 2' in C11 §6.4 Lexical Elements ¶6. It says "The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression." – Mitra 16/10, 2019 at 4:12

100

printf("%d",a+++++b); is interpreted as (a++)++ + b according to the Maximal Munch Rule [!].

++ (postfix) doesn't evaluate to an lvalue but it requires its operand to be an lvalue.

[!] 6.4/4 says: "the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token"

Bathurst answered 17/3, 2011 at 15:40 Comment(0)

184

Compilers are written in stages. The first stage is called the lexer and turns characters into a symbolic structure. So "++" becomes something like an enum SYMBOL_PLUSPLUS. Later, the parser stage turns this into an abstract syntax tree, but it can't change the symbols. You can affect the lexer by inserting spaces (which end symbols unless they are in quotes).

Normal lexers are greedy (with some exceptions), so your code is being interpreted as

a++ ++ +b

The input to the parser is a stream of symbols, so your code would be something like:

[ SYMBOL_NAME(name = "a"), 
  SYMBOL_PLUS_PLUS, 
  SYMBOL_PLUS_PLUS, 
  SYMBOL_PLUS, 
  SYMBOL_NAME(name = "b") 
]

Which the parser thinks is syntactically incorrect. (EDIT based on comments: Semantically incorrect because you cannot apply ++ to an r-value, which a++ results in)

a+++b

is lexed as

a++ +b

which is OK. So are your other examples.
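To make the greedy behaviour concrete, here is a minimal sketch of a maximal-munch scanner in C. It is not how any real compiler is written: the enum names simply mirror the symbols listed above, and the struct token type and the tokenize function are hypothetical helpers that only know about names, + and ++.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, named after the symbols listed in the answer. */
enum symbol { SYMBOL_NAME, SYMBOL_PLUS_PLUS, SYMBOL_PLUS, SYMBOL_UNKNOWN };

struct token {
    enum symbol kind;
    char text[32];   /* spelling of the token, truncated if too long */
};

/* Greedy ("maximal munch") scan of src: at each position take the longest
   token that can start there. Whitespace ends a token and is otherwise
   skipped. Returns the number of tokens stored in out (at most max). */
size_t tokenize(const char *src, struct token *out, size_t max)
{
    size_t n = 0;
    while (*src != '\0' && n < max) {
        if (isspace((unsigned char)*src)) { src++; continue; }

        const char *start = src;
        enum symbol kind;
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_') src++;
            kind = SYMBOL_NAME;
        } else if (src[0] == '+' && src[1] == '+') {
            src += 2;                 /* prefer "++" over "+" whenever possible */
            kind = SYMBOL_PLUS_PLUS;
        } else if (src[0] == '+') {
            src += 1;
            kind = SYMBOL_PLUS;
        } else {
            src += 1;                 /* anything else is outside this sketch */
            kind = SYMBOL_UNKNOWN;
        }

        size_t len = (size_t)(src - start);
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';
        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Calling tokenize("a+++++b", tokens, 8) on an array of eight tokens stores five of them: a name, two SYMBOL_PLUS_PLUS, one SYMBOL_PLUS and a name. With the spaced input "a++ + ++b" the middle three become SYMBOL_PLUS_PLUS, SYMBOL_PLUS, SYMBOL_PLUS_PLUS instead, which is why the spaces change what the parser sees.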

Hydrodynamics answered 15/4, 2011 at 13:16 Comment(11)

+1 Good explanation. I have to nitpick though: It is syntactically correct, it just has a semantic error (attempt to increment the lvalue resulting from a++). – Elvaelvah 15/4, 2011 at 13:22

a++ results in an rvalue. – Desiccator 15/4, 2011 at 13:40

In the context of lexers, the 'greedy' algorithm is usually called Maximal Munch (en.wikipedia.org/wiki/Maximal_munch). – Genvieve 15/4, 2011 at 14:18

Nice. Many languages have similar bizarre corner cases thanks to greedy lexing. Here's a really weird one where making the expression longer makes it better: In VBScript x = 10&987&&654&&321 is illegal, but bizarrely enough x = 10&987&&654&&&321 is legal. – Casefy 15/4, 2011 at 18:25

It has nothing to do with greed and all to do with order and precedence. ++ is higher than + so two ++ will be done first. +++++b will also be + ++ ++ b and not ++ ++ + b. Credit to @MByD for the link. – Agreement 15/4, 2011 at 19:38

@delnan: rvalues versus lvalues is a syntactic restriction. You can determine whether something is an rvalue from syntax alone. (In C, anyway. This probably breaks down in C++ ...) – Sunshine 15/4, 2011 at 21:28

@Eric - longer by inserting is not really surprising. Longer by appending and then becoming legal would have been surprising. – Archivolt 16/4, 2011 at 5:49

@Femaref: I don't understand why precedence has anything to do with a++ +++b. Precedence comes after the lexing stage, doesn't it? So it should be munched as a++ ++ +b, shouldn't it? – Escurial 16/4, 2011 at 23:55

And it actually is. I mixed it up with a+++ ++b. All those pluses... Sorry. – Desiccator 17/4, 2011 at 0:7

@Mehrdad You are correct. "++" is a single token, if it were two tokens, the compiler would be able to compile that code, because it would be possible to recover from that error. – Inventory 17/4, 2011 at 15:39

Would be good to reference the spec -- section 6.4 paragraph 4 covers this, and even has this code as an example! – Predicative 22/11, 2023 at 22:37

The lexer uses what's generally called a "maximum munch" algorithm to create tokens. That means as it's reading characters in, it keeps reading characters until it encounters something that can't be part of the same token as what it already has (e.g., if it's been reading digits so what it has is a number, and it then encounters an A, it knows that can't be part of the number, so it stops and leaves the A in the input buffer to use as the beginning of the next token). It then returns that token to the parser.

In this case, that means a+++++b gets lexed as a ++ ++ + b. Since the first post-increment yields an rvalue, the second can't be applied to it, and the compiler gives an error.

Just FWIW, in C++ you can overload operator++ to yield an lvalue, which allows this to work. For example:

struct bad_code { 
    bad_code &operator++(int) { 
        return *this;
    }
    int operator+(bad_code const &other) { 
        return 1;
    }
};

int main() { 
    bad_code a, b;

    int c = a+++++b;
    return 0;
}

This compiles and runs (though it does nothing) with the C++ compilers I have handy (VC++, g++, Comeau).

Salchunas answered 15/4, 2011 at 14:7 Comment(4)

"e.g., if it's been reading digits so what it has is a number, if it encounters an A, it knows that can't be part of the number" 16FA is a perfectly fine hexadecimal number that contains an A. – Courante 15/4, 2011 at 19:55

@nightcracker: yes, but without a 0x at the beginning it'll still treat that as 16 followed by FA, not a single hexadecimal number. – Salchunas 15/4, 2011 at 19:58

@Jerry Coffin: You didn't say 0x wasn't part of the number. – Courante 15/4, 2011 at 19:59

@nightcracker: no, I didn't -- given that most people don't consider x a digit, it seemed quite unnecessary. – Salchunas 15/4, 2011 at 20:2

This exact example is covered in the draft C99 standard (same details in C11), section 6.4 Lexical elements, paragraph 4, which says:

If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token. [...]

This is also known as the maximal munch rule, which is used in lexical analysis to avoid ambiguities and works by taking as many elements as it can to form a valid token.

The paragraph also has two examples. The second one is an exact match for your question and is as follows:

EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.

which tells us that:

a+++++b

will be parsed as:

a ++ ++ + b

which violates the constraints on post increment since the result of the first post increment is an rvalue and post increment requires an lvalue. This is covered in section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine):

The operand of the postfix increment or decrement operator shall have qualified or unqualified real or pointer type and shall be a modifiable lvalue.

and

The result of the postfix ++ operator is the value of the operand.
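As a small illustration of those two sentences, the sketch below applies postfix ++ to an operand that is a modifiable lvalue (*p) and notes, in a comment, the form that violates the constraint. The function name increment_twice is hypothetical and chosen only for this example.

```c
/* Postfix ++ needs a modifiable lvalue, that is, an expression that
   designates an object. *p qualifies; the result of (*p)++ does not. */
int increment_twice(int *p)
{
    int old = (*p)++;   /* allowed: *p is an lvalue; old receives the previous value */
    (*p)++;             /* a second increment is fine as a separate statement */
    /* ((*p)++)++;         rejected by the compiler: the result of (*p)++ is
                           a plain value, not an lvalue */
    return old;
}
```

Called with a pointer to an int holding 5, this returns 5 and leaves 7 in the object, which is what a+++++b would need two separate increments to express.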

The book C++ Gotchas also covers this case in Gotcha #17, Maximal Munch Problems. It is the same problem in C++, and the book gives some examples. It explains that when dealing with the following set of characters:

->*

the lexical analyzer can do one of three things:

Treat it as three tokens: -, > and *
Treat it as two tokens: -> and *
Treat it as one token: ->*

The maximal munch rule allows it to avoid these ambiguities. The author points out that it (In the C++ context):

solves many more problems than it causes, but in two common situations, it’s an annoyance.

The first example would be templates whose template arguments are also templates (which was solved in C++11), for example:

list<vector<string>> lovos; // error!
                  ^^

Here the closing angle brackets are interpreted as the shift operator, so a space is required to disambiguate:

list< vector<string> > lovos;
                    ^

The second case involves default arguments for pointers, for example:

void process( const char *= 0 ); // error!
                         ^^

would be interpreted as the *= assignment operator. The solution in this case is to name the parameters in the declaration.

Bornite answered 25/7, 2014 at 4:12 Comment(3)

Do you know what part of C++11 says the maximum munching rule? 2.2.3, 2.5.3 are interesting, but not as explicit as C. The >> rule is asked at: #15785996 – Skiing 9/5, 2016 at 15:28

@CiroSantilli巴拿馬文件六四事件法轮功 see this answer here – Bornite 9/5, 2016 at 16:39

Nice thanks, it's one of the sections I pointed to. I'll upvote you tomorrow when my cap wears off ;-) – Skiing 9/5, 2016 at 17:37

Your compiler desperately tries to parse a+++++b, and interprets it as (a++)++ +b. Now, the result of the post-increment (a++) is not an lvalue, i.e. it can't be post-incremented again.

Please don't ever write such code in production quality programs. Think about the poor fellow coming after you who needs to interpret your code.

Underpinnings answered 17/3, 2011 at 15:28 Comment(0)

(a++)++ +b

a++ returns the previous value, an rvalue. You can't increment this.

Tisdale answered 17/3, 2011 at 15:27 Comment(0)

~~Because it causes undefined behaviour.~~

Which one is it?

c = (a++)++ + b
c = (a) + ++(++b)
c = (a++) + (++b)

Yeah, neither you nor the compiler know it.

EDIT:

The real reason is the one as said by the others:

It gets interpreted as (a++)++ + b.

But post-increment requires an lvalue (which is a variable with a name), whereas (a++) returns an rvalue, which cannot be incremented. This leads to the error message you get.

Thanks to the others for pointing this out.

Vespasian answered 15/4, 2011 at 13:11 Comment(11)

you could say the same for a+++b - (a++) + b and a + (++b) have different results. – Decarbonate 15/4, 2011 at 13:13

You are right. But it's inconsistent, because c=a+++b; is similar (it could be a++ + b or a + ++b) and there the compiler does not complain. – Protractile 15/4, 2011 at 13:14

Maybe Moltar know it? Or if not Moltar, then certainly Zorak know it. – Elasticity 15/4, 2011 at 13:14

actually, postfix ++ has higher precedence than prefix ++, so a+++b is always a++ + b – Firn 15/4, 2011 at 13:16

I don't think this is the right answer, but I could be wrong. I think the lexer defines it to be a++ ++ +b which cannot be parsed. – Hydrodynamics 15/4, 2011 at 13:17

I disagree with this answer. 'undefined behaviour' is quite different from tokenization ambiguity; and I don't think the problem is either. – Carinthia 15/4, 2011 at 13:17

but operator precedence is a slightly different topic. This is about parsing the text to determine whether it is a postfix or prefix operator. Otherwise a+++++b would evaluate to ((a++)++)+b with no ambiguity. – Decarbonate 15/4, 2011 at 13:19

"Otherwise a+++++b would evaluate to ((a++)++)+b" ... my view right now is a+++++b does evaluate to ((a++)++)+b. Certainly with GCC if you insert those brackets and rebuild, the error message doesn't change. – Carinthia 15/4, 2011 at 13:21

@Jim: right. This actually suggests more that this answer is wrong and the issue is not ambiguity. – Decarbonate 15/4, 2011 at 13:24

@all you are right. I have updated my post. Sorry for the confusion. – Vespasian 15/4, 2011 at 13:26

By the way, an lvalue is not necessarily a variable. A compound literal is an lvalue. – Giverin 14/12, 2012 at 18:54

I think the compiler sees it as

c = ((a++)++)+b

++ has to have as an operand a value that can be modified. a is a value that can be modified. a++ however is an 'rvalue', it cannot be modified.

By the way the error I see on GCC C is the same, but differently-worded: lvalue required as increment operand.

Carinthia answered 15/4, 2011 at 13:16 Comment(0)

Follow this precedence order:

1. ++ (pre-increment)

2. + - (addition or subtraction)

3. "x" + "y": add the two results

int a = 5, b = 2;
printf("%d", a++ + ++b); // a is 5 since it is post-increment; b is 3 since it is pre-increment
return 0; // the output is 5+3=8
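For anyone who wants to check the arithmetic, here is a small sketch that evaluates the spaced expression and also reports the values of a and b afterwards. The helper spaced_sum and its out-parameters are hypothetical names used only for this illustration.

```c
/* Evaluates a++ + ++b with the values from the question and stores the
   final values of a and b through the out-parameters. */
int spaced_sum(int *a_after, int *b_after)
{
    int a = 5, b = 2;
    int sum = a++ + ++b;   /* a++ yields 5 (a becomes 6); ++b yields 3 (b becomes 3) */
    *a_after = a;
    *b_after = b;
    return sum;
}
```

The function returns 8 and leaves 6 and 3 in the out-parameters. The order in which the two operands of + are evaluated is unspecified in C, but that does not matter here because a and b are different objects.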

Gratuitous answered 3/4, 2020 at 11:49 Comment(0)

Section 6.4 paragraph 4 of the C spec actually covers precisely this case:

If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token. There is one exception to this rule: header name preprocessing tokens are recognized only within #include preprocessing directives and in implementation-defined locations within #pragma directives. In such contexts, a sequence of characters that could be either a header name or a string literal is recognized as the former.
EXAMPLE 1 The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid floating or integer constant token), even though a parse as the pair of preprocessing tokens 1 and Ex might produce a valid expression (for example, if Ex were a macro defined as +1 ). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating constant token), whether or not E is a macro name.
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y , which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression
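EXAMPLE 1 can be tried out in the same way. The sketch below assumes a macro Ex defined as +1, as the quoted text suggests; the two function names are hypothetical.

```c
#define Ex +1   /* hypothetical macro, as suggested in EXAMPLE 1 */

/* With a space, "1 Ex" is two preprocessing tokens, so Ex is expanded
   and the expression becomes 1 +1. */
int spaced_example(void)
{
    return 1 Ex;
    /* return 1Ex;   rejected by the compiler: 1Ex is a single preprocessing
                     number that is not a valid constant */
}

/* 1E1 is one preprocessing number and a valid floating constant,
   whether or not E is a macro name. */
double single_token_example(void)
{
    return 1E1;
}
```

spaced_example returns 2 and single_token_example returns 10.0. As with x+++++y, the only difference between the version that compiles and the one that does not is where the token boundaries fall.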

Predicative answered 22/11, 2023 at 22:40 Comment(0)
