C++: conditionally inserting a key and value into a std::map
Asked Answered
I

2

10

If I do:

std::map<string, size_t> word_count;
size_t value = word_count.count("a") == 0 ? 1 : 2;
word_count["a"] = value;

then the final value of word_count["a"] is 1, as I would expect. If instead I do:

std::map<string, size_t> word_count;
word_count["a"] = word_count.count("a") == 0 ? 1 : 2;

then the final value of word_count["a"] is 2. Why!?

Invaluable answered 6/4, 2013 at 21:7 Comment(4)
Isn't word_count["a"]++ shorter/easier?Salzburg
@mfontanini, Yes, but that should have a min() if used repeatedly to limit it to 2.Scalpel
Wow, I'm really sorry that you've probably read all of those comments by now in the hopes of a completely straight answer. It must be more torturous for you than it is for us.Scalpel
@chris: :-) It does makes sense to me now. My mistake was assuming that the right side of the assignment would be evaluated before the left side.Invaluable
S
11

Officially, either side of the assignment could be evaluated first. It's up to the implementation to decide which. If word_count does not contain an "a", one is inserted, and an lvalue reference to it returned. If word_count does contain one, only the latter part happens. Despite the uncertainty of which side is evaluated first, you can follow possible executions:

Left Side First

No Element Exists:

operator[] inserts the element since it isn't already there. count() finds it and returns 1, so you end up with it being assigned a value of 2.

Element Exists:

operator[] returns the existing element and count() finds it and returns 1, so you end up with it being assigned a value of 2.

Right Side First

No Element Exists:

count() returns 0, so you get 1 from the right side. Then, "a" is inserted into the map and assigned a value of 1.

Element Exists:

count() returns 1, so you get 2 from the right side. Then, word_count["a"] is accessed and has 2 assigned to it.

Conclusion

In short, you can't rely on this to do what you want, so it's better to use something that you can rely on. mrfontanini made a good suggestion, so I'll edit it a bit:

word_count["a"]++;
word_count["a"] = std::min(word_count["a"], 2);

The first line ensures it is inserted and has a value of at least 1. The second limits that value to a maximum 2, in the case that you do this operation repeatedly.

Notes

I base this answer off of two things:

  1. When a side is picked to be evaluated, the whole side has to be evaluated before the other side can start.

  2. Constructs such as word_count["a"] = 1 exhibit well-defined behaviour, even in the case that an element is inserted and then assigned to.

There is some debate and discussion below about whether these are true or not. I've made it a bit more official now.

Scalpel answered 6/4, 2013 at 21:7 Comment(26)
And additionally word_count["a"] is being evaluated before word_count.count("a") which is possibly what is surprising the OP.Submergible
It need not be evaluated first. This is, I believe, a case of undefined behavior (reading a value and modifying it without a sequence point), though I've never seen it invoked indirectly like this.Jarboe
@BenjaminLindley, Really? I was pretty sure it was evaluated first, but that could be coming from examples that don't use the map on the right side.Scalpel
@BenjaminLindley, Looking at it again, I believe you're right. It could evaluate either side of operator= first, and which it chooses affects the outcome.Scalpel
@Scalpel It's implementation defined whether left or right hand side of an assignment is evaluated first. But as Benjamin says this is technically UB because of the modification, although in practice I think the explanation is that this compiler is evaluating the LHS first in this case.Submergible
It's really no different (only superficially) than this: (x = 0) = (x == 0) ? 1 : 2;Jarboe
Okay, I lightly patched up the post. I'm going to change it to include that.Scalpel
@stardust_: Sure that's fine. The issue is that we're writing to word_count["a"] twice without an interleaving sequence point.Indevout
@sharth, I don't think so, are we? count() is marked const, so it couldn't. It's rather like i = i++; vs i = i + 1;.Scalpel
@chris: I'm actually less sure about that not being UB as well, but you're missing that the lhs of the =, that is just word_count["a"] also writes to the map in the case that the key "a" does not exist.Indevout
@sharth, Oh, duh. That makes two writes :pScalpel
@sharth, Wait, how would word_count["a"] = 1; or something work then if it doesn't exist? It's annoying trying to get around all of these while rewriting my answer :pScalpel
@BenjaminLindley, More like (x = 0, x) = (x == 0) ? 1 : 2;, is it not? It has to insert the value and return an lvalue reference to it. That would make more sense to me, as there's no undefined behaviour from the assignment, which would make word_count["a"] = 1; work.Scalpel
Oops, I misspoke. The statement is not equivalent to (x = 0) = (x == 0) ? 1 : 2;, it is rather equivalent to this (x = 0) = (y == "a") ? 1 : 2; -- But it's still undefined behavior, because, while there is a sequence point within the ternary operator itself, the assignment (x = 0) is not sequenced relative to the ternary expression. So, while it's not reading the value of x, it is modifying it twice, which is also UB.Jarboe
@chris: Even if it's (x = 0, x), that's still UB, because even though, in that case, x = 0 is sequenced relative to x, the whole expression (x = 0, x) is not sequenced relative to the ternary expression. So again, that's modifying x twice, unsequenced.Jarboe
@BenjaminLindley, But if the left side is first, you get a sequenced assignment to x, and it's returned for later assignment. If the right side is evaluated first, it reads the current value, then you get a sequenced assignment from the left side, then both are again used in the assignment. Wouldn't that work? I don't see how there can ever be two modifications unsequenced. And I'm pretty sure word_count["a"] = 1; has to ultimately rely on behaviour like that. In that case, it just doesn't matter which side is evaluated first.Scalpel
@sharth, Can they? I thought it had to stick to the one it chose and evaluate the entire subtree. It doesn't make sense to me how it could randomly go up and down the other side.Scalpel
@chris: I'm fairly sure they can be, but I'd love for that statement to be proved wrong. It's making me wonder how map[1] + map[1] doesn't fail.Indevout
@sharth, Well, I'll finish what I'm doing with my answer. I'd really like for these fine points to be proven. I think it might constitute a new, really specific question.Scalpel
@chris: That would seem to make sense, but it is as sharth said. For example, a pointer to x may be loaded in register A for the modification to 0, then another pointer to x may be loaded to register B in order to make the assignment to the result of the ternary expression. Then, since they are unsequenced, either the assignment to 0 can happen first, or the assignment from the ternary expression can happen. It may sound random, but the compiler writers may have a good reason to do it that way. I'm not versed enough in the low level machine stuff to be sure.Jarboe
@BenjaminLindley, Well, I clearly outlined that I was relying on these two assumptions, so that's the best I have for now. I would really like for the two points to be studied with fine detail and get some other people involved. Does one (or two) questions for the two points sound like a good idea to you?Scalpel
I really think that this question covers it. However, I won't vote to close as duplicate if you think it needs more clarification.Jarboe
@BenjaminLindley, Doesn't this example from it seem to go in my favour? i = (++i,i++,i) // well defined Scalpel
Maybe, I don't know anymore. My head is near exploding, as happens every time I try to think about this stuff. I need to stop getting in to these discussions about the finer points of C++, because I'm pretty sure that most of the time, I'm wrong.Jarboe
@BenjaminLindley, I know that feeling. It's been happening to me for most of this long section of comments :( I think I might ask a new question after I eat.Scalpel
@BenjaminLindley, I ended up asking. It's here if you want to see it.Scalpel
I
5

Looking at this line:

word_count["a"] = word_count.count("a") == 0 ? 1 : 2;

I believe that if word_count["a"] does not exist in the map before that line is executed, you cause undefined behavior.

This is because word_count["a"] will create an entry in the map if one doesn't exist, and this will change the behavior of word_count.count("a"). We also have no required sequencing between those two calls..

Indevout answered 6/4, 2013 at 21:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.