Is atomic decrementing more expensive than incrementing?
In his blog, Herb Sutter writes:

[...] because incrementing the smart pointer reference count can usually be optimized to be the same as an ordinary increment in an optimized shared_ptr implementation — just an ordinary increment instruction, and no fences, in the generated code.

However, the decrement must be an atomic decrement or equivalent, which generates special processor memory instructions that are more expensive in themselves, and that on top of that induce memory fence restrictions on optimizing the surrounding code.

The text is about the implementation of shared_ptr, and I am not sure whether his remark applies only to that case or holds in general. From his wording, I gather it is general.

But when I think about it, I can only see a "more expensive decrement" when an if (counter == 0) immediately follows the decrement -- which is probably the case with shared_ptr.

Therefore I wonder: is the atomic operation ++counter (usually) always faster than --counter, or is it slower only because it is used as if (--counter == 0) ... in shared_ptr?

Duration answered 6/6, 2013 at 15:23 Comment(2)
"From his formulation I gather it is generally" - I gather the contrary, though. (Koal)
You may also want to take a look at the write-read barrier required for --counter == 0, which is generally the most expensive kind: it is the only type of barrier that requires the data to be re-read immediately after it is written, guaranteeing synchronization between cores. For counter++ you just write, with no immediate requirement to observe a synchronized state of this value, or of other data. (Achieve)
I believe it's referring to the fact that the increment can be "hidden", where the "decrement and check" has to be done as one operation.

I'm not aware of any architecture where --counter (or counter--, assuming we're talking about simple data types like int, char, etc.) is slower than ++counter or counter++.
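To illustrate what "done as one operation" means here, a minimal sketch (the counter and function names are hypothetical, not from any particular shared_ptr implementation):

```cpp
#include <atomic>

// Hypothetical shared reference counter, starting at 3 owners.
std::atomic<int> counter{3};

// "Decrement and check" must be a single atomic read-modify-write:
// fetch_sub returns the value *before* the decrement, so no other
// thread can modify the counter between the decrement and the test.
bool release_one() {
    return counter.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

A separate `--counter; if (counter == 0) ...` would race: two threads could both decrement and both see zero, or both miss it.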

Veinstone answered 6/6, 2013 at 15:31 Comment(2)
What do you mean by "hidden"? (Walloping)
It is quite the opposite of what he is saying, though. He's explicitly saying "ordinary increment". It's not about not checking the value afterwards or any such thing. It's about using an ordinary increment (read cache line, modify, write cache line), which will lose references in the presence of concurrency. I'm therefore strongly inclined to say that what he is saying is simply wrong. Sutter can say wrong things too, why not. (Mentalist)
He discusses this in more detail elsewhere, I think in his atomic<> Weapons presentation. Basically, it's all about where memory fences are needed in the shared_ptr use case, not about any intrinsic property of atomic increments vs. decrements.

The reason is that with a reference-counted smart pointer you don't really care about the exact ordering of increments, as long as you don't miss any. With decrements, however, it is critical to have memory barriers in place, so that on the final decrement, which triggers the delete, there is no possibility of a previous memory access to the owned object from another thread being reordered to after the memory is freed.
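This asymmetry can be sketched with a hypothetical intrusive refcount (illustrative names only, not the internals of any real shared_ptr; the ordering choices follow the pattern this answer describes):

```cpp
#include <atomic>

// Hypothetical refcounted object: an illustration, not a real
// shared_ptr control block.
struct RefCounted {
    std::atomic<int> refs{1};
};

void add_ref(RefCounted* p) {
    // Increment: relaxed ordering suffices -- we only need the
    // increment not to be lost; no ordering of surrounding
    // reads/writes is required.
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

void release(RefCounted* p) {
    // Decrement: release, so this thread's prior accesses to the
    // object happen-before the decrement...
    if (p->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // ...and acquire, so the deleting thread sees every other
        // thread's accesses before the memory is freed.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete p;
    }
}
```

The increment needs no fence at all; the decrement needs release/acquire ordering precisely because it may trigger the delete.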

Chayachayote answered 6/6, 2013 at 16:10 Comment(2)
That talk of Herb's is super-duper! Thanks! +1 (Duration)
The "as long as you don't miss any" is the important part, however. That, and the happens-before guarantee with respect to the decrement (but that's handled by the atomic decrement). He's explicitly saying "ordinary increment", which will miss some increments in a concurrent scenario. Maybe he meant to say something different, but it's what he said anyway. (Mentalist)
The issue Sutter is talking about is that the reference-count increment doesn't require any follow-up action for correctness: you're taking a non-zero reference count to another non-zero count, so nothing further needs to happen. The decrement, however, does require follow-up action for correctness. It takes a non-zero reference count to either a non-zero or a zero count, and if your decrement takes the count to zero, you need to perform an action: specifically, deallocate the referenced object. This decrement-and-act dynamic requires greater consistency, both at the fence level (so the deallocation isn't reordered with some read/write on another core that the CPU's memory/cache-management logic reordered) and at the compiler level (so the compiler doesn't move operations around the decrement in a way that lets reads/writes be reordered around the potential deallocation).

So, for Sutter's described scenario, the difference in cost between increment and decrement isn't in the fundamental operations themselves, it's in the consistency constraints imposed on the actual use of the decrement (specifically, acting on the decrement itself) that don't apply to the increment.

Copepod answered 6/6, 2013 at 16:23 Comment(0)
