In C++, is it better to cap a value using std::min or an if branch?

Asked 21/3, 2013 at 5:34 Answered 14/3, 2020 at 6:30

A very common pattern in programming is to cap a value at a maximum after some kind of update. What I'd like to know, is if there's a difference between the following two pieces of code, and if one should be preferred:

value += increment;
value = std::min(value, valueMax);

value += increment;

if (value > valueMax)
    value = valueMax;

My thinking is that this comes down to whether CPUs have instructions for taking two values and producing the minumum. If so, the call to std::min should result in this instruction and avoid an unnecessary branch. If not, the second version avoids an unnecessary assignment when value <= valueMax.

I'm not very good with this kind of thing, but I'm sure there's old-school assembly hackers who would know this. To them I ask: which is better?

Shaw answered 21/3, 2013 at 5:34 Comment(11)

Try both and look at the assembly... – Strutting 21/3, 2013 at 5:37

I'd say the first version will always perform at least as well as the second version, so there's no reason not to use it. The first version might also be faster, although there's no guarantees about that. – Shanklin 21/3, 2013 at 5:39

Like Mysticial implied, it depends on the implementation of std::min (en.cppreference.com/w/cpp/algorithm/min). – Etheleneethelin 21/3, 2013 at 5:39

@CodyGray, can you explain how first version will be faster. The first version will always have an else branch (i.e. more code size) without any compiler optimization. – Singhalese 21/3, 2013 at 5:42

There is indeed an instruction for the minimum of two words (PMINSW), but it's a SSE instruction. Who knows what (if any) compilers actually optimize to that. If you're determined to do it in one instruction and know for sure that it's done in one instruction, you'll need to drop down to assembly. – Catheycathi 21/3, 2013 at 5:43

I think this question is a bit dicey, because "better" is an imprecise term. In this case, it sounds like you mean "faster". But I wouldn't try to outsmart your compiler with premature optimization unless performance is absolutely critical -- which isn't the case 99% of the time. If it is, do as @Strutting says and the answer will be clear (for your particular compiler/architecture). – Juvenescent 21/3, 2013 at 5:44

@Mystical: I guess I hadn't really considered the possibility that the assembly might be the same. So, assuming they were different, I wouldn't be able to tell what was really different with my extremely limited knowledge of assembly. – Shaw 21/3, 2013 at 5:50

@Aditya A branch is not required. Granted, this is not a sensible implementation unless you know that the target architecture is extremely sensitive to branches, but it's possible that you're working with a version of the library that has been thus optimized. – Shanklin 21/3, 2013 at 5:52

@Mozza314 In which case, the best you could do is to (try to) post the assembly (even if you can't read it). Or put together a benchmark to see if you can see a difference. In your case, I'd say anything can happen. Not all compilers are equally smart/dumb about optimizing branches. – Strutting 21/3, 2013 at 5:55

@CodyGray, In that version where branch is not required, again, the code size is more than the second version. I believe that second version will always at least as fast as the first version and not the other way round. – Singhalese 21/3, 2013 at 5:59

@Aditya On an architecture where branches are extremely expensive and to be avoided at all costs, the additional code size is not relevant. It's still faster because it avoids branching. On other architectures where branches are less expensive, you're right, the code size is the relevant factor. The point is that the standard library function can be selectively optimized for the target architecture, where as the one you write yourself is not. – Shanklin 21/3, 2013 at 16:50

Modern compilers are smart enough to generate the same code in both cases. For example, 32-bit GCC generates:

addl    %esi, %edi
cmpl    %edx, %edi
movl    %edi, %eax
cmovgl  %edx, %eax

64-bit Clang:

%1 = add nsw i32 %increment, %value
%2 = icmp sgt i32 %1, %valueMax
%value = select i1 %2, i32 %valueMax, i32 %1

Schlep answered 21/3, 2013 at 5:44 Comment(2)

+1. Could you give some annotations to the assembly here? Is there a branch? – Shaw 21/3, 2013 at 6:5

@Mozza314 There's no branch here. The compiler did about as good as can be done - which is to use a conditional move. – Strutting 21/3, 2013 at 6:16

On VC10 on Release for the following code we have the following assembly:

int main(int argc, char *argv[])
{ 
  int dummyValue = 0, valueMax = 3000, value = valueMax + 1;

  cin >> valueMax;
  cin >> value;

  dummyValue = std::min(value, valueMax);

  cout << dummyValue;
  cin >> valueMax;
  cin >> value;

  if (value > valueMax)
    dummyValue = valueMax;

  cout << dummyValue;
  return 0;
}

Generated:

  24:   dummyValue = std::min(value, valueMax);
00E112AF  mov         eax,dword ptr [valueMax]  
00E112B2  cmp         eax,dword ptr [value]  
00E112B5  lea         edx,[value]  
00E112B8  lea         ecx,[valueMax]  
00E112BB  cmovge      ecx,edx     // <-- this is our conditional assignment
00E112BE  mov         esi,dword ptr [ecx]

and

if (value > valueMax)
  dummyValue = valueMax
00E112ED  mov         eax,dword ptr [valueMax]  
00E112F0  cmp         dword ptr [value],eax  
00E112F3  mov         ecx,dword ptr ds:[0E13038h]  
00E112F9  cmovg       esi,eax

So both cases optimized to either cmovge or cmovg instructions.

I would still go with std::min because it shows intent better than an if statement. It's optimized away and it's more readable.

Syntax answered 21/3, 2013 at 5:45 Comment(2)

When you provide constant values to variables for simple algorithms like min (which is inlined most of the time). The compiler can figure out, at compile time, the minimum value. THe loop will not be executed so you are getting 0. Try generating values passed to min at run-time and you will get proper measurement. – Singhalese 21/3, 2013 at 6:4

In any case, your test is kinda broken in several ways. 1) It does indeed do nothing. So the compiler is free to optimize out both those loops. That can be fixed by using the result of the loops. 2) The entire first loop would be optimized to dummyValue = 3001. And the entire second loop would be optimized to value = 3001. – Strutting 21/3, 2013 at 6:4

The answer depends on the type of value. The code could be effectively optimized if all operations are fully transparent to the code optimizer, which would be the case if value is a plain integer. But your code would also compile if value is a std::string, and then the second version might be potentially faster since the assignment is conditional.

Tungstic answered 21/3, 2013 at 8:24 Comment(0)

-2

This should generally be much faster than branching:

int max(int x, int y) 
    { 
        return x ^ ((x ^ y) & -(x < y));  
    } 
};

Muscolo answered 14/3, 2020 at 6:30 Comment(1)

-1 Modern compilers regularly replace simple branches with CMOV (conditional move) instructions, so while clever, this is rather complex, limited to integers, and the performance improvement is questionable. – Timofei 14/3, 2020 at 6:51

Recommended topics

Hot tags