Is it more efficient to perform a range check by casting to uint instead of checking for negative values?

I stumbled upon this piece of code in .NET's List source code:

// Following trick can reduce the range check by one
if ((uint) index >= (uint)_size) {
  ThrowHelper.ThrowArgumentOutOfRangeException();
}

Apparently this is more efficient (?) than if (index < 0 || index >= _size)

I am curious about the rationale behind the trick. Is a single branch instruction really more expensive than two conversions to uint? Or is there some other optimization going on that will make this code faster than an additional numeric comparison?

To address the elephant in the room: yes, this is micro optimization, no I do not intend to use this everywhere in my code – I'm just curious ;)

Vassili answered 30/3, 2015 at 10:10 Comment(13)
Curiosity in this case is easily satisfied with a few lines of code. Test it.Suberin
@SamAxe a test would only confirm that casts are faster (if they are), not explain why.Vassili
A peek at the resultant IL would answer the why.Suberin
There is no conversion it just compares the unsigned values.Boulder
The two "conversions" to uint are free - the same bit patterns will exist in the same register (or in memory, but if you're lucky, in a register)Lyontine
Related article(which also includes a cast from int to uint): codeproject.com/Articles/8052/…Finespun
Note that in general I would put a unchecked { ... } around the whole block, because you don't know when one of your colleagues will change a project from unchecked to checked :)Guanabana
I have a related question: If this is faster, why doesn't the compiler and/or the JIT do this?Flaw
Coding in this style can easily introduce security vulnerabilities. In fact, this happens all the time in C++, and is a large part of the reason this style of coding is highly discouraged in C#. The above code only works correctly because the developers know _size can never be negative.Knockknee
I'm not familiar with C#, but why aren't the variables unsigned in the first place?Timmie
@Flaw — it's probably because neither the compiler nor the jitter know that both values are non-negative at compilation time.Offering
@WaiHaLee It should know size is non-negative, and the whole point of this is that it works even if index is negative.Flaw
@Timmie — you might want to read this SO question: "uint vs int in C#".Offering

From MS Partition I, section 12.1 (Supported data types):

The signed integer types (int8, int16, int32, int64, and native int) and their corresponding unsigned integer types (unsigned int8, unsigned int16, unsigned int32, unsigned int64, and native unsigned int) differ only in how the bits of the integer are interpreted. For those operations in which an unsigned integer is treated differently from a signed integer (e.g., in comparisons or arithmetic with overflow) there are separate instructions for treating an integer as unsigned (e.g., cgt.un and add.ovf.un).

That is, the conversion from an int to a uint is merely a matter of book-keeping - from that point on, the value on the stack/in a register is known to be an unsigned int rather than an int.

So the two conversions should be "free" once the code is JITted, and then the unsigned comparison operation can be performed.

Lyontine answered 30/3, 2015 at 10:42 Comment(11)
I'm a bit surprised that the compiler doesn't implement this optimization automatically. (Or does it?)Sauna
@IlmariKaronen How could it? It doesn't know the meaning of the values. C# and .NET are both very specifically defined (unlike, say, C++), and this is exactly the kind of "optimization" that would be pretty unsafe to do in general. Not to mention that the JIT compiler doesn't really have enough time to look for these kinds of "optimizations", and the C# compiler itself doesn't really do a whole lot of optimizations. Just write clear code unless you can prove the performance benefits (in a place where you actually care about performance).Pedant
@Luaan: Ah, yes, I see... the problem, presumably, is that the compiler isn't smart enough to know that _size cannot be negative, so it cannot safely apply the optimization (since it's only valid when (int)_size >= 0).Sauna
The two conversions are free before the code is jitted; since they're only ever used as locals the difference is between blt or blt.s and blt.un or blt.un.s, there's no need for the CIL produced from the C# to involve any actual conversion at all.Play
@IlmariKaronen Yup. And I'd rather stop coding by the time the compilers get "smart" enough to infer that I don't intend _size to be negative. That said, I'm pretty sure it will not do the optimization even if _size is uint instead of int. It just isn't worth the trouble.Pedant
@Luaan: It couldn't safely do it in that case, either, since the optimization also breaks if _size > int.MaxValue. The compiler could do the optimization if _size was a non-negative constant, or if it could infer from earlier code that the value of _size was always between 0 and int.MaxValue inclusive. And yes, modern compilers do commonly perform that kind of data flow analysis, although obviously there are limits to how much of it they can do (since the full problem is Turing-complete).Sauna
@IlmariKaronen Good point. And, well, modern compilers also get some things wrong. One of the reasons that C/C++ compilers can do so many powerful optimizations is that the languages leave a huge part of the operations undefined (e.g. C++ can only ignore null checks because what happens when dereferencing a null pointer is undefined). There are managed languages and runtimes that do a lot more hardcore optimization than .NET does, but it really isn't all that important most of the time, and it sometimes ignores some safety (e.g. sacrificing thread-safety; .NET also does that).Pedant
@IlmariKaronen The thing is, C# doesn't have a lot of optimizations to do, and the IL JIT compiler doesn't really have much time to analyze the code it's generating. Even then, it does optimizations that are impossible to do until you know exactly where you're running, including stuff like eliminating most of virtual method call overhead when most of the virtual method calls are the same. And of course, RyuJIT tries a lot harder, since it's much faster than the old JIT.Pedant
@Pedant the jitter can also ignore null checks, the main time it doesn't is ironically when an optimisation might cause something to work that should throw a NullReferenceException (and even then it can do so with a field access that will trigger the exception, rather than an explicit check). Null reference accesses are generally caught by trapping the exception from the OS rather than actually checking.Play
Interestingly I could not produce an example where casting to uint is faster. Can someone do it? Maybe it is only faster on ARM?Miscall
I have posted a question asking for a code sample that proves this optimization works - #29379337Miscall

Let's say we have:

public void TestIndex1(int index)
{
  if(index < 0 || index >= _size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}
public void TestIndex2(int index)
{
  if((uint)index >= (uint)_size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}

Let's compile these and look at ILSpy:

.method public hidebysig 
    instance void TestIndex1 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldc.i4.0
    IL_0002: blt.s IL_000d
    IL_0004: ldarg.1
    IL_0005: ldarg.0
    IL_0006: ldfld int32 TempTest.TestClass::_size
    IL_000b: bge.s IL_0012
    IL_000d: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_0012: ret
}

.method public hidebysig 
    instance void TestIndex2 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldarg.0
    IL_0002: ldfld int32 TempTest.TestClass::_size
    IL_0007: blt.un.s IL_000e
    IL_0009: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_000e: ret
}

It's easy to see that the second has less code, with one less branch.

Really, there's no cast at all: there's just the choice of whether to use blt.s and bge.s or to use blt.un.s, where the latter treats the integers passed as unsigned while the former treats them as signed.

(Note for those not familiar with CIL, since this is a C# question with a CIL answer: bge.s, blt.s and blt.un.s are the "short" versions of bge, blt and blt.un respectively. blt pops two values off the stack and branches if the first is less than the second when considering them as signed values, while blt.un pops two values off the stack and branches if the first is less than the second when considering them as unsigned values.)

It's utterly a micro-opt, but there are times when micro-opts are worth doing. Consider further that, with the rest of the code in the method body, this could mean the difference between something falling inside the jitter's limits for inlining or not; and if they're bothering to have a helper for throwing out-of-range exceptions, they're probably attempting to ensure inlining happens if at all possible, and the extra 4 bytes could make all the difference.

Indeed, it's quite likely for that inlining difference to be a much bigger deal than the reduction of one branch. There aren't a lot of times when going out of your way to ensure inlining happens is worth it, but a core method of a class of such heavy use as List<T> would certainly be one of them.

Play answered 30/3, 2015 at 14:9 Comment(4)
I wish languages would include a construct to test whether a variable was within a certain range, so one wouldn't have to guess what optimizers will or won't do. If a micro-optimization like that could double the speed of a tight loop on some systems (entirely possible if it nudges an "in-lining" decision threshold), it may be very worth doing; if it won't offer any real speed-up on any systems, however, one should use the more readable form. I find it irksome when the most readable form of an expression might be comparable to the fastest form, but might not.Rowel
@Rowel I agree in general, though here it requires wider knowledge to know that because _size is already guaranteed to be greater than 0 then (index < 0 || index < _size) == ((uint)index >= (uint)_size). Of course, a compiler that could use code-contracts as part of its optimisation decision-making could certainly do something like this, but then even optimising to beat the (moving target of the) inlining limit is a peculiar case in itself in some ways.Play
@Rowel indeed, now I think of it, if C# had a construct like, e.g. 0 < index < _size (such as with Python, and even quite plausible for C# given it doesn't implicitly convert between boolean and integer types) then the optimisation here would still be to not use it.Play
One thing I wish languages would include would be a "(n-1)-bit 'natural number'" variable/parameter types, which would behave in arithmetic expressions like normal signed types, but with a compiler-enforced assertion that they could not be negative. Oftentimes the only way to do sane math on values of unsigned types is to use the next larger signed integer type, which can be nasty and inefficient, but it would be helpful to express the idea that a variable/parameter will only hold natural numbers.Rowel

Assuming _size is an integer, private to the list, and index is the argument to this function whose validity needs to be tested.

Assuming further that _size is always >= 0.

Then the original test would have been:

if (index < 0 || index >= _size) throw exception

The optimized version

if ((uint)index >= (uint)_size) throw exception

has one comparison (as opposed to two in the former example). Because the cast just reinterprets the bits and makes the >= in fact an unsigned comparison, no additional CPU cycles are used for it.

Why does it work?

The results are simple/trivial as long as index >= 0.

If index < 0, then (uint)index turns it into a very large number:

Example: 0xFFFFFFFF is -1 as an int, but 4294967295 as a uint, thus

(uint)-1 > (uint)x 

is always true if x was a non-negative int.

Shabbir answered 30/3, 2015 at 10:44 Comment(5)
In a checked context you get an OverflowException. In an unchecked context you cannot rely on the result: "The result of the conversion is an unspecified value of the destination type." #22757739Finespun
@Tim the quote seems to be for For a conversion from float or double to an integral type, the processing depends on the overflow checking context, so not int->uintGuanabana
@xanatos: but the rules are same, if the cast fails you get an OverflowException with checked context and T-MaxValue in unchecked context. So this returns uint.Maxvalue: unchecked { uint ui = (uint)-1; };. But it is not guaranteed. If you try that with checked you get a compiler error with the -1-constant and if you use a variable an OverflowException at runtime. Btw, i was referring to "If index < 0 the (uint)index will turn it into a very large number:...."Finespun
@TimSchmelter: Just to clarify, while unchecked (uint)-1 equals uint.MaxValue, unchecked (uint)-2 doesn't -- it equals uint.MaxValue-1, instead. Both are "very large" -- in fact, strictly larger than int.MaxValue -- though.Sauna
@TimSchmelter the meaning of this cast is indeed defined and exactly what is needed here. You're quoting an answer that is quoting the wrong part of §6.2.1. The relevant case here is mentioned a couple of paragraphs before that quote begins, giving the result of "then the source value is treated as a value of the destination type.".Play

Note that this trick won't work if your project is compiled as checked instead of unchecked. At best it will be slower (or at least not faster), because each cast will need to be checked for overflow; at worst you will get an OverflowException if you try to pass -1 as the index (instead of your exception).

If you want to write it "correctly", in a way that will surely work, you should put a

unchecked
{
    // test
}

block around the test.

Guanabana answered 30/3, 2015 at 11:31 Comment(3)
Checking for overflow generally doesn't slow anything down, considering how overflow checking happens on the chip. It will of course indeed throw if done in a checked context rather than the normal unchecked. This doesn't really answer the question though.Play
@JonHanna Yep... But it was too long for a comment, and a good response was already up. And on speed: if you look at the disassembly of a cast (Release mode, so optimized code), I see a cmp dword ptr [rsp+64h],0 / jl 00000000000000A2, so a cast explicitly does a comparison: if the int is < 0 then jumpGuanabana
If there is a cast, because casting int to uint and storing won't cause any exceptions at that level so a test has to be explicit, but here the difference could just be between jl and jb.Play

Yes, this is more efficient. The JIT does the same trick when range-checking array accesses.

The transformation and reasoning is as follows:

i >= 0 && i < array.Length becomes (uint)i < (uint)array.Length because array.Length <= int.MaxValue, so that array.Length has the same value as (uint)array.Length. If i happens to be negative, then (uint)i > int.MaxValue >= (uint)array.Length and the check fails.

Fitzger answered 30/3, 2015 at 20:8 Comment(5)
Can you provide example code where that does that happen because I can't build an example where one is faster than the other.Miscall
Not sure where the problem is. Simply benchmark the two versions against each other. Release mode, Ctrl-F5 (no Debugger). The benchmark should take about 1s per test so that all one-time costs and variations disappear in the noise.Fitzger
Well I have failed to get a difference trying several different approaches including the one provided by @Cullum in an answer.Miscall
Then post your code to Stack Overflow as a question and leave a link here. I'll look.Fitzger
#29379337 - here you goMiscall

Apparently in real life it isn't faster. Check this: https://dotnetfiddle.net/lZKHmn

Turns out that, thanks to Intel's branch prediction and parallel execution, the more obvious and readable code actually works faster...

Here's the code:

using System;
using System.Diagnostics;

public class Program
{


    const int MAX_ITERATIONS = 10000000;
    const int MAX_SIZE = 1000;


    public static void Main()
    {

            var timer = new Stopwatch();


            Random rand = new Random();
            long InRange = 0;
            long OutOfRange = 0;

            timer.Start();
            for ( int i = 0; i < MAX_ITERATIONS; i++ ) {
                var x = rand.Next( MAX_SIZE * 2 ) - MAX_SIZE;
                if ( x < 0 || x > MAX_SIZE ) {
                    OutOfRange++;
                } else {
                    InRange++;
                }
            }
            timer.Stop();

            Console.WriteLine( "Comparison 1: " + InRange + "/" + OutOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms" );


            rand = new Random();
            InRange = 0;
            OutOfRange = 0;

            timer.Reset();
            timer.Start();
            for ( int i = 0; i < MAX_ITERATIONS; i++ ) {
                var x = rand.Next( MAX_SIZE * 2 ) - MAX_SIZE;
                if ( (uint) x > (uint) MAX_SIZE ) {
                    OutOfRange++;
                } else {
                    InRange++;
                }
            }
            timer.Stop();

            Console.WriteLine( "Comparison 2: " + InRange + "/" + OutOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms" );

    }
}
Cullum answered 31/3, 2015 at 18:51 Comment(4)
You aren't comparing the same code as the question. See Jon Hanna's answer -- in many cases the size matters, and you have completely lost that.Maite
I don't understand. Would it matter if I make the actual check a separate function? My point was that, due to CPU branch prediction the first case performed faster. Also after playing with it a bit more it turned out that the more "out of range" values we check the better first case works, however if 99% are in-range then 2nd case seems a little bit faster. And so we have more fun, if you compile in release mode - 2nd case is faster :-)Cullum
Wait, you reported timing results without having optimization turned on?Maite
Guilty as charged. I used dotnetfiddle.com initially and it doesn't have any options about optimizations. Results surprised me. Later I tried with mono and got same results. After playing a bit more I got really interesting statistics. Jon Hanna actually made a good point. Difference could be the size and not the speed, as a few more instructions could result in calling method being inlined or not, which on it's turn could make a big difference.Cullum

While exploring this on an Intel processor I found no difference in execution times, possibly due to multiple integer execution units.

But when doing this on a 16 MHz real-time microprocessor with neither branch prediction nor multiple integer execution units, there were notable differences.

1 million iterations of the slower code took 1761 ms

int slower(char *a, long i)
{
  if (i < 0 || i >= 10)
    return 0;

  return a[i];
}

1 million iterations of the faster code took 1635 ms

int faster(char *a, long i)
{
  if ((unsigned int)i >= 10)
    return 0;
  return a[i];
}
Diastasis answered 30/5, 2015 at 6:19 Comment(0)