True Unsafe Code Performance
I understand unsafe code is more appropriate for accessing things like the Windows API and doing unsafe type casts than for writing more performant code, but I would like to ask whether you have ever noticed any significant performance improvement in real-world applications from using it, compared to safe C# code.

Ulaulah answered 21/3, 2011 at 7:18 Comment(5)
P/Invoke is not quite the same as unsafe ... I'm not sure the reasoning follows... Besides: have you measured to see if you are doing something useful here?Helainehelali
Neither safe nor unsafe code is inherently more performant. The total performance depends on the algorithms you've implemented in your code.Surra
I'm not using unsafe code right now. I am just trying to understand if it is worth changing critical parts of the code to unsafe code.Ulaulah
Why do you assume that "unsafe" implies "better performance"? That's only true in a couple of specialized scenarios.Gnathonic
Before considering unsafe, make sure you know how to efficiently use C#. Avoid excessive creation of temp objects. When to use an array of structs, and gotchas to watch out for. Buffer.BlockCopy. The conditions under which JIT optimizes out array bounds-checking. (I am not an expert, just saying what comes to mind.) Google C# high performance and C# performance tips.Verbiage
Some Performance Measurements

The performance benefits are not as great as you might think.

I did some performance measurements of normal managed array access versus unsafe pointers in C#.


Results from a build run outside of Visual Studio 2010, .NET 4, using an Any CPU | Release build on the following PC specification: x64-based PC, 1 quad-core processor. Intel64 Family 6 Model 23 Stepping 10 GenuineIntel ~2833 MHz.

Linear array access
 00:00:07.1053664 for Normal
 00:00:07.1197401 for Unsafe *(p + i)

Linear array access - with pointer increment
 00:00:07.1174493 for Normal
 00:00:10.0015947 for Unsafe (*p++)

Random array access
 00:00:42.5559436 for Normal
 00:00:40.5632554 for Unsafe

Random array access using Parallel.For(), with 4 processors
 00:00:10.6896303 for Normal
 00:00:10.1858376 for Unsafe

Note that the unsafe *(p++) idiom actually ran slower. My guess is that this broke a compiler optimization that was combining the loop variable and the (compiler-generated) pointer access in the safe version.

Source code available on github.
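For reference, here is a minimal sketch of the kind of comparison being made — illustrative only, the real benchmark is in the linked repository — contrasting normal indexed access with the unsafe *(p + i) form (compile with /unsafe):

```csharp
using System;
using System.Diagnostics;

class ArrayAccessBenchmark
{
    const int N = 10_000_000;

    static unsafe void Main()
    {
        var data = new int[N];
        for (int i = 0; i < N; i++) data[i] = i % 100;

        // Safe: ordinary indexed access. Because the loop bound is
        // data.Length, the JIT can elide the bounds check here.
        var sw = Stopwatch.StartNew();
        long safeSum = 0;
        for (int i = 0; i < data.Length; i++)
            safeSum += data[i];
        Console.WriteLine($"{sw.Elapsed} for Normal");

        // Unsafe: pin the array and read through a pointer, *(p + i).
        sw.Restart();
        long unsafeSum = 0;
        fixed (int* p = data)
        {
            for (int i = 0; i < N; i++)
                unsafeSum += *(p + i);
        }
        Console.WriteLine($"{sw.Elapsed} for Unsafe *(p + i)");

        // Both traversals must of course produce the same result.
        Debug.Assert(safeSum == unsafeSum);
    }
}
```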

Hinterland answered 10/7, 2012 at 9:13 Comment(7)
-1. Even at the time, this was a good example of how NOT to measure performance, as the trial code is too simplistic.Hsiuhsu
It is too trivial. It does not take into account the optimizations that the compiler does / can do. As such it is unclear whether the numbers have any meaning.Hsiuhsu
Surely the point is to measure whether changing the idiom allows the compiler to optimize - for example, by removing bounds checking?Hinterland
Great answer, but maybe you can improve your code after you have had a look at this. I think it's worth rerunning your experiments: referencesource.microsoft.com/#mscorlib/system/text/…Statue
Text from comment of @martijnn2008: // read next char. The JIT optimization seems to be getting confused when // compiling "ch = *pSrc++;", so rather use "ch = *pSrc; pSrc++;" instead ch = *pSrc; pSrc++;Hinterland
@Hsiuhsu - what specifically would you do differently?Verbiage
Re "The performance benefits are not as great as you might think." This statement is over-broad, based on your tests. You have successfully shown that there isn't some dramatic gain merely by adding the unsafe keyword, and using ptr++. That is valuable info; I appreciate that. What's missing is any insight into when unsafe might be much faster - or conversely, showing that unsafe isn't faster, in more substantial calculations. However, this is a useful starting point; thanks.Verbiage
As stated in other posts, you can use unsafe code in very specialised contexts to get a significant performance improvement. One of those scenarios is iterating over arrays of value types. Using unsafe pointer arithmetic is much faster than the usual for-loop/indexer pattern.

struct Foo
{
    public int a;
    public int b;
    public int c;
}

Foo[] fooArray = new Foo[100000];

fixed (Foo* pinned = fooArray)  // pinned points to the first element in the array...
{
    Foo* foo = pinned;  // the fixed-statement pointer is read-only, so copy it before incrementing
    var remaining = fooArray.Length;
    while (remaining-- > 0)
    {
        foo->c = foo->a + foo->b;
        foo++;  // foo now points to the next element in the array...
    }
}

The main benefit here is that we've cut out array bounds checking entirely.

While very performant, this kind of code is hard to handle, can be quite dangerous (unsafe), and breaks some fundamental guidelines (mutable struct). But there are certainly scenarios where this is appropriate...

Graphemics answered 21/3, 2011 at 9:3 Comment(10)
Another big drawback of this kind of code is that not a great proportion of C# programmers understand it or its implications. If you're working in a team, adopt the KISS principle...Graphemics
Are you sure that this particular example is much faster? Sun's JVM manages to make this type of code (in Java or Scala) nearly as fast as C++ with pointer arithmetic; I'm surprised that C# implementations wouldn't do the same.Transudation
This particular example, no, because the compiler can determine that i will never be outside the bounds of the array and thus skip array bounds checks. But the principle remains. In my particular case I have a ring buffer implemented using an array, and a separate Iterator object to iterate over it. In this case the compiler cannot make this optimization.Graphemics
Agreed on ring buffers, at least for sizes that are powers of two. If the size is not a power of two, the compiler (either the real one or the just-in-time one) could conceivably observe that your test always fixes the range, and thus it doesn't need to repeat the exact same test again. I don't know whether this is done, however; I haven't needed a high-performance ring buffer lately.Transudation
Yeah, I'm not certain under what circumstances the C# jitter can determine that an array bounds check is unnecessary. I'm working with the .NET compact framework where the jitter is slightly more restricted on what optimizations it can make. It's an interesting subject though..Graphemics
The javac compiler has absolutely nothing to do with runtime removal of array checks; its job was completed long beforehand. javac, i.e. the compiler, only writes class files. The JVM is a different beast, however, with different rules and so on.Hulk
In my tests using fixed and pointers is actually slower (Win 7, x64). Don't assume anything, you need to measure.Loralyn
@Loralyn yes it's highly dependent on the circumstances and scenario, as I said in my answer: "in very specialised contexts". There are times when pointer arithmetic is faster. Had I claimed that unsafe code was always faster then I would have deserved your downvote. As it is, I did not make that claim and your downvote is quite unfair.Graphemics
@Graphemics don't take it too personally. I just don't want people to think this answer is going to help them, so I downvoted it. Not that it's a bad post, but I don't think this is a good solution. Also, the statement "While very performant" is not backed by any numbers, so that kind of statement is kind of misleading.Loralyn
Hmm. This would be more useful if backed by a code example whose performance could be measured. Nevertheless, I do appreciate the heads up about incrementing through arrays of structs - I do a lot of that in prepping Direct3D vertex buffers. If one of those becomes a bottleneck, I know what to try next. [Though based on comments here, the JIT may already be doing decent optimization for me.]Verbiage
A good example is image manipulation. Modifying pixels through a pointer to their underlying bytes (which requires unsafe code) is quite a bit faster.

Example: http://www.gutgames.com/post/Using-Unsafe-Code-for-Faster-Image-Manipulation.aspx
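As a sketch of the technique — not the linked article's exact code, and assuming a 32bpp ARGB bitmap and the System.Drawing API — pointer access through LockBits avoids a GetPixel/SetPixel call per pixel:

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class PixelOps
{
    // Invert the RGB channels of a 32bpp ARGB bitmap via raw pointer
    // access. LockBits exposes the pixel buffer directly.
    public static unsafe void Invert(Bitmap bmp)
    {
        var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData data = bmp.LockBits(rect, ImageLockMode.ReadWrite,
                                       PixelFormat.Format32bppArgb);
        try
        {
            byte* row = (byte*)data.Scan0;
            // Walk rows via Stride (row length in bytes, may include padding).
            for (int y = 0; y < data.Height; y++, row += data.Stride)
            {
                byte* px = row;
                for (int x = 0; x < data.Width; x++, px += 4)
                {
                    px[0] = (byte)(255 - px[0]); // blue
                    px[1] = (byte)(255 - px[1]); // green
                    px[2] = (byte)(255 - px[2]); // red
                    // px[3] is alpha; left untouched
                }
            }
        }
        finally
        {
            bmp.UnlockBits(data);
        }
    }
}
```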

That being said, for most scenarios, the difference wouldn't be as noticeable. So before you use unsafe code, profile your application to see where the performance bottlenecks are and test whether unsafe code is really the solution to make it faster.

Macneil answered 21/3, 2011 at 7:23 Comment(9)
Safe image manipulation can be quite fast too. Especially if your algorithm can be written to eliminate boundschecks. I only use unsafe code to copy the data from the bitmap into an array and back. And if you use a byte[] you can even avoid that and just use Marshal functions.Dartmoor
@Botz3000: That's a great link (looking forward to reading the rest of the site) but the conclusion is unsound. Using GetPixel and SetPixel is really slow but using int[] is more or less as fast as using pointers without the drawbacks.Hinterland
@CodeInChaos: I have to agree. the conclusion I came to is that the only area to benefit is copying the bitmap data. Haven't tried using Marshal yet though.Hinterland
I see so many problems with the code in that article. Iterating over y in the inner loop means that memory is not accessed sequentially. Also, accessing primitives such as height through C# properties, instead of copying them into a local variable first, means a lot of expensive heap fetch operations. Thirdly, calling GetPixel instead of inlining the pointer arithmetic in the loop involves a lot of overhead. You're looking at roughly a 100-fold performance penalty with the code in that article. It is a terrible example of how to use unsafe code.Kami
@Nuzzolilo The example is not meant to be perfect. The article itself states that it could be optimized, but even without the optimizations it's faster. How was the performance with your optimizations?Macneil
@Macneil Like I said above, you're looking at about a 100-fold performance penalty (based on testing I've done myself). Of course it's going to be faster than using Graphics.GetPixel, but my I7 is also faster than the commodore I have in the garage. An article about using unsafe code for faster computation should not get so many things wrong IMO.Kami
The things I mentioned are not even what I would call optimizations either, they are bug fixes. Optimizations would be getting the most out of your target hardware by taking advantage of optimal CPU caching and struct alignment and so on...Kami
These numbers vary by hardware and all kinds of things, of course, but I was able to read and commit 10M pixels to/from the heap in about 1-3 milliseconds in a single thread on a modern CPU, using C#, compiled in x86 mode. Accessing Bitmap.Width in the inner loop slowed that down to 70ms for me. Switching X and Y in the for loops added a handful of milliseconds.Kami
@Nuzzolilo Sure it can be faster, just like the author of the article said, it's a rather simple update. And optimizations are not limited to hardware specific optimization, eliminating redundant method calls and caching results is optimization as well.Macneil
Well, I would suggest reading this blog post: MSDN blogs: Array Bounds Check Elimination in the CLR

This clarifies how bounds checks are handled in C#. Moreover, Thomas Bratt's tests seem useless to me (looking at the code), since the JIT removes the bounds checks in his 'safe' loops anyway.
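The gist of that post, sketched below. Exactly which patterns the JIT recognizes varies by runtime version, so treat this as illustrative:

```csharp
static class BoundsCheckDemo
{
    // The JIT can prove i is always within range here, because the loop
    // bound is the array's own Length, so it can elide the per-access
    // bounds check:
    public static int Sum(int[] a)
    {
        int sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i];  // no bounds check emitted
        return sum;
    }

    // A loop bound the JIT cannot tie to the array (here, a parameter)
    // forces it to keep the check on every access:
    public static int SumUpTo(int[] a, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += a[i];  // bounds check on each iteration
        return sum;
    }
}
```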

Mistral answered 29/9, 2012 at 18:22 Comment(2)
Can you summarise the post here? If the link ever goes dark, your answer isn't going to be very helpful.Catalogue
You've missed the point of the tests - they aren't to show the effect of bounds checking vs. unsafe. They are to show that bounds checking is often optimized away and that the cost of unsafe is probably not worth it in these cases.Hinterland
I am using unsafe code for video manipulation. In such code you want it to run as fast as possible, without internal checks on values etc. Without unsafe code, mine would not be able to keep up with the video stream at 30 fps or 60 fps (depending on the camera used).

Because of that speed, unsafe code is widely used by people who write graphics code.

Boutique answered 5/2, 2016 at 14:4 Comment(0)
To all who are looking at these answers, I would like to point out that even though the answers are excellent, a lot has changed since they were posted.

Please note that .NET has changed quite a bit, and one now also has access to new types such as Vector<T>, Span<T>, and ReadOnlySpan<T>, as well as hardware-specific libraries and classes like those found in System.Runtime.Intrinsics in .NET Core 3.0.

Have a look at this blog post to see how hardware-optimized loops can be used, and this blog post for how to fall back to safe methods if optimal hardware support is not available.
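As an illustrative sketch of that pattern (the names here are mine, not the blog's): sum a span with Vector<T> when hardware acceleration is available, falling back to a plain scalar loop otherwise.

```csharp
using System;
using System.Numerics;

static class VectorSum
{
    public static int Sum(ReadOnlySpan<int> values)
    {
        int sum = 0;
        int i = 0;

        if (Vector.IsHardwareAccelerated)
        {
            // Process Vector<int>.Count elements per iteration with SIMD.
            var acc = Vector<int>.Zero;
            for (; i <= values.Length - Vector<int>.Count; i += Vector<int>.Count)
                acc += new Vector<int>(values.Slice(i, Vector<int>.Count));

            // Horizontal sum of the accumulator lanes.
            for (int j = 0; j < Vector<int>.Count; j++)
                sum += acc[j];
        }

        // Scalar fallback: the remaining tail elements, or the whole
        // span when no hardware acceleration is available.
        for (; i < values.Length; i++)
            sum += values[i];

        return sum;
    }
}
```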

Cagey answered 11/10, 2019 at 11:52 Comment(0)
