Why is my unsafe code block slower than my safe code?
Asked Answered
E

3

7

I am attempting to write some code that will expediently process video frames. I am receiving the frames as a System.Windows.Media.Imaging.WriteableBitmap. For testing purposes, I am just applying a simple threshold filter that will process a BGRA format image and assign each pixel to either be black or white based on the average of the BGR pixels.

Here is my "Safe" version:

public static void ApplyFilter(WriteableBitmap Bitmap, byte Threshold)
{
    // Let's just make this work for this format
    if (Bitmap.Format != PixelFormats.Bgr24
        && Bitmap.Format != PixelFormats.Bgr32)
    {
        return;
    }

    // Calculate the number of bytes per pixel (should be 4 for this format). 
    var bytesPerPixel = (Bitmap.Format.BitsPerPixel + 7) / 8;

    // Stride is bytes per pixel times the number of pixels.
    // Stride is the byte width of a single rectangle row.
    var stride = Bitmap.PixelWidth * bytesPerPixel;

    // Create a byte array for a the entire size of bitmap.
    var arraySize = stride * Bitmap.PixelHeight;
    var pixelArray = new byte[arraySize];

    // Copy all pixels into the array
    Bitmap.CopyPixels(pixelArray, stride, 0);

    // Loop through array and change pixels to black/white based on threshold
    for (int i = 0; i < pixelArray.Length; i += bytesPerPixel)
    {
        // i=B, i+1=G, i+2=R, i+3=A
        var brightness =
               (byte)((pixelArray[i] + pixelArray[i+1] + pixelArray[i+2]) / 3);

        var toColor = byte.MinValue; // Black

        if (brightness >= Threshold)
        {
            toColor = byte.MaxValue; // White
        }

        pixelArray[i] = toColor;
        pixelArray[i + 1] = toColor;
        pixelArray[i + 2] = toColor;
    }
    Bitmap.WritePixels(
        new Int32Rect(0, 0, Bitmap.PixelWidth, Bitmap.PixelHeight),
        pixelArray, stride, 0
    );
}

Here is what I think is a direct translation using an unsafe code block and the WriteableBitmap Back Buffer instead of the forebuffer:

public static void ApplyFilterUnsafe(WriteableBitmap Bitmap, byte Threshold)
{
    // Let's just make this work for this format
    if (Bitmap.Format != PixelFormats.Bgr24
        && Bitmap.Format != PixelFormats.Bgr32)
    {
        return;
    }

    var bytesPerPixel = (Bitmap.Format.BitsPerPixel + 7) / 8;

    Bitmap.Lock();

    unsafe
    {
        // Get a pointer to the back buffer.
        byte* pBackBuffer = (byte*)Bitmap.BackBuffer;

        for (int i = 0;
             i < Bitmap.BackBufferStride*Bitmap.PixelHeight;
             i+= bytesPerPixel)
        {
            var pCopy = pBackBuffer;
            var brightness = (byte)((*pBackBuffer
                                     + *++pBackBuffer
                                     + *++pBackBuffer) / 3);
            pBackBuffer++;

            var toColor =
                    brightness >= Threshold ? byte.MaxValue : byte.MinValue;

            *pCopy = toColor;
            *++pCopy = toColor;
            *++pCopy = toColor;                    
        }
    }

    // Bitmap.AddDirtyRect(
    //           new Int32Rect(0,0, Bitmap.PixelWidth, Bitmap.PixelHeight));
    Bitmap.Unlock();

}

This is my first foray into unsafe code blocks and pointers, so maybe the logic is not optimal.

I have tested both blocks of code on the same WriteableBitmaps using:

var threshold = Convert.ToByte(op.Result);
var copy2 = copyFrame.Clone();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
BinaryFilter.ApplyFilterUnsafe(copyFrame, threshold);
stopWatch.Stop();

var unsafesecs = stopWatch.ElapsedMilliseconds;
stopWatch.Reset();
stopWatch.Start();
BinaryFilter.ApplyFilter(copy2, threshold);
stopWatch.Stop();
Debug.WriteLine(string.Format("Unsafe: {1}, Safe: {0}",
                stopWatch.ElapsedMilliseconds, unsafesecs));

So I am analyzing the same image. A test run of an incoming stream of video frames:

Unsafe: 110, Safe: 53
Unsafe: 136, Safe: 42
Unsafe: 106, Safe: 36
Unsafe: 95, Safe: 43
Unsafe: 98, Safe: 41
Unsafe: 88, Safe: 36
Unsafe: 129, Safe: 65
Unsafe: 100, Safe: 47
Unsafe: 112, Safe: 50
Unsafe: 91, Safe: 33
Unsafe: 118, Safe: 42
Unsafe: 103, Safe: 80
Unsafe: 104, Safe: 34
Unsafe: 101, Safe: 36
Unsafe: 154, Safe: 83
Unsafe: 134, Safe: 46
Unsafe: 113, Safe: 76
Unsafe: 117, Safe: 57
Unsafe: 90, Safe: 41
Unsafe: 156, Safe: 35

Why is my unsafe version always slower? Is it due to using the back buffer? Or am I doing something wrong?

Thanks

Earwitness answered 3/5, 2010 at 19:14 Comment(3)
Are you running these tests in a Release build?Nisi
No this is in a debugger build right now.Earwitness
I recommend running performance tests in a release build to get meaningful results. You may find there is less of a performance gap to justify using an 'unsafe' version if you let the optimizer have its way with the code.Nisi
T
9

Maybe because your unsafe version is doing a multiply and property access:

Bitmap.BackBufferStride*Bitmap.PixelHeight

On every loop iteration. Store the result in a variable.

Tautog answered 3/5, 2010 at 19:16 Comment(4)
Not sure that it's THE problem, but it would definitely contribute to it.Leesaleese
You are correct! I was so wrapped up in trying to figure out how the unsafe block worked and the pointers, that I was blind to my iterator condition logic operation. Storing that in a variable indeed dropped me down below the "safe" version, although not really enough to be a game-changer. If there are other things I am doing that are sub-optimal, I am all ears, if not, thanks again for the quick fix!Earwitness
It is not the multiply that is the major problem, it is the fact that you are accessing the properties (BackBufferStride and PixelHeight, this is the real killer. I know you are working with WPF, but for Silverlight I created a wrapper that cached all the property values of the bitmap to get the required performance.Mokpo
@Chris - You're right about that. I changed my answer to reflect this too.Tautog
F
5

One further optimization, in either safe or unsafe code: Stop dividing by 3 inside your loop. Multiply your threshold by 3 once, outside the loop. You will need to use some type other than byte, but that shouldn't be a problem. Actually, you're already using a bigger datatype than byte :)

Foredo answered 3/5, 2010 at 19:37 Comment(2)
+1: This helped too! I was getting about 26-30 ms range before and now performing the single multiply outside the loop as you suggested got it down to the 17-20 ms range. Not sure I'll be able to get down much lower.Earwitness
@jomtois I've tried a few things... but if my tests prove anything, is that micro-optimization is usually not worth it. I tried using some bitmasks and using something like while (pBackBuffer < end) instead of a counting loop. All actually decreased performance :PForedo
A
0

It's hard to say with out profiling the code, especially since the code is very different (eventhough it at first looks similar) some key points (and they are all just speculations)

the stop condition if the if is calculated in the unsafe version not in the safe

  • the indices for the array pixelArray might only be calculated once eventhough they are used twice.
  • even if they are not "cached" adding the numbers together without storing them (as opposed to ++p) would still be faster (fewer instructions and less memory access)
  • you are not locking the bitmap in the safe version
  • pixelArray[i],pixelArray[i+1],pixelArray[i+2] might get stored in locals making accessing them the second time potentially faster than iterating the pointer again.
  • you have an extra assignment in the unsafe code (pCOpy = pBackBuffer) and an extra increment (pBackBuffer++;)

That's all the ideas I can come up with. Hope it helps

Antheridium answered 3/5, 2010 at 19:37 Comment(1)
I suspect the stop condition is the big hitter, as mentioned above. I will check on some of your other ideas and @Thorarin's idea above as well. Basically I want to move the pointer along in 4 byte chunks that represent the pixel. I need to interrogate and change the first three bytes in the chunk and skip the fourth. I don't know what micro-optimizations are the best for this.Earwitness

© 2022 - 2024 — McMap. All rights reserved.