C# performance - Using unsafe pointers instead of IntPtr and Marshal

Asked 9/7, 2013 at 13:14 Answered 27/10, 2018 at 6:50

Question

I'm porting a C application into C#. The C app calls lots of functions from a 3rd-party DLL, so I wrote P/Invoke wrappers for these functions in C#. Some of these C functions allocate data which I have to use in the C# app, so I used IntPtr's, Marshal.PtrToStructure and Marshal.Copy to copy the native data (arrays and structures) into managed variables.

Unfortunately, the C# app proved to be much slower than the C version. A quick performance analysis showed that the above-mentioned marshaling-based data copying is the bottleneck. I'm considering speeding up the C# code by rewriting it to use pointers instead. Since I don't have experience with unsafe code and pointers in C#, I need expert opinions on the following questions:

What are the drawbacks of using unsafe code and pointers instead of IntPtr and Marshaling? For example, is it more unsafe (pun intended) in any way? People seem to prefer marshaling, but I don't know why.
Is using pointers for P/Invoking really faster than using marshaling? How much speedup can be expected approximately? I couldn't find any benchmark tests for this.

Example code

To make the situation clearer, I hacked together a small example (the real code is much more complex). I hope this example shows what I mean when I'm talking about "unsafe code and pointers" vs. "IntPtr and Marshal".

C library (DLL)

MyLib.h

#ifndef _MY_LIB_H_
#define _MY_LIB_H_

struct MyData 
{
  int length;
  unsigned char* bytes;
};

__declspec(dllexport) void CreateMyData(struct MyData** myData, int length);
__declspec(dllexport) void DestroyMyData(struct MyData* myData);

#endif // _MY_LIB_H_

MyLib.c

#include <stdlib.h>
#include "MyLib.h"

void CreateMyData(struct MyData** myData, int length)
{
  int i;

  *myData = (struct MyData*)malloc(sizeof(struct MyData));
  if (*myData != NULL)
  {
    (*myData)->length = length;
    (*myData)->bytes = (unsigned char*)malloc(length * sizeof(char));
    if ((*myData)->bytes != NULL)
      for (i = 0; i < length; ++i)
        (*myData)->bytes[i] = (unsigned char)(i % 256);
  }
}

void DestroyMyData(struct MyData* myData)
{
  if (myData != NULL)
  {
    if (myData->bytes != NULL)
      free(myData->bytes);
    free(myData);
  }
}

C application

Main.c

#include <stdio.h>
#include "MyLib.h"

int main(void)
{
  struct MyData* myData = NULL;
  int length = 100 * 1024 * 1024;

  printf("=== C test ===\n");
  CreateMyData(&myData, length);
  if (myData != NULL)
  {
    printf("Length: %d\n", myData->length);
    if (myData->bytes != NULL)
      printf("First: %d, last: %d\n", myData->bytes[0], myData->bytes[myData->length - 1]);
    else
      printf("myData->bytes is NULL\n");
  }
  else
    printf("myData is NULL\n");
  DestroyMyData(myData);
  getchar();
}

C# application, which uses `IntPtr` and `Marshal`

Program.cs

using System;
using System.Runtime.InteropServices;

public static class Program
{
  [StructLayout(LayoutKind.Sequential)]
  private struct MyData
  {
    public int Length;
    public IntPtr Bytes;
  }

  [DllImport("MyLib.dll")]
  private static extern void CreateMyData(out IntPtr myData, int length);

  [DllImport("MyLib.dll")]
  private static extern void DestroyMyData(IntPtr myData);

  public static void Main()
  {
    Console.WriteLine("=== C# test, using IntPtr and Marshal ===");
    int length = 100 * 1024 * 1024;
    IntPtr myData1;
    CreateMyData(out myData1, length);
    if (myData1 != IntPtr.Zero)
    {
      MyData myData2 = (MyData)Marshal.PtrToStructure(myData1, typeof(MyData));
      Console.WriteLine("Length: {0}", myData2.Length);
      if (myData2.Bytes != IntPtr.Zero)
      {
        byte[] bytes = new byte[myData2.Length];
        Marshal.Copy(myData2.Bytes, bytes, 0, myData2.Length);
        Console.WriteLine("First: {0}, last: {1}", bytes[0], bytes[myData2.Length - 1]);
      }
      else
        Console.WriteLine("myData.Bytes is IntPtr.Zero");
    }
    else
      Console.WriteLine("myData is IntPtr.Zero");
    DestroyMyData(myData1);
    Console.ReadKey(true);
  }
}

C# application, which uses `unsafe` code and pointers

Program.cs

using System;
using System.Runtime.InteropServices;

public static class Program
{
  [StructLayout(LayoutKind.Sequential)]
  private unsafe struct MyData
  {
    public int Length;
    public byte* Bytes;
  }

  [DllImport("MyLib.dll")]
  private unsafe static extern void CreateMyData(out MyData* myData, int length);

  [DllImport("MyLib.dll")]
  private unsafe static extern void DestroyMyData(MyData* myData);

  public unsafe static void Main()
  {
    Console.WriteLine("=== C# test, using unsafe code ===");
    int length = 100 * 1024 * 1024;
    MyData* myData;
    CreateMyData(out myData, length);
    if (myData != null)
    {
      Console.WriteLine("Length: {0}", myData->Length);
      if (myData->Bytes != null)
        Console.WriteLine("First: {0}, last: {1}", myData->Bytes[0], myData->Bytes[myData->Length - 1]);
      else
        Console.WriteLine("myData.Bytes is null");
    }
    else
      Console.WriteLine("myData is null");
    DestroyMyData(myData);
    Console.ReadKey(true);
  }
}

Grovel answered 9/7, 2013 at 13:14 Comment(12)

Well, you could start by benchmarking those examples you've whipped up. – Nur 9/7, 2013 at 13:17

c++/CLI was designed for this sort of problem. You may want to check it out. en.wikipedia.org/wiki/C%2B%2B/CLI . At the least you can wrap your C code with c++/CLI and compile them into assemblies. You can even wrap assembly code. Then you can call those assemblies from C# like any other managed assembly. As for performance, I am not certain if it will be faster, but you can perform a test. C++/CLI comes with the C++ visual studio express. – Connected 9/7, 2013 at 13:31

Sure, your first sample copies a hundred megabytes, your second doesn't. Ought to be noticeable. What exactly is the point of the question? – Abnormity 9/7, 2013 at 13:31

@BobBlogge Thanks Bob, C++/CLI sounds like a good idea. Have you ever wrapped assembly code in C++/CLI? – Grovel 9/7, 2013 at 13:53

@delnan Yeah, maybe. The problem is that before investing into a huge rewrite, I have to be sure this "unsafe-code based P/Invoking" is a good idea according to people who have real-world experience with it, and who can tell me the pros and cons from the points of view of safety and performance. Since I have no experience with unsafe code, I believe experts more than an ad hoc benchmark run by myself. – Grovel 9/7, 2013 at 14:0

@Grovel To wrap native assembly in c++/cli you need to first wrap that in an unmanaged function. You can then call the unmanaged function from your managed function. You then compile the managed function as part of an assembly then call that from c# as you would any other managed assembly. – Connected 9/7, 2013 at 14:11

@HansPassant I need help because I'm a bit confused. People seem to prefer using marshaling to make native data available from managed code, but if there is no real difference, why doesn't everyone use the pointer-based approach? Where is the catch? For example, some people treat unsafe struct pointers as if they could not be touched: codeproject.com/Articles/339290/… General performance tests of unsafe code are also controversial: #5375315 Etc. – Grovel 9/7, 2013 at 14:16

You are solving the wrong problem using c#. Keep using c or c++ if you want the best performance. But you may find that the performance difference does not really matter in the entire system. You may be optimizing a small part of the system that is not the real bottleneck - a typical trap that developers fall into. – Nabonidus 9/7, 2013 at 15:2

@Nabonidus (1) This rewrite wasn't my decision. (2) This part of the code must have a prescribed speed. – Grovel 9/7, 2013 at 15:6

What are your timing differences? unsafe code is fine to use, just requires different execution permissions. Full trust. It's easy to screw up when directly manipulating memory, which is why people shy away. Normally, the benefit is marginal. – Leaven 9/7, 2013 at 15:39

@Nabonidus I disagree with the idea of telling people to "just use C" to get the best performance. While C allows some extra optimization tricks, the best optimizations available to C# can still allow you to outperform equivalent C# code. With the best optimizations available to each language (sans hand-crafted assembler), performance will be roughly the same. A developer who is poor at optimization skills will have an inefficient program whether it's written in C# or C. Obviously if one is not identifying correct bottlenecks, they are in this category. – Herbalist 3/6, 2016 at 0:30

@Nuzzolilo perhaps related to your point: the performance problems in a managed language such as C# aren't the speed of loops and array indexing. Rather, it is the many temporary objects that may get created by code that looks perfectly reasonable. unsafe won't help this. And I agree that almost always the solution isn't some magic technique. Performance analysis, followed by finding a better algorithm, or more appropriate approach, for your specific needs. Not shaving microseconds via low-level techniques. – Substantial 3/3, 2018 at 2:25

This is a rather old thread, but I recently ran extensive performance tests with marshaling in C#. I need to unmarshal lots of data from a serial port over many days. It was important to me to have no memory leaks (because the smallest leak becomes significant after a couple of million calls), and I also ran a lot of statistical timing tests with very big structs (>10 KB) just for the sake of it (and no, you should never have a 10 KB struct :-) ).

I tested the following three unmarshalling strategies (I also tested the marshalling). In nearly all cases the first one (MarshalMatters) outperformed the other two. Marshal.Copy was always slowest by far, the other two were mostly very close together in the race.

Using unsafe code can pose a significant security risk.

First:

public class MarshalMatters
{
    public static T ReadUsingMarshalUnsafe<T>(byte[] data) where T : struct
    {
        unsafe
        {
            fixed (byte* p = &data[0])
            {
                return (T)Marshal.PtrToStructure(new IntPtr(p), typeof(T));
            }
        }
    }

    public unsafe static byte[] WriteUsingMarshalUnsafe<selectedT>(selectedT structure) where selectedT : struct
    {
        byte[] byteArray = new byte[Marshal.SizeOf(structure)];
        fixed (byte* byteArrayPtr = byteArray)
        {
            // fDeleteOld is false: the freshly allocated array holds no previous structure to destroy.
            Marshal.StructureToPtr(structure, (IntPtr)byteArrayPtr, false);
        }
        return byteArray;
    }
}

Second:

public class Adam_Robinson
{

    private static T BytesToStruct<T>(byte[] rawData) where T : struct
    {
        T result = default(T);
        GCHandle handle = GCHandle.Alloc(rawData, GCHandleType.Pinned);
        try
        {
            IntPtr rawDataPtr = handle.AddrOfPinnedObject();
            result = (T)Marshal.PtrToStructure(rawDataPtr, typeof(T));
        }
        finally
        {
            handle.Free();
        }
        return result;
    }

    /// <summary>
    /// no Copy. no unsafe. Gets a GCHandle to the memory via Alloc
    /// </summary>
    /// <typeparam name="selectedT"></typeparam>
    /// <param name="structure"></param>
    /// <returns></returns>
    public static byte[] StructToBytes<T>(T structure) where T : struct
    {
        int size = Marshal.SizeOf(structure);
        byte[] rawData = new byte[size];
        GCHandle handle = GCHandle.Alloc(rawData, GCHandleType.Pinned);
        try
        {
            IntPtr rawDataPtr = handle.AddrOfPinnedObject();
            Marshal.StructureToPtr(structure, rawDataPtr, false);
        }
        finally
        {
            handle.Free();
        }
        return rawData;
    }
}

Third:

/// <summary>
/// https://mcmap.net/q/303227/-marshal-ptrtostructure-and-back-again-and-generic-solution-for-endianness-swapping
/// </summary>
public class DanB
{
    /// <summary>
    /// uses Marshal.Copy! Not run in unsafe. Uses AllocHGlobal to get new memory and copies.
    /// </summary>
    public static byte[] GetBytes<T>(T structure) where T : struct
    {
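        var size = Marshal.SizeOf(structure); // or Marshal.SizeOf<T>() in .NET 4.5.1 and later
        byte[] rawData = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            // fDeleteOld must be false: the block from AllocHGlobal is uninitialized.
            Marshal.StructureToPtr(structure, ptr, false);
            Marshal.Copy(ptr, rawData, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr); // release the native block even if marshaling throws
        }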
        return rawData;
    }

    public static T FromBytes<T>(byte[] bytes) where T : struct
    {
        var structure = new T();
        int size = Marshal.SizeOf(structure); // or Marshal.SizeOf<T>() in .NET 4.5.1 and later
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.Copy(bytes, 0, ptr, size);
            structure = (T)Marshal.PtrToStructure(ptr, structure.GetType());
        }
        finally
        {
            Marshal.FreeHGlobal(ptr); // release the native block even if marshaling throws
        }

        return structure;
    }
}

Aerophagia answered 23/4, 2015 at 23:45 Comment(2)

I would suppose that the slowness of the third case was due to AllocHGlobal (GlobalAlloc in native code), which has a higher per-call overhead. – Mcatee 20/3, 2017 at 20:25

I recently had a similar situation again and used Span and ReadOnlySpan. Might update this with some extra code. – Aerophagia 14/10, 2022 at 14:10

Considerations in Interoperability explains why and when Marshaling is required and at what cost. Quote:

Marshaling occurs when a caller and a callee cannot operate on the same instance of data.

repeated marshaling can negatively affect the performance of your application.

So, to answer your question whether

... using pointers for P/Invoking really faster than using marshaling ...

first ask yourself whether the managed code can operate directly on the instance returned by the unmanaged method. If it can, then marshaling and its performance cost are not required. The approximate time saving is O(n), where n is the size of the marshalled instance. In addition, not keeping both the managed and the unmanaged block of data in memory at the same time for the duration of the method (as the "IntPtr and Marshal" example does) eliminates additional overhead and memory pressure.
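To make the O(n) point concrete, here is a minimal C sketch of the two access patterns. Both helper names are made up for illustration and are not part of MyLib: `sum_in_place` reads the native bytes where they are (what a pointer gives you), while `sum_via_copy` first duplicates them into a second buffer (roughly what Marshal.Copy into a managed array does) and only then reads them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the bytes in place, without copying them first.
   This corresponds to reading native memory through a pointer. */
unsigned long sum_in_place(const unsigned char *data, size_t n)
{
    unsigned long sum = 0;
    size_t i;
    assert(data != NULL || n == 0);
    for (i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: copies the bytes into a second buffer and sums the
   copy. This corresponds to copying into a managed array first: one extra
   O(n) pass, and a second n-byte buffer alive at the same time.
   Returns 0 if the temporary buffer cannot be allocated. */
unsigned long sum_via_copy(const unsigned char *data, size_t n)
{
    unsigned long sum;
    unsigned char *copy;
    assert(data != NULL || n == 0);
    if (n == 0)
        return 0;
    copy = malloc(n);
    if (copy == NULL)
        return 0;
    memcpy(copy, data, n);          /* the extra pass over all n bytes */
    sum = sum_in_place(copy, n);
    free(copy);
    return sum;
}
```

Both functions return the same result; the second one simply does more work and needs twice the memory while it runs. How much that matters in practice depends on n and on how often the call is made, so it is still worth measuring.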

What are the drawbacks of using unsafe code and pointers ...

The drawback is the risk associated with accessing memory directly through pointers. It is no more dangerous than using pointers in C or C++. Use it when it is needed and makes sense. More details are here.

There is one "safety" concern with the presented examples: the release of the allocated unmanaged memory is not guaranteed if the managed code throws. The best practice is to wrap the use in try/finally:

CreateMyData(out myData1, length);

if(myData1!=IntPtr.Zero) {
    try {
        // -> use myData1
        ...
        // <-
    }
    finally {
        DestroyMyData(myData1);
    }
}

Yawmeter answered 10/1, 2018 at 13:59 Comment(1)

Your answer is pretty good. I'd like to add two notes: 1) Most of the WinAPI uses INVALID_HANDLE_VALUE instead of NULL to indicate that a handle is invalid. 2) It would be better to add some explanation of the drawbacks in untrusted environments. – Estus 13/1, 2018 at 7:58

For anyone still reading,

Something I don't think I saw in any of the answers: unsafe code does present something of a security risk. It's not a huge risk, and it would be quite challenging to exploit. However, if, like me, you work in a PCI-compliant organization, unsafe code is disallowed by policy for this reason.

Managed code is normally very secure because the CLR takes care of memory location and allocation, preventing you from accessing or writing any memory you're not supposed to.

When you use the unsafe keyword and compile with '/unsafe' and use pointers, you bypass these checks and create the potential for someone to use your application to gain some level of unauthorized access to the machine it is running on. Using something like a buffer-overrun attack, your code could be tricked into writing instructions into an area of memory that might then be accessed by the program counter (i.e. code injection), or just crash the machine.

Many years ago, SQL server actually fell prey to malicious code delivered in a TDS packet that was far longer than it was supposed to be. The method reading the packet didn't check the length and continued to write the contents past the reserved address space. The extra length and content were carefully crafted such that it wrote an entire program into memory - at the address of the next method. The attacker then had their own code being executed by the SQL server within a context that had the highest level of access. It didn't even need to break the encryption as the vulnerability was below this point in the transport layer stack.
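The missing check in that story is small. As a rough illustration in C (the function name and parameters are invented for this sketch and have nothing to do with the actual SQL Server code), a reader that validates the claimed length before copying looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet reader: copies `claimed_len` payload bytes into `dst`,
   but only after checking the claim against the destination capacity.
   Returns 0 on success, -1 if the packet would overrun the buffer. */
int read_packet_checked(unsigned char *dst, size_t dst_capacity,
                        const unsigned char *payload, size_t claimed_len)
{
    assert(dst != NULL && payload != NULL);
    if (claimed_len > dst_capacity)
        return -1;                  /* reject instead of writing past dst */
    memcpy(dst, payload, claimed_len);
    return 0;
}
```

Managed arrays get this kind of bounds check from the CLR on every access. With raw pointers in unsafe code, it is up to you to write it.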

Zingale answered 8/2, 2018 at 4:25 Comment(1)

Totally an issue in some cases, thanks for pointing this out! – Aerophagia 3/3, 2018 at 19:10

Just wanted to add my experience to this old thread: we used marshaling in sound recording software. We received real-time sound data from the mixer in native buffers and marshaled it to byte[]. That was a real performance killer. We were forced to move to unsafe structs as the only way to complete the task.

In case you don't have large native structs and don't mind that all data is filled twice, marshaling is the more elegant and much, much safer approach.

Fetiparous answered 12/2, 2015 at 16:38 Comment(0)

Two answers,

Unsafe code means it is not managed by the CLR. You need to take care of resources it uses.
You cannot scale the performance because there are so many factors affecting it. But using pointers will definitely be much faster.

Crimea answered 15/7, 2013 at 6:18 Comment(2)

"You cannot scale the performance" - What do you mean? I don't understand. – Grovel 15/7, 2013 at 8:40

@Grovel Maybe when you use it a lot that you can't manage it all to be as fast as a few. – Winniewinnifred 14/6, 2017 at 12:31

Because you stated that your code calls into a 3rd-party DLL, I think unsafe code is better suited to your scenario. You ran into the particular situation of wrapping a variable-length array in a struct; I know, I know, this kind of usage occurs all the time, but it's not always the case after all. You might want to have a look at some questions about this, for example:

How do I marshal a struct that contains a variable-sized array to C#?

If .. I say if .. you can modify the third party libraries a bit for this particular case, then you might consider the following usage:

using System;
using System.Runtime.InteropServices;

public static class Program { /*
    [StructLayout(LayoutKind.Sequential)]
    private struct MyData {
        public int Length;
        public byte[] Bytes;
    } */

    [DllImport("MyLib.dll")]
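    // __declspec(dllexport) void WINAPI CreateMyDataAlt(BYTE bytes[], int* length);
    // "ref int" is passed as a pointer, so the native parameter must be int*.
    // [In, Out] states explicitly that the callee fills the caller's array.
    private static extern void CreateMyDataAlt([In, Out] byte[] myData, ref int length);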

    /* 
    [DllImport("MyLib.dll")]
    private static extern void DestroyMyData(byte[] myData); */

    public static void Main() {
        Console.WriteLine("=== C# test, using IntPtr and Marshal ===");
        int length = 100*1024*1024;
        var myData1 = new byte[length];
        CreateMyDataAlt(myData1, ref length);

        if(0!=length) {
            // MyData myData2 = (MyData)Marshal.PtrToStructure(myData1, typeof(MyData));

            Console.WriteLine("Length: {0}", length);

            /*
            if(myData2.Bytes!=IntPtr.Zero) {
                byte[] bytes = new byte[myData2.Length];
                Marshal.Copy(myData2.Bytes, bytes, 0, myData2.Length); */
            Console.WriteLine("First: {0}, last: {1}", myData1[0], myData1[length-1]); /*
            }
            else {
                Console.WriteLine("myData.Bytes is IntPtr.Zero");
            } */
        }
        else {
            Console.WriteLine("myData is empty");
        }

        // DestroyMyData(myData1);
        Console.ReadKey(true);
    }
}

As you can see, much of your original marshalling code is commented out, and a CreateMyDataAlt(byte[], ref int) is declared for a correspondingly modified external unmanaged function CreateMyDataAlt (whose length parameter has to be a pointer, int*, to match ref int). Some of the data copying and pointer checks turn out to be unnecessary; that is, the code can be even simpler and will probably run faster.

So, what's so different with the modification? The byte array is now marshalled directly, without wrapping it in a struct, and passed to the unmanaged side. You don't allocate the memory within the unmanaged code; rather, you just fill data into it (implementation details omitted), and after the call the data needed is available to the managed side. If you want to signal that the data was not filled and should not be used, you can simply set length to zero to tell the managed side. Because the byte array is allocated on the managed side, it will be garbage-collected eventually; you don't have to take care of that.
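For completeness, the unmanaged side could look roughly like the C sketch below. This is only my guess at a reasonable shape, assuming the function fills the buffer with the same i % 256 pattern as the original CreateMyData and takes length as a pointer; the `__declspec(dllexport)` and `WINAPI` decorations a real DLL export needs are left out so that it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unmanaged side of CreateMyDataAlt. The caller owns `bytes`
   (on the C# side this is the byte[] handed to P/Invoke); `*length` carries
   the buffer size in and is set to 0 if nothing could be filled. */
void CreateMyDataAlt(unsigned char bytes[], int *length)
{
    int i;
    assert(length != NULL);
    if (bytes == NULL || *length <= 0)
    {
        *length = 0;    /* tells the caller that the data must not be used */
        return;
    }
    for (i = 0; i < *length; ++i)
        bytes[i] = (unsigned char)(i % 256);
}
```

Note that there is no malloc and no matching Destroy function any more: the native code never owns the memory, so there is nothing for the managed side to free.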

Estus answered 5/1, 2018 at 20:11 Comment(0)

I had the same question today and I was looking for some concrete measurement values, but I couldn't find any. So I wrote my own tests.

The test copies the pixel data of a 10k x 10k RGB image. The image data is 300 MB (3*10^8 bytes). Some methods copy this data 10 times; others are faster and therefore copy it 100 times. The copying methods used are:

array access via byte pointer
Marshal.Copy(): a) 1 * 300 MB, b) 1e8 * 3 bytes
Buffer.BlockCopy(): a) 1 * 300 MB, b) 1e8 * 3 bytes

Test environment:
CPU: Intel Core i7-3630QM @ 2.40 GHz
OS: Win 7 Pro x64 SP1
Visual Studio 2015.3, code is C++/CLI, targeted .net version is 4.5.2, compiled for Debug.

Test results:
The CPU load is 100% of 1 core for all methods (equal to 12.5% total CPU load).
Comparison of speed and execution time:

method                        speed     exec. time
Marshal.Copy (1*300MB)        100   %      100%
Buffer.BlockCopy (1*300MB)     98   %      102%
Pointer                         4.4 %     2280%
Buffer.BlockCopy (1e8*3B)       1.4 %     7120%
Marshal.Copy (1e8*3B)           0.95%    10600%

Execution times and the calculated average throughput are written as comments in the code below.

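// Namespaces used below (the project must reference System.Drawing).
using namespace System;
using namespace System::Drawing;
using namespace System::Drawing::Imaging;
using namespace System::Runtime::InteropServices;

//------------------------------------------------------------------------------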
static void CopyIntoBitmap_Pointer (array<unsigned char>^ i_aui8ImageData,
                                    BitmapData^ i_ptrBitmap,
                                    int i_iBytesPerPixel)
{
  char* scan0 = (char*)(i_ptrBitmap->Scan0.ToPointer ());

  int ixCnt = 0;
  for (int ixRow = 0; ixRow < i_ptrBitmap->Height; ixRow++)
  {
    for (int ixCol = 0; ixCol < i_ptrBitmap->Width; ixCol++)
    {
      char* pPixel = scan0 + ixRow * i_ptrBitmap->Stride + ixCol * i_iBytesPerPixel;
      pPixel[0] = i_aui8ImageData[ixCnt++];
      pPixel[1] = i_aui8ImageData[ixCnt++];
      pPixel[2] = i_aui8ImageData[ixCnt++];
    }
  }
}

//------------------------------------------------------------------------------
static void CopyIntoBitmap_MarshallLarge (array<unsigned char>^ i_aui8ImageData,
                                          BitmapData^ i_ptrBitmap)
{
  IntPtr ptrScan0 = i_ptrBitmap->Scan0;
  Marshal::Copy (i_aui8ImageData, 0, ptrScan0, i_aui8ImageData->Length);
}

//------------------------------------------------------------------------------
static void CopyIntoBitmap_MarshalSmall (array<unsigned char>^ i_aui8ImageData,
                                         BitmapData^ i_ptrBitmap,
                                         int i_iBytesPerPixel)
{
  int ixCnt = 0;
  for (int ixRow = 0; ixRow < i_ptrBitmap->Height; ixRow++)
  {
    for (int ixCol = 0; ixCol < i_ptrBitmap->Width; ixCol++)
    {
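      // Destination of the current pixel: row offset (Stride) plus column offset.
      IntPtr ptrPixel = IntPtr::Add (i_ptrBitmap->Scan0, ixRow * i_ptrBitmap->Stride + ixCol * i_iBytesPerPixel);
      Marshal::Copy (i_aui8ImageData, ixCnt, ptrPixel, i_iBytesPerPixel);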
      ixCnt += i_iBytesPerPixel;
    }
  }
}

//------------------------------------------------------------------------------
int main ()
{
  int iWidth = 10000;
  int iHeight = 10000;
  int iBytesPerPixel = 3;
  Bitmap^ oBitmap = gcnew Bitmap (iWidth, iHeight, PixelFormat::Format24bppRgb);
  BitmapData^ oBitmapData = oBitmap->LockBits (Rectangle (0, 0, iWidth, iHeight), ImageLockMode::WriteOnly, oBitmap->PixelFormat);
  array<unsigned char>^ aui8ImageData = gcnew array<unsigned char> (iWidth * iHeight * iBytesPerPixel);
  int ixCnt = 0;
  for (int ixRow = 0; ixRow < iHeight; ixRow++)
  {
    for (int ixCol = 0; ixCol < iWidth; ixCol++)
    {
      aui8ImageData[ixCnt++] = ixRow * 250 / iHeight;
      aui8ImageData[ixCnt++] = ixCol * 250 / iWidth;
      aui8ImageData[ixCnt++] = ixCol;
    }
  }

  //========== Pointer ==========
  // ~ 8.97 sec for 10k * 10k * 3 * 10 exec, ~ 334 MB/s
  int iExec = 10;
  DateTime dtStart = DateTime::Now;
  for (int ixExec = 0; ixExec < iExec; ixExec++)
  {
    CopyIntoBitmap_Pointer (aui8ImageData, oBitmapData, iBytesPerPixel);
  }
  TimeSpan tsDuration = DateTime::Now - dtStart;
  Console::WriteLine (tsDuration + "  " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));

  //========== Marshal.Copy, 1 large block ==========
  // 3.94 sec for 10k * 10k * 3 * 100 exec, ~ 7617 MB/s
  iExec = 100;
  dtStart = DateTime::Now;
  for (int ixExec = 0; ixExec < iExec; ixExec++)
  {
    CopyIntoBitmap_MarshallLarge (aui8ImageData, oBitmapData);
  }
  tsDuration = DateTime::Now - dtStart;
  Console::WriteLine (tsDuration + "  " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));

  //========== Marshal.Copy, many small 3-byte blocks ==========
  // 41.7 sec for 10k * 10k * 3 * 10 exec, ~ 72 MB/s
  iExec = 10;
  dtStart = DateTime::Now;
  for (int ixExec = 0; ixExec < iExec; ixExec++)
  {
    CopyIntoBitmap_MarshalSmall (aui8ImageData, oBitmapData, iBytesPerPixel);
  }
  tsDuration = DateTime::Now - dtStart;
  Console::WriteLine (tsDuration + "  " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));

  //========== Buffer.BlockCopy, 1 large block ==========
  // 4.02 sec for 10k * 10k * 3 * 100 exec, ~ 7467 MB/s
  iExec = 100;
  array<unsigned char>^ aui8Buffer = gcnew array<unsigned char> (aui8ImageData->Length);
  dtStart = DateTime::Now;
  for (int ixExec = 0; ixExec < iExec; ixExec++)
  {
    Buffer::BlockCopy (aui8ImageData, 0, aui8Buffer, 0, aui8ImageData->Length);
  }
  tsDuration = DateTime::Now - dtStart;
  Console::WriteLine (tsDuration + "  " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));

  //========== Buffer.BlockCopy, many small 3-byte blocks ==========
  // 28.0 sec for 10k * 10k * 3 * 10 exec, ~ 107 MB/s
  iExec = 10;
  dtStart = DateTime::Now;
  for (int ixExec = 0; ixExec < iExec; ixExec++)
  {
    int ixCnt = 0;
    for (int ixRow = 0; ixRow < iHeight; ixRow++)
    {
      for (int ixCol = 0; ixCol < iWidth; ixCol++)
      {
        Buffer::BlockCopy (aui8ImageData, ixCnt, aui8Buffer, ixCnt, iBytesPerPixel);
        ixCnt += iBytesPerPixel;
      }
    }
  }
  tsDuration = DateTime::Now - dtStart;
  Console::WriteLine (tsDuration + "  " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));

  oBitmap->UnlockBits (oBitmapData);

  oBitmap->Save ("d:\\temp\\bitmap.bmp", ImageFormat::Bmp);
}
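The pattern in the table (one large block copy beats many tiny ones) carries over to plain C. The sketch below is not part of my benchmark and I have not timed it; `copy_rows` is a made-up helper that shows how to keep the number of copy calls low even when the bitmap rows are padded, i.e. when Stride is larger than Width * BytesPerPixel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies tightly packed pixel rows from `src` into a
   bitmap buffer `dst` whose rows start `stride` bytes apart.
   If the rows are contiguous (stride == row_bytes), one memcpy moves
   everything; otherwise one memcpy per row is used, which is still far
   fewer calls than one copy per pixel. */
void copy_rows(unsigned char *dst, size_t stride,
               const unsigned char *src, size_t row_bytes, size_t height)
{
    size_t row;
    assert(dst != NULL && src != NULL);
    assert(stride >= row_bytes);
    if (stride == row_bytes)
    {
        memcpy(dst, src, row_bytes * height);
        return;
    }
    for (row = 0; row < height; ++row)
        memcpy(dst + row * stride, src + row * row_bytes, row_bytes);
}
```

For the 10k x 10k image above this means at most 10,000 copy calls instead of 10^8, and the padding bytes at the end of each row are left untouched.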

related information:
Why is memcpy() and memmove() faster than pointer increments?
Array.Copy vs Buffer.BlockCopy, Answer https://stackoverflow.com/a/33865267
https://github.com/dotnet/coreclr/issues/2430 "Array.Copy & Buffer.BlockCopy x2 to x3 slower < 1kB"
https://github.com/dotnet/coreclr/blob/master/src/vm/comutilnative.cpp, Line 718 at the time of writing: Buffer.BlockCopy() uses memmove

Iffy answered 27/10, 2018 at 6:50 Comment(0)
