What should I pin when working on arrays?

Asked 9/6, 2014 at 14:56 Answered 6/11, 2017 at 2:36

I'm trying to write a DynamicMethod that wraps the cpblk IL opcode. I need to copy chunks of byte arrays, and on x64 platforms this is supposedly the fastest way to do it. Array.Copy and Buffer.BlockCopy both work, but I'd like to explore all the options.
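For reference, here is a minimal sketch of the two baseline approaches. It follows the same contract as the `_copy` comment in the question's code: copy `count` bytes from `source[index]` into `target`, starting at offset 0. The class name `BaselineCopy` is hypothetical.

```csharp
using System;

public static class BaselineCopy
{
    // Copies 'count' elements starting at source[index] into target[0..count).
    public static void WithArrayCopy(byte[] source, int index, int count, byte[] target)
        => Array.Copy(source, index, target, 0, count);

    // Buffer.BlockCopy takes byte offsets and a byte count; for byte[] these
    // coincide with element indices.
    public static void WithBlockCopy(byte[] source, int index, int count, byte[] target)
        => Buffer.BlockCopy(source, index, target, 0, count);
}
```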

My goal is to copy managed memory from one byte array into a new managed byte array. My concern is how to correctly "pin" the memory locations: I don't want the garbage collector to move the arrays and break everything. So far it works, but I'm not sure how to test whether it is GC-safe.
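One way to look for GC problems is a stress test. Run the copy repeatedly while another thread forces blocking, compacting collections, then compare every result byte-for-byte. This cannot prove GC safety; it only makes a pinning bug more likely to surface. The sketch below is hypothetical: `GcStress`, the iteration count and the buffer sizes are arbitrary choices. You would pass it the delegate returned by `Init()`. The 64 KB buffers stay under the 85,000-byte large-object threshold, so the GC is free to relocate them.

```csharp
using System;
using System.Threading;

public static class GcStress
{
    // Runs 'copy' many times while a background thread forces blocking,
    // compacting gen-2 collections, and checks every result byte-for-byte.
    // Returns the number of iterations that produced wrong data (0 = none observed).
    public static int Run(Action<byte[], int, int, byte[]> copy, int iterations)
    {
        var rng = new Random(12345);
        var source = new byte[64 * 1024];
        rng.NextBytes(source);

        using (var stop = new ManualResetEventSlim(false))
        {
            var collector = new Thread(() =>
            {
                while (!stop.IsSet)
                    GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);
            });
            collector.IsBackground = true;
            collector.Start();

            int failures = 0;
            for (int i = 0; i < iterations; i++)
            {
                // Short-lived garbage gives compaction something to move around.
                var garbage = new byte[rng.Next(1, 4096)];
                int index = rng.Next(0, source.Length / 2);
                int count = rng.Next(1, source.Length - index);
                var target = new byte[count];

                copy(source, index, count, target);

                for (int j = 0; j < count; j++)
                    if (target[j] != source[index + j]) { failures++; break; }
                GC.KeepAlive(garbage);
            }

            stop.Set();
            collector.Join();
            return failures;
        }
    }
}
```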

using System;
using System.Reflection.Emit;

// copying 'count' bytes from offset 'index' in 'source' to offset 0 in 'target'
// i.e. void _copy(byte[] source, int index, int count, byte[] target)

static Action<byte[], int, int, byte[]> Init()
{
    // argument 0 is an unused 'object' slot; the IL below only reads arguments 1..4
    var dmethod = new DynamicMethod("copy", typeof(void),
        new[] { typeof(object), typeof(byte[]), typeof(int), typeof(int), typeof(byte[]) },
        typeof(object), true);
    var il = dmethod.GetILGenerator();

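    // two pinned byref locals (0 = source element, 1 = target element);
    // while a pinned local points into an array, the GC will not move that array
    il.DeclareLocal(typeof(byte).MakeByRefType(), true);
    il.DeclareLocal(typeof(byte).MakeByRefType(), true);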
    // pin the source
    il.Emit(OpCodes.Ldarg_1);
    il.Emit(OpCodes.Ldarg_2);
    il.Emit(OpCodes.Ldelema, typeof(byte));
    il.Emit(OpCodes.Stloc_0);
    // pin the target
    il.Emit(OpCodes.Ldarg_S,(byte)4);
    il.Emit(OpCodes.Ldc_I4_0);
    il.Emit(OpCodes.Ldelema, typeof(byte));
    il.Emit(OpCodes.Stloc_1);

    il.Emit(OpCodes.Ldloc_1);
    il.Emit(OpCodes.Ldloc_0);
    // load the length
    il.Emit(OpCodes.Ldarg_3);
    // perform the memcpy
    il.Emit(OpCodes.Unaligned,(byte)1);
    il.Emit(OpCodes.Cpblk);

    il.Emit(OpCodes.Ret);
    return dmethod.CreateDelegate(typeof(Action<byte[], int, int, byte[]>)) as Action<byte[], int, int, byte[]>;
}
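For comparison, the same pin-then-copy idea can be written in plain C#, without the `unsafe` switch. You pin explicitly with a `GCHandle` and copy with `Marshal.Copy`. This is a swapped-in technique that does not use cpblk. It is shown only because it makes the lifetime of the pin explicit. `PinnedCopy` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedCopy
{
    // Pins 'source' for the duration of the copy, then copies 'count' bytes
    // from source[index] into target[0..count) through the pinned address.
    public static void Copy(byte[] source, int index, int count, byte[] target)
    {
        if ((uint)index > (uint)source.Length || (uint)count > (uint)(source.Length - index))
            throw new ArgumentOutOfRangeException(nameof(count));

        GCHandle handle = GCHandle.Alloc(source, GCHandleType.Pinned);
        try
        {
            // AddrOfPinnedObject returns the address of element 0 for an array.
            IntPtr start = IntPtr.Add(handle.AddrOfPinnedObject(), index);
            // Marshal.Copy writes into a managed array, so 'target' needs no pin of ours.
            Marshal.Copy(start, target, 0, count);
        }
        finally
        {
            handle.Free(); // unpin as soon as the copy is done
        }
    }
}
```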

Neuroglia asked 9/6, 2014 at 14:56

Use the existing methods. They are most likely very similar to what you're trying to do, and they have the advantage of being able to "cheat", i.e. use system functions not exposed to code running inside the runtime. By solving this your own way you're spending time/money without a guaranteed benefit (unless it's a research project, in which case by all means go for it). – Zigzagger 9/6, 2014 at 15:06

They are similar, but the IL opcode is faster for large byte-array copies (for fewer than 10 elements it performs very poorly; Array.Copy seems to be very good in that range). Originally I was referencing a C++/CLI DLL, a dependency I'm trying to remove that also required the "unsafe" compilation option. I'm trying to encapsulate all of that in one dynamic method to avoid this annoyance. Another advantage is that the IL does not require primitive types: in this case I'm using bytes, but I'd also like to be able to copy other structs around quickly. – Neuroglia 9/6, 2014 at 15:24

Buffer.BlockCopy is actually slower than Array.Copy for copying bytes. It does work across element types, but only for arrays of primitives; try copying DateTimes with it, for instance, and it throws. – Neuroglia 9/6, 2014 at 17:43
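Here is a small sketch of the difference described in this comment. `Buffer.BlockCopy` throws an `ArgumentException` for arrays whose element type is not primitive, such as `DateTime`. `Array.Copy` accepts them. `BlockCopyDemo` is a hypothetical helper.

```csharp
using System;

public static class BlockCopyDemo
{
    // Returns true if Buffer.BlockCopy rejects a DateTime[] (non-primitive elements).
    public static bool BlockCopyRejectsDateTime()
    {
        var src = new DateTime[2];
        var dst = new DateTime[2];
        try
        {
            Buffer.BlockCopy(src, 0, dst, 0, 1);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Array.Copy works for any element type, including non-primitive structs.
    public static DateTime[] CopyWithArrayCopy(DateTime[] src)
    {
        var dst = new DateTime[src.Length];
        Array.Copy(src, dst, src.Length);
        return dst;
    }
}
```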

OpCodes.Cpblk is extremely slow on x86 though, depending on circumstances. There doesn't seem to be a generally best algorithm but the others appear to be more stable. You could branch based on architecture though, if the performance gain is actually significant in your use case. – Zigzagger 9/6, 2014 at 20:25
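If one copier is not best everywhere, the branch can be made once at startup. Below is a minimal sketch. It assumes you already have a cpblk-based delegate, for example the one returned by `Init()`. `CopySelector` is hypothetical. `Environment.Is64BitProcess` is the runtime check it uses.

```csharp
using System;

public static class CopySelector
{
    // Use the cpblk-based copier in 64-bit processes, Array.Copy otherwise.
    public static Action<byte[], int, int, byte[]> Select(
        Action<byte[], int, int, byte[]> cpblkCopy, bool is64BitProcess)
        => is64BitProcess ? cpblkCopy : ArrayCopy;

    // Convenience overload that asks the runtime which architecture it runs on.
    public static Action<byte[], int, int, byte[]> SelectForCurrentProcess(
        Action<byte[], int, int, byte[]> cpblkCopy)
        => Select(cpblkCopy, Environment.Is64BitProcess);

    // Fallback: copy 'count' bytes from source[index] into target[0..count).
    static void ArrayCopy(byte[] source, int index, int count, byte[] target)
        => Array.Copy(source, index, target, 0, count);
}
```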

That is not a concern for me. This is server code running in an x64 environment. – Neuroglia 11/6, 2014 at 19:46

I believe you don't have to perform any pinning. The GC shouldn't move the array during cpblk, and it will update the managed references if the array is moved before that. – Wellinformed 23/7, 2015 at 8:31
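This presumably works because `ldelema` produces a managed reference, just like a C# `ref` local. The GC tracks managed references and updates them if it relocates the array. Below is a minimal C# 7 sketch of that behavior. It takes a `ref` to an array element and forces a compacting collection, which may or may not actually move the array. It then writes through the reference. `InteriorRefDemo` is a hypothetical name.

```csharp
using System;

public static class InteriorRefDemo
{
    // Takes a managed reference into the middle of an array, forces a
    // compacting collection, then writes through that reference. Because the
    // GC keeps tracked references up to date, the write lands in the array
    // whether or not the array was moved.
    public static byte WriteAfterCompaction()
    {
        var data = new byte[1000];
        ref byte slot = ref data[500];
        GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);
        slot = 42;
        return data[500];
    }
}
```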

I believe that your usage of pinned local variables is correct.

Zany answered 9/6, 2014 at 19:29

void cpblk<T>(ref T src, ref T dst, int c_elem)

Copies c_elem elements of type T from src to dst using the cpblk IL instruction. The element type T must describe an unmanaged ValueType (or primitive); cpblk cannot copy memory which contains GC object references at any level of nesting. Note that c_elem indicates the number of elements, not the number of bytes. Tested with C# 7 and .NET 4.7. See usage example below.

using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

public static class IL<T>
{
    public delegate void _cpblk_del(ref T src, ref T dst, int c_elem);
    public static readonly _cpblk_del cpblk;

    static IL()
    {
        var dm = new DynamicMethod("cpblk+" + typeof(T).FullName,
            typeof(void),
            new[] { typeof(T).MakeByRefType(), typeof(T).MakeByRefType(), typeof(int) },
            typeof(T),
            true);

        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_2);

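        // Managed size of T. Marshal.SizeOf<T>() would return the *marshaled*
        // size instead, which differs for bool and char and throws for
        // auto-layout structs such as DateTime. (On .NET Framework, Unsafe
        // comes from the System.Runtime.CompilerServices.Unsafe package.)
        int cb = System.Runtime.CompilerServices.Unsafe.SizeOf<T>();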
        if (cb > 1)
        {
            il.Emit(OpCodes.Ldc_I4, cb);
            il.Emit(OpCodes.Mul);
        }

        byte align;
        if ((cb & (align = 1)) != 0 ||
            (cb & (align = 2)) != 0 ||
            (cb & (align = 4)) != 0)
            il.Emit(OpCodes.Unaligned, align);

        il.Emit(OpCodes.Cpblk);
        il.Emit(OpCodes.Ret);
        cpblk = (_cpblk_del)dm.CreateDelegate(typeof(_cpblk_del));
    }
}

Note that this code assumes that the elements are byte-packed (i.e., there is no padding between individual elements) and aligned according to their size. Specifically, the source and destination addresses should be divisible by the largest power of two that divides sizeof(T), capped at 8. Said another way: if sizeof(T) % 8 is non-zero, an OpCodes.Unaligned prefix is emitted specifying the largest of {1, 2, 4} that divides that remainder. When sizeof(T) is a multiple of 8 (8-byte alignment), no prefix is needed.

As an example, an 11-byte struct requires alignment prefix 1 because, even if the first element in the range happens to be quad-aligned, byte-packing means the adjacent ones won't be. Normally the CLR arranges arrays this way, and you don't have to worry about these issues.
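The prefix selection can be checked by hand with a tiny helper that mirrors the `if` chain in the static constructor. `PrefixFor` is a hypothetical name, and it returns 0 where no prefix is emitted. With it, an 11-byte struct gets 1, a 6-byte struct gets 2 and a 12-byte struct gets 4. 8- and 16-byte structs get no prefix.

```csharp
public static class CpblkAlignment
{
    // Mirrors the answer's selection: the value passed to the 'unaligned.'
    // prefix for an element of 'size' bytes, i.e. the lowest set bit of size
    // among {1, 2, 4}, or 0 when size is a multiple of 8 and no prefix is emitted.
    public static byte PrefixFor(int size)
    {
        if ((size & 1) != 0) return 1;
        if ((size & 2) != 0) return 2;
        if ((size & 4) != 0) return 4;
        return 0;
    }
}
```

Note that `int` (4 bytes) still gets `unaligned. 4`, which is harmless for arrays of `int`.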

Usage:

var src = new[] { 1, 2, 3, 4, 5, 6 };
var dst = new int[6];

IL<int>.cpblk(ref src[2], ref dst[3], 2);      // dst => { 0, 0, 0, 3, 4, 0 }

Automatic type inference (optional):

For automatic type inference, you can include the following class as well:

public static class IL
{
    public static void cpblk<T>(ref T src, ref T dst, int c_elem) 
        => IL<T>.cpblk(ref src, ref dst, c_elem);
}

With this, you don't need to specify the type argument, and the previous example becomes simply:

IL.cpblk(ref src[2], ref dst[3], 2);

Marshamarshal answered 6/11, 2017 at 2:36

This is an excellent answer, and it's fast. But isn't this also the way System.Buffer.BlockCopy works internally? – Bloodless 9/6, 2020 at 16:44

You don't need to pin anything in this method; if you want to pin, pin your array before passing it to this method. You don't need to pin any pointer, because the address of an element is always the same unless you restart your program; you can even store it in an IntPtr without any problem.

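.maxstack 3
// assumed arguments: 0 = destination array, 1 = destination index,
// 2 = source array, 3 = source index, 4 = byte count
// (cpblk pops the size, then the source address, then the destination address)
ldarg.0
ldarg.1
ldelema uint8

ldarg.2
ldarg.3
ldelema uint8

ldarg.s 4
cpblk

ret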
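The same IL can be produced at runtime with `System.Reflection.Emit` instead of ilasm. This minimal sketch assumes the argument order (destination array, destination index, source array, source index, byte count), since the answer does not give the method signature. `CpblkCopy` and `CopyDel` are hypothetical names. `ldelema` only bounds-checks the starting index. cpblk itself does not check `count`, so an oversized count silently writes past the end of the destination array.

```csharp
using System;
using System.Reflection.Emit;

public static class CpblkCopy
{
    // Hypothetical signature chosen to match the IL's argument order.
    public delegate void CopyDel(byte[] dst, int dstIndex, byte[] src, int srcIndex, int count);

    public static readonly CopyDel Copy = Build();

    static CopyDel Build()
    {
        var dm = new DynamicMethod("cpblk_copy", typeof(void),
            new[] { typeof(byte[]), typeof(int), typeof(byte[]), typeof(int), typeof(int) },
            typeof(CpblkCopy).Module, skipVisibility: true);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                // dst
        il.Emit(OpCodes.Ldarg_1);                // dstIndex
        il.Emit(OpCodes.Ldelema, typeof(byte));  // &dst[dstIndex] (start index bounds-checked)
        il.Emit(OpCodes.Ldarg_2);                // src
        il.Emit(OpCodes.Ldarg_3);                // srcIndex
        il.Emit(OpCodes.Ldelema, typeof(byte));  // &src[srcIndex]
        il.Emit(OpCodes.Ldarg_S, (byte)4);       // count, in bytes (NOT bounds-checked)
        il.Emit(OpCodes.Unaligned, (byte)1);     // arbitrary byte offsets carry no alignment guarantee
        il.Emit(OpCodes.Cpblk);
        il.Emit(OpCodes.Ret);
        return (CopyDel)dm.CreateDelegate(typeof(CopyDel));
    }
}
```

For example, `CpblkCopy.Copy(dst, 1, src, 2, 3)` copies `src[2..4]` into `dst[1..3]`.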

Bascomb answered 15/10, 2014 at 10:19

The address of anything on the heap can and will change all the time due to GC collection and compaction, it's fundamental to .NET. The info in this answer is just not true. Since the OP asked for cpblk, which is a managed instruction, pinning is not needed and the CLR will take care of the changing pointer addresses. – Bloodless 13/6, 2020 at 20:21
