How to use fixed with a variable of type Array or T[]?

I'm working on an IEqualityComparer which is supposed to compare arrays of primitive types really fast. My plan is to obtain pointers to the arrays and memcmp them. Like this:

    public unsafe override bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Length != y.Length) return false;

        var xArray = (Array)x;
        var yArray = (Array)y;
        fixed (void* xPtr = xArray) //compiler error 1
        fixed (T* yPtr = y) //compiler error 2
        {
            return memcmp(xPtr, yPtr, x.Length * this.elementSize);
        }
    }

The fixed statement does not allow me to pin either Array or T[].

The error messages are:

1. Cannot implicitly convert type 'System.Array' to 'void*'
2. Cannot take the address of, get the size of, or declare a pointer to a managed type ('T')

Now, I don't actually care how I make this work (I'm not committed to this exact approach). How can I memcmp two T[] where I know that T is a primitive/blittable type?

I really want to avoid switching on the type and creating a specialized (and duplicated) version of the code for each interesting type. Any kind of reflection-based solution is not workable due to performance constraints (and yes, I really do need the performance here - no premature-optimization warnings apply, as is customary on Stack Overflow).

Tremulant asked 12/2, 2013 at 13:2 Comment(7)
you can't call memcpy from c#Warranty
In addition, have you tried public unsafe override bool Equals(T[] x, T[] y) where T: struct?Cypro
why are you trying to optimize your code? you are probably creating more performance problems by marshaling to unmanaged code.Warranty
@DanielA.White why not? I'll be either PInvoking or using a managed port depending on the size of the array at runtime. This issue is not essential to the question anyway. I like the elegance of comparing byte* chunks instead of specializing a compare loop for 9 different primitive types. Also note that for huge arrays any fixed per-call overhead does not matter.Tremulant
p/invokes have overheads which would negate the benefit. you should let the clr do its work.Warranty
@Davio: That doesn't change anything.Olfactory
I'm betting that using memcmp() will speed up things significantly for large arrays. The marshalling of the pinned pointer should NOT copy memory. For example, see #2031717Swingletree

"where I know that T is a primitive/blittable type"

You know it, but the compiler doesn't. The CLR demands that nothing inside a pinned object can be moved by the garbage collector; for an Array, that includes its elements. The only kind of T that qualifies is a simple value type, one that is blittable. Generics do not give you a way to constrain T to a blittable type.

You'd normally declare the arguments of memcmp() as byte[]. The pinvoke marshaller then already does the right thing and pins the byte[] arrays before calling memcmp(). That will not work here, however, since you cannot easily convert a T[] to a byte[]. You'll have to do the pinning yourself with GCHandle, and declare the memcmp() arguments as IntPtr instead of byte[] accordingly.
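
For concreteness, the IntPtr-based declaration could look something like this (a minimal sketch, assuming msvcrt's memcmp as the other answers do; not part of the original answer):

    // Requires: using System; using System.Runtime.InteropServices;
    // With IntPtr arguments the marshaller no longer needs to know the element
    // type; the caller pins the arrays with GCHandle and passes
    // AddrOfPinnedObject(). memcmp returns 0 when the blocks are equal.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr a, IntPtr b, UIntPtr count);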

The subset of types that can work is in practice small enough to consider simply writing method overloads instead of a generic method. That lets the pinvoke marshaller take care of the pinning; just overload the memcmp() declarations accordingly.

Dressing answered 12/2, 2013 at 13:48 Comment(6)
Do you have any reference to the fixed statement not pinning objects in memory, which is what it's supposed to do?Serrated
Just look at the IL generated by the statement.Dressing
Very valuable, Hans. I kind of knew that the C# compiler just uses managed pointers ("interior pointers") but I never realized that this means the object is not pinned. I'm convinced now that the last statement that you edited in is the right design choice. I'll follow up on this.Tremulant
I'll accept this answer as it clearly states that multiple overloaded PInvoke functions are required. There also seems to be no solution to my idea of obtaining a fixed pointer without specialization on type.Tremulant
"The fixed statement does not in fact pin an object. It creates a new pointer that's recognized by the garbage collector as a reference. So that the GC will update that pointer when it moves the array while compacting the heap." Not true. fixed statement instructs GC not to move memory referenced by fixed slots. So the memory is pinned in reality.Etoile
With C# 7.3 there is a generic type constraint called unmanaged but sadly this does not seem to support fixed arrays.Potty
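
As a side note on the comment above: the limitation it runs into presumably concerns fixed-size buffer fields, which stay restricted to a few primitive element types; the fixed statement itself can pin a T[] directly once T is constrained to unmanaged. A minimal sketch assuming C# 7.3+ and msvcrt's memcmp (not part of the original answers):

    // Requires: using System; using System.Runtime.InteropServices;
    // Same msvcrt import as above, repeated so the snippet stands alone.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr a, IntPtr b, UIntPtr count);

    // With 'where T : unmanaged' both T* pointers and sizeof(T) are allowed,
    // and fixed can pin the arrays without any per-type specialization.
    public static unsafe bool ArraysEqual<T>(T[] x, T[] y) where T : unmanaged
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        if (x.Length == 0) return true;

        fixed (T* px = x)
        fixed (T* py = y)
        {
            return memcmp((IntPtr)px, (IntPtr)py,
                          new UIntPtr((ulong)x.Length * (ulong)sizeof(T))) == 0;
        }
    }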

You can write a separate P/Invoke front-end for each kind of array that you want to compare. I know you don't really want to specialise on T, but I don't think the amount of extra code is too great.

This is a bit of a hack, because I'm defining multiple P/Invoke methods with different signatures for the same API function, but by doing this I can leverage the P/Invoke marshalling support.

(Note that the sign of the return value from memcmp is really only meaningful if the source data is indeed an array of bytes. If you are giving it an array of ints, you should only compare the return value with zero and ignore its sign: memcmp compares byte by byte, and on a little-endian machine that byte order does not match the numeric order, so the ordering it implies is not meaningful for ints.)

For example, on my machine the code below prints this (RELEASE build, not debug build):

MemCmp with ints took 00:00:08.0768666
ManagedMemCmp with ints took 00:00:10.3750453
MemCmp with bytes took 00:00:01.8740001
ManagedMemCmp with bytes took 00:00:09.2885763

Note that the byte[] test uses bytes, so it compares a quarter as many bytes as the int[] test does. The managed code makes the same number of comparisons either way, so it is comparatively a lot slower.

There isn't really a huge difference between the times for comparing arrays of ints, but there is a large difference for comparing arrays of bytes. This suggests to me that there could be a managed optimisation that uses fixed to get an int pointer into a byte array in order to compare 4 bytes at a time (with some fiddling for any extra bytes at the end that don't fit into an int).

I also think you could write a multithreaded managed version (using the "int*" optimisation to compare byte arrays) which would be MUCH FASTER than the unmanaged memcmp(), which is of course not multithreaded (as far as I know).
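
As a rough illustration of that fixed/pointer idea (a sketch only, not benchmarked as part of this answer; it compares 8 bytes at a time with ulong*, the variant the comments below ended up using, and the int* version is the same with 4-byte steps):

    // Compares two byte arrays a machine word at a time by reinterpreting the
    // pinned memory as ulong*, then handles the 0..7 leftover bytes one by one.
    // Assumes the caller has already checked for null and equal lengths.
    public static unsafe bool ManagedMemCmpWide(byte[] a1, byte[] a2)
    {
        fixed (byte* p1 = a1)
        fixed (byte* p2 = a2)
        {
            int n = a1.Length;
            ulong* q1 = (ulong*)p1;
            ulong* q2 = (ulong*)p2;
            int wide = n / 8;

            for (int i = 0; i < wide; ++i)
            {
                if (q1[i] != q2[i]) return false;
            }

            for (int i = wide * 8; i < n; ++i)
            {
                if (p1[i] != p2[i]) return false;
            }

            return true;
        }
    }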

Anyway, here's my test code. Remember, RELEASE build, not debug!

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;


namespace Demo
{
    public static class Program
    {
        private static void Main(string[] args)
        {
            int[] a1 = new int[1000000];
            int[] a2 = new int[1000000];

            for (int i = 0; i < a1.Length-1; ++i)
            {
                a1[i] = i;
                a2[i] = i;
            }

            a1[a1.Length-1] = 1;
            a2[a2.Length-1] = 2;

            byte[] b1 = new byte[1000000];
            byte[] b2 = new byte[1000000];

            for (int i = 0; i < b1.Length-1; ++i)
            {
                b1[i] = (byte)i;
                b2[i] = (byte)i;
            }

            b1[b1.Length-1] = 1;
            b2[b2.Length-1] = 2;

            Stopwatch sw = Stopwatch.StartNew();
            testWithMemCmp(a1, a2);
            sw.Stop();
            Console.WriteLine("MemCmp with ints took " + sw.Elapsed);

            sw.Restart();
            testWithManagedMemCmp(a1, a2);
            sw.Stop();
            Console.WriteLine("ManagedMemCmp with ints took " + sw.Elapsed);

            sw.Restart();
            testWithMemCmp(b1, b2);
            sw.Stop();
            Console.WriteLine("MemCmp with bytes took " + sw.Elapsed);

            sw.Restart();
            testWithManagedMemCmp(b1, b2);
            sw.Stop();
            Console.WriteLine("ManagedMemCmp with bytes took " + sw.Elapsed);
        }

        private static void testWithMemCmp(int[] a1, int[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                MemCmp(a1, a2);
            }
        }

        private static void testWithMemCmp(byte[] a1, byte[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                MemCmp(a1, a2);
            }
        }

        private static void testWithManagedMemCmp(int[] a1, int[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                ManagedMemCmp(a1, a2);
            }
        }

        private static void testWithManagedMemCmp(byte[] a1, byte[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                ManagedMemCmp(a1, a2);
            }
        }

        public static bool ManagedMemCmp(int[] a1, int[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            for (int i = 0; i < a1.Length; ++i)
            {
                if (a1[i] != a2[i])
                {
                    return false;
                }
            }

            return true;
        }

        public static bool ManagedMemCmp(byte[] a1, byte[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            for (int i = 0; i < a1.Length; ++i)
            {
                if (a1[i] != a2[i])
                {
                    return false;
                }
            }


            return true;
        }

        public static bool MemCmp(byte[] a1, byte[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            return memcmp(a1, a2, new UIntPtr((uint)a1.Length)) == 0;
        }

        public static bool MemCmp(int[] a1, int[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            return memcmp(a1, a2, new UIntPtr((uint)(a1.Length * sizeof(int)))) == 0;
        }

        // msvcrt's memcmp uses the cdecl calling convention; declaring it
        // explicitly avoids stack-imbalance problems on 32-bit builds.
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        private static extern int memcmp(byte[] a1, byte[] a2, UIntPtr count);

        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        private static extern int memcmp(int[] a1, int[] a2, UIntPtr count);

        private const int COUNT = 10000;
    }
}
Swingletree answered 12/2, 2013 at 14:47 Comment(2)
I tested the PInvoke version and in the managed comparison loop I use ulong* to compare 64 bits at a time. The numbers are much better than the safe, naive element-by-element loop (which is not surprising just like you said).Tremulant
Thank you for this answer. It helped me implement the solution that I now have: for length < 8 use a safe single-element loop, for length < 512 use the unsafe 64-bit loop, and in all other cases use memcmp.Tremulant
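
In outline, the tiered scheme described in this comment could look like the sketch below (reusing ManagedMemCmp and MemCmp from the answer's code and the ManagedMemCmpWide sketch above; the 8 and 512 cut-offs are the commenter's numbers, not values measured here):

    // Hypothetical dispatcher: small arrays stay in safe code, medium arrays
    // use the unsafe ulong* loop, large arrays go through the memcmp P/Invoke.
    public static bool TieredEquals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;

        if (x.Length < 8)   return ManagedMemCmp(x, y);      // safe element-by-element loop
        if (x.Length < 512) return ManagedMemCmpWide(x, y);  // fixed + ulong* loop
        return MemCmp(x, y);                                 // P/Invoke memcmp
    }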

I agree with Daniel A. White's comments: this will probably result in a performance hit rather than a gain, because of the overhead of marshaling to unmanaged code and so on.

Having said that, you should be able to use GCHandle.Alloc:

public unsafe bool Equals(T[] x, T[] y)
{
    if (ReferenceEquals(x, y)) return true;
    if (x == null || y == null) return false;
    if (x.Length != y.Length) return false;

    GCHandle handleOfX = default(GCHandle);
    GCHandle handleOfY = default(GCHandle);
    try
    {
        // Pin both arrays so the GC cannot move them while memcmp reads them.
        // Allocating inside the try ensures the first handle is freed even if
        // the second Alloc throws.
        handleOfX = GCHandle.Alloc(x, GCHandleType.Pinned);
        handleOfY = GCHandle.Alloc(y, GCHandleType.Pinned);
        return memcmp(handleOfX.AddrOfPinnedObject(),
                      handleOfY.AddrOfPinnedObject(),
                      x.Length * this.elementSize);
    }
    finally
    {
        if (handleOfX.IsAllocated) handleOfX.Free();
        if (handleOfY.IsAllocated) handleOfY.Free();
    }
}
Olfactory answered 12/2, 2013 at 13:19 Comment(3)
I'm going to benchmark this approach and post what I find. I'm concerned that GCHandles are heavy-weight; fixed does not use them. (Btw, I'll only PInvoke for big arrays. I'm going to compare small arrays in unsafe managed code so there's no call overhead at all).Tremulant
Also check this out: techmikael.blogspot.co.uk/2009/01/… And I reckon it will speed things up for LARGE arrays (although I think the version using GCHandle will add a lot of overhead compared to just fixing the arrays - but of course fixing will result in unsafe code).Swingletree
GCHandle performance is really awful. I didn't expect it to be that bad. Still, it is the only solution that allows not to specialize on T.Tremulant
