How to use fixed with a variable of type Array or T[]?

I'm working on an IEqualityComparer which is supposed to compare arrays of primitive types really fast. My plan is to obtain pointers to the arrays and memcmp them. Like this:

    public unsafe override bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Length != y.Length) return false;

        var xArray = (Array)x;
        var yArray = (Array)y;
        fixed (void* xPtr = xArray) //compiler error 1
        fixed (T* yPtr = y) //compiler error 2
        {
            return memcmp(xPtr, yPtr, x.Length * this.elementSize);
        }
    }

The fixed statement does not allow me to pin either Array or T[].

The error messages are:

1. Cannot implicitly convert type 'System.Array' to 'void*'
2. Cannot take the address of, get the size of, or declare a pointer to a managed type ('T')

Now, I don't actually care how I make this work (I'm not committed to this exact approach). How can I memcmp two T[] where I know that T is a primitive/blittable type?

I really want to avoid switching on the type and creating a specialized (and duplicated) version of the code for each interesting type. Any kind of reflection-based solution is not workable due to performance constraints (and yes, I really do need the performance here - no premature-optimization warnings apply, as is customary on Stack Overflow).

Tremulant asked 12/2, 2013 at 13:2 Comment(7)
you can't call memcpy from c#Warranty
In addition, have you tried public unsafe override bool Equals(T[] x, T[] y) where T: struct?Cypro
why are you trying to optimize your code? you are probably creating more performance problems by marshaling to unmanaged code.Warranty
@DanielA.White why not? I'll be either PInvoking or using a managed port depending on the size of the array at runtime. This issue is not essential to the question anyway. I like the elegance of comparing byte* chunks instead of specializing a compare loop for 9 different primitive types. Also note that for huge arrays any fixed per-call overhead does not matter.Tremulant
p/invokes have overheads which would negate the benefit. you should let the clr do its work.Warranty
@Davio: That doesn't change anything.Olfactory
I'm betting that using memcmp() will speed up things significantly for large arrays. The marshalling of the pinned pointer should NOT copy memory. For example, see #2031717Swingletree

"where I know that T is a primitive/blittable type"

You know it, but the compiler doesn't. The CLR demands that nothing inside a pinned object can be moved by the garbage collector; for an Array, that includes its elements. The only kind of T that qualifies is a simple value type, one that is blittable. Generics do not give you a way to constrain T to a blittable type.

You'd normally declare the arguments of memcmp() as byte[]. The pinvoke marshaller then already does the right thing and pins the byte[] arrays before calling memcmp(). That will not work here, however, since you cannot easily convert a T[] to a byte[]. You'll have to do the pinning yourself with GCHandle, and declare the memcmp() arguments as IntPtr instead of byte[] accordingly.
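
For concreteness, the IntPtr-based declaration could look something like this (a minimal sketch, assuming msvcrt's memcmp as the other answers do; not part of the original answer):

    // Requires: using System; using System.Runtime.InteropServices;
    // With IntPtr arguments the marshaller no longer needs to know the element
    // type; the caller pins the arrays with GCHandle and passes
    // AddrOfPinnedObject(). memcmp returns 0 when the blocks are equal.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr a, IntPtr b, UIntPtr count);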

The subset of types that can work is in practice small enough to consider simply writing method overloads instead of a generic method. That lets the pinvoke marshaller take care of the pinning; just overload the memcmp() declarations accordingly.

Dressing answered 12/2, 2013 at 13:48 Comment(6)
Do you have any reference to the fixed statement not pinning objects in memory, which is what it's supposed to do?Serrated
Just look at the IL generated by the statement.Dressing
Very valuable, Hans. I kind of knew that the C# compiler just uses managed pointers ("interior pointers") but I never realized that this means the object is not pinned. I'm convinced now that the last statement that you edited in is the right design choice. I'll follow up on this.Tremulant
I'll accept this answer as it clearly states that multiple overloaded PInvoke functions are required. There also seems to be no solution to my idea of obtaining a fixed pointer without specialization on type.Tremulant
"The fixed statement does not in fact pin an object. It creates a new pointer that's recognized by the garbage collector as a reference. So that the GC will update that pointer when it moves the array while compacting the heap." Not true. fixed statement instructs GC not to move memory referenced by fixed slots. So the memory is pinned in reality.Etoile
With C# 7.3 there is a generic type constraint called unmanaged but sadly this does not seem to support fixed arrays.Potty
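
As a side note on the comment above: the limitation it runs into presumably concerns fixed-size buffer fields, which stay restricted to a few primitive element types; the fixed statement itself can pin a T[] directly once T is constrained to unmanaged. A minimal sketch assuming C# 7.3+ and msvcrt's memcmp (not part of the original answers):

    // Requires: using System; using System.Runtime.InteropServices;
    // Same msvcrt import as above, repeated so the snippet stands alone.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr a, IntPtr b, UIntPtr count);

    // With 'where T : unmanaged' both T* pointers and sizeof(T) are allowed,
    // and fixed can pin the arrays without any per-type specialization.
    public static unsafe bool ArraysEqual<T>(T[] x, T[] y) where T : unmanaged
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        if (x.Length == 0) return true;

        fixed (T* px = x)
        fixed (T* py = y)
        {
            return memcmp((IntPtr)px, (IntPtr)py,
                          new UIntPtr((ulong)x.Length * (ulong)sizeof(T))) == 0;
        }
    }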

You can write a separate P/Invoke front-end for each kind of array that you want to compare. I know you don't really want to specialise on T, but I don't think the amount of extra code is too great.

This is a bit of a hack, because I'm defining multiple P/Invoke methods with different signatures for the same API function, but by doing this I can leverage the P/Invoke marshalling support.

(Note that the sign of the return value from memcmp is really only meaningful if the source data is indeed an array of bytes. If you are giving it an array of ints, you should only compare the return value with zero and ignore its sign: memcmp compares byte by byte, and on a little-endian machine that byte order does not match the numeric order, so the ordering it implies is not meaningful for ints.)

For example, on my machine the code below prints this (RELEASE build, not debug build):

MemCmp with ints took 00:00:08.0768666
ManagedMemCmp with ints took 00:00:10.3750453
MemCmp with bytes took 00:00:01.8740001
ManagedMemCmp with bytes took 00:00:09.2885763

Note that the byte[] test uses bytes, so it compares a quarter as many bytes as the int[] test does. The managed code makes the same number of comparisons either way, so it is comparatively a lot slower.

There isn't really a huge difference between the times for comparing arrays of ints, but there is a large difference for comparing arrays of bytes. This suggests to me that there could be a managed optimisation that uses fixed to get an int pointer into a byte array in order to compare 4 bytes at a time (with some fiddling for any extra bytes at the end that don't fit into an int).

I also think you could write a multithreaded managed version (using the "int*" optimisation to compare byte arrays) which would be MUCH FASTER than the unmanaged memcmp(), which is of course not multithreaded (as far as I know).
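
As a rough illustration of that fixed/pointer idea (a sketch only, not benchmarked as part of this answer; it compares 8 bytes at a time with ulong*, the variant the comments below ended up using, and the int* version is the same with 4-byte steps):

    // Compares two byte arrays a machine word at a time by reinterpreting the
    // pinned memory as ulong*, then handles the 0..7 leftover bytes one by one.
    // Assumes the caller has already checked for null and equal lengths.
    public static unsafe bool ManagedMemCmpWide(byte[] a1, byte[] a2)
    {
        fixed (byte* p1 = a1)
        fixed (byte* p2 = a2)
        {
            int n = a1.Length;
            ulong* q1 = (ulong*)p1;
            ulong* q2 = (ulong*)p2;
            int wide = n / 8;

            for (int i = 0; i < wide; ++i)
            {
                if (q1[i] != q2[i]) return false;
            }

            for (int i = wide * 8; i < n; ++i)
            {
                if (p1[i] != p2[i]) return false;
            }

            return true;
        }
    }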

Anyway, here's my test code. Remember, RELEASE build, not debug!

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;


namespace Demo
{
    public static class Program
    {
        private static void Main(string[] args)
        {
            int[] a1 = new int[1000000];
            int[] a2 = new int[1000000];

            for (int i = 0; i < a1.Length-1; ++i)
            {
                a1[i] = i;
                a2[i] = i;
            }

            a1[a1.Length-1] = 1;
            a2[a2.Length-1] = 2;

            byte[] b1 = new byte[1000000];
            byte[] b2 = new byte[1000000];

            for (int i = 0; i < b1.Length-1; ++i)
            {
                b1[i] = (byte)i;
                b2[i] = (byte)i;
            }

            b1[b1.Length-1] = 1;
            b2[b2.Length-1] = 2;

            Stopwatch sw = Stopwatch.StartNew();
            testWithMemCmp(a1, a2);
            sw.Stop();
            Console.WriteLine("MemCmp with ints took " + sw.Elapsed);

            sw.Restart();
            testWithManagedMemCmp(a1, a2);
            sw.Stop();
            Console.WriteLine("ManagedMemCmp with ints took " + sw.Elapsed);

            sw.Restart();
            testWithMemCmp(b1, b2);
            sw.Stop();
            Console.WriteLine("MemCmp with bytes took " + sw.Elapsed);

            sw.Restart();
            testWithManagedMemCmp(b1, b2);
            sw.Stop();
            Console.WriteLine("ManagedMemCmp with bytes took " + sw.Elapsed);
        }

        private static void testWithMemCmp(int[] a1, int[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                MemCmp(a1, a2);
            }
        }

        private static void testWithMemCmp(byte[] a1, byte[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                MemCmp(a1, a2);
            }
        }

        private static void testWithManagedMemCmp(int[] a1, int[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                ManagedMemCmp(a1, a2);
            }
        }

        private static void testWithManagedMemCmp(byte[] a1, byte[] a2)
        {
            for (int j = 0; j < COUNT; ++j)
            {
                ManagedMemCmp(a1, a2);
            }
        }

        public static bool ManagedMemCmp(int[] a1, int[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            for (int i = 0; i < a1.Length; ++i)
            {
                if (a1[i] != a2[i])
                {
                    return false;
                }
            }

            return true;
        }

        public static bool ManagedMemCmp(byte[] a1, byte[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            for (int i = 0; i < a1.Length; ++i)
            {
                if (a1[i] != a2[i])
                {
                    return false;
                }
            }


            return true;
        }

        public static bool MemCmp(byte[] a1, byte[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            return memcmp(a1, a2, new UIntPtr((uint)a1.Length)) == 0;
        }

        public static bool MemCmp(int[] a1, int[] a2)
        {
            if (a1 == null || a2 == null || a1.Length != a2.Length)
            {
                throw new InvalidOperationException("Arrays are null or different lengths.");
            }

            return memcmp(a1, a2, new UIntPtr((uint)(a1.Length * sizeof(int)))) == 0;
        }

        // msvcrt's memcmp uses the cdecl calling convention; declaring it
        // explicitly avoids stack-imbalance problems on 32-bit builds.
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        private static extern int memcmp(byte[] a1, byte[] a2, UIntPtr count);

        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        private static extern int memcmp(int[] a1, int[] a2, UIntPtr count);

        private const int COUNT = 10000;
    }
}
Swingletree answered 12/2, 2013 at 14:47 Comment(2)
I tested the PInvoke version and in the managed comparison loop I use ulong* to compare 64 bits at a time. The numbers are much better than the safe, naive element-by-element loop (which is not surprising just like you said).Tremulant
Thank you for this answer. It helped me implement the solution that I now have: for length < 8 use a safe single-element loop, for length < 512 use the unsafe 64-bit loop, and in all other cases use memcmp.Tremulant
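
In outline, the tiered scheme described in this comment could look like the sketch below (reusing ManagedMemCmp and MemCmp from the answer's code and the ManagedMemCmpWide sketch above; the 8 and 512 cut-offs are the commenter's numbers, not values measured here):

    // Hypothetical dispatcher: small arrays stay in safe code, medium arrays
    // use the unsafe ulong* loop, large arrays go through the memcmp P/Invoke.
    public static bool TieredEquals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;

        if (x.Length < 8)   return ManagedMemCmp(x, y);      // safe element-by-element loop
        if (x.Length < 512) return ManagedMemCmpWide(x, y);  // fixed + ulong* loop
        return MemCmp(x, y);                                 // P/Invoke memcmp
    }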

I agree with Daniel A. White's comments: this will probably result in a performance hit rather than a gain, because of the overhead of marshaling to unmanaged code and so on.

Having said that, you should be able to use GCHandle.Alloc:

public unsafe bool Equals(T[] x, T[] y)
{
    if (ReferenceEquals(x, y)) return true;
    if (x == null || y == null) return false;
    if (x.Length != y.Length) return false;

    GCHandle handleOfX = default(GCHandle);
    GCHandle handleOfY = default(GCHandle);
    try
    {
        // Pin both arrays so the GC cannot move them while memcmp reads them.
        // Allocating inside the try ensures the first handle is freed even if
        // the second Alloc throws.
        handleOfX = GCHandle.Alloc(x, GCHandleType.Pinned);
        handleOfY = GCHandle.Alloc(y, GCHandleType.Pinned);
        return memcmp(handleOfX.AddrOfPinnedObject(),
                      handleOfY.AddrOfPinnedObject(),
                      x.Length * this.elementSize);
    }
    finally
    {
        if (handleOfX.IsAllocated) handleOfX.Free();
        if (handleOfY.IsAllocated) handleOfY.Free();
    }
}
Olfactory answered 12/2, 2013 at 13:19 Comment(3)
I'm going to benchmark this approach and post what I find. I'm concerned that GCHandles are heavy-weight; fixed does not use them. (Btw, I'll only PInvoke for big arrays. I'm going to compare small arrays in unsafe managed code so there's no call overhead at all).Tremulant
Also check this out: techmikael.blogspot.co.uk/2009/01/… And I reckon it will speed things up for LARGE arrays (although I think the version using GCHandle will add a lot of overhead compared to just fixing the arrays - but of course fixing will result in unsafe code).Swingletree
GCHandle performance is really awful. I didn't expect it to be that bad. Still, it is the only solution that allows not to specialize on T.Tremulant
