Fastest Way to do Shallow Copy in C#
Asked Answered
F

8

68

I wonder what is the fastest way to do shallow copying in C#? I only know there are 2 ways to do shallow copy:

  1. MemberwiseClone
  2. Copy each field one by one (manual)

I found that (2) is faster than (1). I'm wondering if there's another way to do shallow copying?

Feces answered 8/6, 2009 at 19:13 Comment(4)
It's a shallow copy if copying is not performed recursive.Hagbut
I want to do shallow copy actuallyFeces
A shallow copy is when you duplicate references to data (so both copies reference a shared version of the data). A deep copy is when you duplicate all the actual data (so there are two independent copies of everything).Hierology
If you need a lot of shallow copies, perhaps using struct instead of class would be more appropriate?Udder
F
83

This is a complex subject with lots of possible solutions and many pros and cons to each. There is a wonderful article here that outlines several different ways of making a copy in C#. To summarize:

  1. Clone Manually
    Tedious, but high level of control.

  2. Clone with MemberwiseClone
    Only creates a shallow copy, i.e. for reference-type fields the original object and its clone refer to the same object.

  3. Clone with Reflection
    Shallow copy by default, can be re-written to do deep copy. Advantage: automated. Disadvantage: reflection is slow.

  4. Clone with Serialization
    Easy, automated. Give up some control and serialization is slowest of all.

  5. Clone with IL, Clone with Extension Methods
    More advanced solutions, not as common.

Festivity answered 8/6, 2009 at 19:28 Comment(4)
I see.. Amongst all options available, I think (1) is the fastest. Thanks!Feces
@Feces nope, MemberwiseClone should be the fastest. Your results were probably caused by debug/release mode or some other factor of that sort.Abracadabra
MemberwiseClone does do some extra checks and such, so for objects with few fields, it might actually be slightly slower. In any case, you should only care about that if you're actually experiencing performance issues. Don't optimize "on a hunch", profile. If you're doing so much cloning to have trouble with MemberwiseClone, you should probably be using structures, rather than classes. Or perhaps even more aggressive changes to architecture are needed :DUdder
Have you taken a look at Fasterflect library? It has some nice option to do deep serialization.Pled
C
39

I'd like to start with a few quotes:

In fact, MemberwiseClone is usually much better than others, especially for complex type.

and

I'm confused. MemberwiseClone() should annihilate the performance of anything else for a shallow copy. [...]

Theoretically the best implementation of a shallow copy is a C++ copy constructor: it knows the size compile-time, and then does a memberwise clone of all fields. The next best thing is using memcpy or something similar, which is basically how MemberwiseClone should work. This means, in theory it should obliterate all other possibilities in terms of performance. Right?

... but apparently it isn't blazing fast and it doesn't obliterate all the other solutions. At the bottom I've actually posted a solution that's over 2x faster. So: Wrong.

Testing the internals of MemberwiseClone

Let's start with a little test using a simple blittable type to check the underlying assumptions here about performance:

[StructLayout(LayoutKind.Sequential)]
public class ShallowCloneTest
{
    public int Foo;
    public long Bar;

    public ShallowCloneTest Clone()
    {
        return (ShallowCloneTest)base.MemberwiseClone();
    }
}

The test is devised in such a way that we can check the performance of MemberwiseClone agaist raw memcpy, which is possible because this is a blittable type.

To test by yourself, compile with unsafe code, disable the JIT suppression, compile release mode and test away. I've also put the timings after every line that's relevant.

Implementation 1:

ShallowCloneTest t1 = new ShallowCloneTest() { Bar = 1, Foo = 2 };
Stopwatch sw = Stopwatch.StartNew();
int total = 0;
for (int i = 0; i < 10000000; ++i)
{
    var cloned = t1.Clone();                                    // 0.40s
    total += cloned.Foo;
}

Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);

Basically I ran these tests a number of times, checked the assembly output to ensure that the thing wasn't optimized away, etc. The end result is that I know approximately how much seconds this one line of code costs, which is 0.40s on my PC. This is our baseline using MemberwiseClone.

Implementation 2:

sw = Stopwatch.StartNew();

total = 0;
uint bytes = (uint)Marshal.SizeOf(t1.GetType());
GCHandle handle1 = GCHandle.Alloc(t1, GCHandleType.Pinned);
IntPtr ptr1 = handle1.AddrOfPinnedObject();

for (int i = 0; i < 10000000; ++i)
{
    ShallowCloneTest t2 = new ShallowCloneTest();               // 0.03s
    GCHandle handle2 = GCHandle.Alloc(t2, GCHandleType.Pinned); // 0.75s (+ 'Free' call)
    IntPtr ptr2 = handle2.AddrOfPinnedObject();                 // 0.06s
    memcpy(ptr2, ptr1, new UIntPtr(bytes));                     // 0.17s
    handle2.Free();

    total += t2.Foo;
}

handle1.Free();
Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);

If you look closely at these numbers, you'll notice a few things:

  • Creating an object and copying it will take roughly 0.20s. Under normal circumstances this is the fastest possible code you can have.
  • However, to do that, you need to pin and unpin the object. That will take you 0.81 seconds.

So why is all of this so slow?

My explanation is that it has to do with the GC. Basically the implementations cannot rely on the fact that memory will stay the same before and after a full GC (The address of the memory can be changed during a GC, which can happen at any moment, including during your shallow copy). This means you only have 2 possible options:

  1. Pinning the data and doing a copy. Note that GCHandle.Alloc is just one of the ways to do this, it's well known that things like C++/CLI will give you better performance.
  2. Enumerating the fields. This will ensure that between GC collects you don't need to do anything fancy, and during GC collects you can use the GC ability to modify the addresses on the stack of moved objects.

MemberwiseClone will use method 1, which means you'll get a performance hit because of the pinning procedure.

A (much) faster implementation

In all cases our unmanaged code cannot make assumptions about the size of the types and it has to pin data. Making assumptions about size enables the compiler to do better optimizations, like loop unrolling, register allocation, etc. (just like a C++ copy ctor is faster than memcpy). Not having to pin data means we don't get an extra performance hit. Since .NET JIT's to assembler, in theory this means that we should be able to make a faster implementation using simple IL emitting, and allowing the compiler to optimize it.

So to summarize on why this can be faster than the native implementation?

  1. It doesn't require the object to be pinned; objects that are moving around are handled by the GC -- and really, this is relentlessly optimized.
  2. It can make assumptions about the size of the structure to copy, and therefore allows for better register allocation, loop unrolling, etc.

What we're aiming for is the performance of raw memcpy or better: 0.17s.

To do that, we basically cannot use more than just a call, create the object, and perform a bunch of copy instructions. It looks a bit like the Cloner implementation above, but some important differences (most significant: no Dictionary and no redundant CreateDelegate calls). Here goes:

public static class Cloner<T>
{
    private static Func<T, T> cloner = CreateCloner();

    private static Func<T, T> CreateCloner()
    {
        var cloneMethod = new DynamicMethod("CloneImplementation", typeof(T), new Type[] { typeof(T) }, true);
        var defaultCtor = typeof(T).GetConstructor(new Type[] { });

        var generator = cloneMethod .GetILGenerator();

        var loc1 = generator.DeclareLocal(typeof(T));

        generator.Emit(OpCodes.Newobj, defaultCtor);
        generator.Emit(OpCodes.Stloc, loc1);

        foreach (var field in typeof(T).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
        {
            generator.Emit(OpCodes.Ldloc, loc1);
            generator.Emit(OpCodes.Ldarg_0);
            generator.Emit(OpCodes.Ldfld, field);
            generator.Emit(OpCodes.Stfld, field);
        }

        generator.Emit(OpCodes.Ldloc, loc1);
        generator.Emit(OpCodes.Ret);

        return ((Func<T, T>)cloneMethod.CreateDelegate(typeof(Func<T, T>)));
    }

    public static T Clone(T myObject)
    {
        return cloner(myObject);
    }
}

I've tested this code with the result: 0.16s. This means it's approximately 2.5x faster than MemberwiseClone.

More importantly, this speed is on-par with memcpy, which is more or less the 'optimal solution under normal circumstances'.

Personally, I think this is the fastest solution - and the best part is: if the .NET runtime will get faster (proper support for SSE instructions etc), so will this solution.

Editorial Note: The sample code above assumes that the default constructor is public. If it is not, the call to GetConstructor returns null. In that case, use one of the other GetConstructor signatures to obtain protected or private constructors. See https://learn.microsoft.com/en-us/dotnet/api/system.type.getconstructor?view=netframework-4.8

Competition answered 20/7, 2015 at 8:53 Comment(17)
Hmm. Have you tested this calling Cloner<BaseClass>.Clone(someDerivedObject);? My hunch is that it would fail, because you're using static compile-time field information to create the cloner. Clone implementations are often defined in a base class (a-la MemberwiseClone) which cannot possibly know about future derived classes, but must still copy all fields down the hierarchy.Breathe
@BrucePierson Well yes, that's not how it's designed, right? Still, if that is your scenario, it's pretty easy to use this.GetType() of course. Personally I try to avoid cluttering base classes with utility functions like this.Competition
But because it's a static generic class, this.GetType() does no good - you can't use type inference here, can you?Breathe
@BrucePierson You misunderstand me. You can change all occurrences of typeof(T) into someObject.GetType() in the code and store the data in a Dictionary (you need a lock). That will just eat away your performance. Alternatively you can create the clone abstract, make overrides of clone and call my code, which is much faster, or call it directly and not put it in a base class (what I would do) -- this thread was after all about performance.Competition
Yes, I see. Indeed, making the type non-generic and applying the generic type parameter to the Clone method would work, but I highly doubt that there would be any performance gain at that point even with an O:1 dictionary. Anyway, +1 for a very interesting write-up, though maybe you should add some of those caveats as notes so someone doesn't take off with this as a full MemberwiseClone replacement...Breathe
@BrucePierson I've actually tested the dictionary approach you describe - there is a huge difference in performance. Apart from that it doesn't make much sense to me to put the clone in the base class... To quote Mike Acton (ref. to his good talk on cppcon): software has to make sense to machines, not to humans. IMO Usually you know what to clone, because a member function asks for a clone -- wrt that, it makes little sense to forget that piece of information and then figure that out again by doing a dictionary lookup... Just my 2 cents...Competition
This is a great answer. But one thing is still not clear to me: if this method is so much faster than MemberwiseClone, why wouldn't the CLR be doing this instead?Gelid
@ChrisGillum I find it a funny assumption that .NET is aimed at performance. If you would ask me, .NET is first and foremost implemented with 'maintainability' and 'correctness' in mind. Just as an example, I personally couldn't care less about having a ArgumentNullException or a NullReferenceException (yea I know the difference) -- that alone already accounts for thousands of checks in the .NET runtime with effects on branch prediction, etc. Personally, I find it pretty easy to beat .NET with respect to performance. However, in terms of software engineering, I find it very tough to beat.Competition
@Competition OK, so if I understand what you're saying, the performance difference between these two options is primarily around the cost of "correctness checking, etc". I personally have no problem with the balance of performance and developer productivity - the .NET team is certainly very conscious about performance, but must balance it with overall developer productivity (and I think they generally make the right choices). Thanks.Gelid
@ChrisGillum Not really. They just focus on developer productivity and provide reasonable performance. If you're used to programming low-level C++ like me, outperforming a .NET application by a factor 10 is not really a challenge...Competition
@Competition - Thanks for what seems like a very thorough analysis and defense of your suggested solution. I don't know if it's the fastest, but I feel this solution for a shallow copy strikes an excellent balance between being generic and decoupled from the base class and being performant. When possible, I too feel it's preferable to avoid introducing clutter in classes to address generic concerns. So if a generic shallow copy operation is what I'm after, I think I'll be using this from now on.Timisoara
The call to GetConstructor used in the example only finds public constructors. Use another signature to find constructors declared with a different access modifier.Lynden
@PaulChernoch That was intentional. If people hide constructors, they usually do that intentionally; take for example System.IO.File. That said, an even better design is to add a specialized constructor with a type that's well known (say - MyConstructor(ShallowCloneable arg)) -- that way, the developer always knows what's going to happen, even though the argument isn't used for anything but disambiguation (to ensure the constructor doesn't initialize anything we don't want it to initialize). You get the point...Competition
@Competition - Sadly, we are sometimes saddled with 3rd party libraries that provide neither a clone operation nor a public, default constructor and need to clone the objects. Your code example, modified, permits that use case.Lynden
@PaulChernoch Yes in some scenario's you can do that... Just be careful with it; you might end up cloning a singleton or unmanaged resource...Competition
I benchmarked this, memberwiseclone and a manual implemented clone and they all are around the same ballpart. None of them are faster. Same memory allocated, same execution time. This might have been useful a long time ago, but it no longer makes any meaningful difference.Rinaldo
@Max, Thanks for benchmarking it again. It's been 5 years now since I made this post and it's good that it's finally all in the same ballpark.Competition
A
31

I'm confused. MemberwiseClone() should annihilate the performance of anything else for a shallow copy. In the CLI, any type other than an RCW should be able to be shallow-copied by the following sequence:

  • Allocate memory in the nursery for the type.
  • memcpy the data from the original to the new. Since the target is in the nursery, no write barriers are required.
  • If the object has a user-defined finalizer, add it to the GC list of items pending finalization.
    • If the source object has SuppressFinalize called on it and such a flag is stored in the object header, unset it in the clone.

Can someone on the CLR internals team explain why this is not the case?

Ancon answered 20/8, 2009 at 21:34 Comment(2)
@ta.speot.is, bah humbug. It's a good point to raise, and most of us are in these discussions to understand things better in general.Slush
@SamHaswell MemberwiseClone is an external call which means it requires pinning. Pinning is a relatively slow process. I've posted the details of my tests and results in my answer below. I don't have the native call benchmark data here anymore, but I'm pretty sure pinning/native interop and the extra vtable calls (clone, sizeof) account for the overhead irt. memcpy that you've found.Competition
E
18

Why complicate things? MemberwiseClone would suffice.

public class ClassA : ICloneable
{
   public object Clone()
   {
      return this.MemberwiseClone();
   }
}

// let's say you want to copy the value (not reference) of the member of that class.
public class Main()
{
    ClassA myClassB = new ClassA();
    ClassA myClassC = new ClassA();
    myClassB = (ClassA) myClassC.Clone();
}
Eskil answered 8/3, 2013 at 12:26 Comment(0)
U
8

This is a way to do it using dynamic IL generation. I found it somewhere online:

public static class Cloner
{
    static Dictionary<Type, Delegate> _cachedIL = new Dictionary<Type, Delegate>();

    public static T Clone<T>(T myObject)
    {
        Delegate myExec = null;

        if (!_cachedIL.TryGetValue(typeof(T), out myExec))
        {
            var dymMethod = new DynamicMethod("DoClone", typeof(T), new Type[] { typeof(T) }, true);
            var cInfo = myObject.GetType().GetConstructor(new Type[] { });

            var generator = dymMethod.GetILGenerator();

            var lbf = generator.DeclareLocal(typeof(T));

            generator.Emit(OpCodes.Newobj, cInfo);
            generator.Emit(OpCodes.Stloc_0);

            foreach (var field in myObject.GetType().GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
            {
                // Load the new object on the eval stack... (currently 1 item on eval stack)
                generator.Emit(OpCodes.Ldloc_0);
                // Load initial object (parameter)          (currently 2 items on eval stack)
                generator.Emit(OpCodes.Ldarg_0);
                // Replace value by field value             (still currently 2 items on eval stack)
                generator.Emit(OpCodes.Ldfld, field);
                // Store the value of the top on the eval stack into the object underneath that value on the value stack.
                //  (0 items on eval stack)
                generator.Emit(OpCodes.Stfld, field);
            }

            // Load new constructed obj on eval stack -> 1 item on stack
            generator.Emit(OpCodes.Ldloc_0);
            // Return constructed object.   --> 0 items on stack
            generator.Emit(OpCodes.Ret);

            myExec = dymMethod.CreateDelegate(typeof(Func<T, T>));

            _cachedIL.Add(typeof(T), myExec);
        }

        return ((Func<T, T>)myExec)(myObject);
    }
}
Uhland answered 8/6, 2009 at 19:16 Comment(5)
Since this uses reflection, it's not likely to be quicker than object.MemberwiseClone (which is performed internally by the CLR). I recommend the OP simply use MemberwiseClone in most situations, or copy the fields manually on a per-class basis if high performance is required.Roselleroselyn
FYI this will not work with inheritance chains, since you need to grab the underlying types as wellMidway
@Roselleroselyn incorrect, it uses reflection only to create the DynamicMethod - subsequent calls are as fast as manual cloning. However, that is still slower than MemberwiseClone, assuming of course that .NET implements it properly using "memcpy" kind of operation.Abracadabra
@Mr.TA: Don't say "incorrect" when you haven't contradicted what I've said really. Okay, the reflection use is minimal and the slowdown apart from the first-call is due mainly to the managed calls used, but still.Roselleroselyn
Just a note, you could avoid the static dictionary by specifying the whole class to be generic. You'd only have a single static field in the generic class for the delegate. In fact, you could also have a singleton implementing an ICloneable interface you could dynamically build, which would be even faster than the delegate :))Udder
N
7

In fact, MemberwiseClone is usually much better than others, especially for complex type.

The reason is that:if you manual create a copy, it must call one of the type's constructor, but use memberwise clone, I guess it just copy a block of memory. for those types has very expensive construct actions, memberwise clone is absolutely the best way.

Onece i wrote such type: {string A = Guid.NewGuid().ToString()}, I found memberwise clone is muct faster than create a new instance and manual assign members.

The code below's result:

Manual Copy:00:00:00.0017099

MemberwiseClone:00:00:00.0009911

namespace MoeCard.TestConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            Program p = new Program() { AAA = Guid.NewGuid().ToString(), BBB = 123 };
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
            {
                p.Copy1();
            }
            sw.Stop();
            Console.WriteLine("Manual Copy:" + sw.Elapsed);

            sw.Restart();
            for (int i = 0; i < 10000; i++)
            {
                p.Copy2();
            }
            sw.Stop();
            Console.WriteLine("MemberwiseClone:" + sw.Elapsed);
            Console.ReadLine();
        }

        public string AAA;

        public int BBB;

        public Class1 CCC = new Class1();

        public Program Copy1()
        {
            return new Program() { AAA = AAA, BBB = BBB, CCC = CCC };
        }
        public Program Copy2()
        {
            return this.MemberwiseClone() as Program;
        }

        public class Class1
        {
            public DateTime Date = DateTime.Now;
        }
    }

}

finally, I provide my code here:

    #region 数据克隆
    /// <summary>
    /// 依据不同类型所存储的克隆句柄集合
    /// </summary>
    private static readonly Dictionary<Type, Func<object, object>> CloneHandlers = new Dictionary<Type, Func<object, object>>();

    /// <summary>
    /// 根据指定的实例,克隆一份新的实例
    /// </summary>
    /// <param name="source">待克隆的实例</param>
    /// <returns>被克隆的新的实例</returns>
    public static object CloneInstance(object source)
    {
        if (source == null)
        {
            return null;
        }
        Func<object, object> handler = TryGetOrAdd(CloneHandlers, source.GetType(), CreateCloneHandler);
        return handler(source);
    }

    /// <summary>
    /// 根据指定的类型,创建对应的克隆句柄
    /// </summary>
    /// <param name="type">数据类型</param>
    /// <returns>数据克隆句柄</returns>
    private static Func<object, object> CreateCloneHandler(Type type)
    {
        return Delegate.CreateDelegate(typeof(Func<object, object>), new Func<object, object>(CloneAs<object>).Method.GetGenericMethodDefinition().MakeGenericMethod(type)) as Func<object, object>;
    }

    /// <summary>
    /// 克隆一个类
    /// </summary>
    /// <typeparam name="TValue"></typeparam>
    /// <param name="value"></param>
    /// <returns></returns>
    private static object CloneAs<TValue>(object value)
    {
        return Copier<TValue>.Clone((TValue)value);
    }
    /// <summary>
    /// 生成一份指定数据的克隆体
    /// </summary>
    /// <typeparam name="TValue">数据的类型</typeparam>
    /// <param name="value">需要克隆的值</param>
    /// <returns>克隆后的数据</returns>
    public static TValue Clone<TValue>(TValue value)
    {
        if (value == null)
        {
            return value;
        }
        return Copier<TValue>.Clone(value);
    }

    /// <summary>
    /// 辅助类,完成数据克隆
    /// </summary>
    /// <typeparam name="TValue">数据类型</typeparam>
    private static class Copier<TValue>
    {
        /// <summary>
        /// 用于克隆的句柄
        /// </summary>
        internal static readonly Func<TValue, TValue> Clone;

        /// <summary>
        /// 初始化
        /// </summary>
        static Copier()
        {
            MethodFactory<Func<TValue, TValue>> method = MethodFactory.Create<Func<TValue, TValue>>();
            Type type = typeof(TValue);
            if (type == typeof(object))
            {
                method.LoadArg(0).Return();
                return;
            }
            switch (Type.GetTypeCode(type))
            {
                case TypeCode.Object:
                    if (type.IsClass)
                    {
                        method.LoadArg(0).Call(Reflector.GetMethod(typeof(object), "MemberwiseClone")).Cast(typeof(object), typeof(TValue)).Return();
                    }
                    else
                    {
                        method.LoadArg(0).Return();
                    }
                    break;
                default:
                    method.LoadArg(0).Return();
                    break;
            }
            Clone = method.Delegation;
        }

    }
    #endregion
Nobles answered 19/2, 2014 at 10:6 Comment(0)
E
5

Here is a small helper class that uses reflection to access MemberwiseClone and then caches the delegate to avoid using reflection more than necessary.

public static class CloneUtil<T>
{
    private static readonly Func<T, object> clone;

    static CloneUtil()
    {
        var cloneMethod = typeof(T).GetMethod("MemberwiseClone", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
        clone = (Func<T, object>)cloneMethod.CreateDelegate(typeof(Func<T, object>));
    }

    public static T ShallowClone(T obj) => (T)clone(obj);
}

public static class CloneUtil
{
    public static T ShallowClone<T>(this T obj) => CloneUtil<T>.ShallowClone(obj);
}

You can call it like this:

Person b = a.ShallowClone();
Encaenia answered 24/10, 2018 at 15:14 Comment(1)
This is perfect, because the object I want to clone isn't my ownTruda
C
4

MemberwiseClone requires less maintenance. I don't know if having default property values helps any, maybe if could ignore items with default values.

Clark answered 8/6, 2009 at 19:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.