Quicker way to deepcopy objects in golang, JSON vs gob
I am using Go 1.9, and I want to deep-copy the value of one object into another object. I tried to do it with encoding/gob and encoding/json, but gob encoding takes more time than JSON encoding. Other questions I have seen like this one suggest that gob encoding should be quicker, but I see the exact opposite behaviour. Can someone tell me if I am doing something wrong? Or is there a better and quicker way to deep-copy than these two? My object's struct is complex and nested.

The test code:

package main

import (
    "bytes"
    "encoding/gob"
    "encoding/json"
    "log"
    "strconv"
    "time"
)

// Test ...
type Test struct {
    Prop1 int
    Prop2 string
}

// Clone deep-copies a to b
func Clone(a, b interface{}) {

    buff := new(bytes.Buffer)
    enc := gob.NewEncoder(buff)
    dec := gob.NewDecoder(buff)
    enc.Encode(a)
    dec.Decode(b)
}

// DeepCopy deep-copies a to b using JSON marshaling
func DeepCopy(a, b interface{}) {
    byt, _ := json.Marshal(a)
    json.Unmarshal(byt, b)
}

func main() {
    i := 0
    tClone := time.Duration(0)
    tCopy := time.Duration(0)
    end := 3000
    for {
        if i == end {
            break
        }

        r := Test{Prop1: i, Prop2: strconv.Itoa(i)}
        var rNew Test
        t0 := time.Now()
        Clone(r, &rNew)
        t2 := time.Since(t0)
        tClone += t2

        r2 := Test{Prop1: i, Prop2: strconv.Itoa(i)}
        var rNew2 Test
        t0 = time.Now()
        DeepCopy(&r2, &rNew2)
        t2 = time.Since(t0)
        tCopy += t2

        i++
    }
    log.Printf("Total items %+v, Clone avg. %+v, DeepCopy avg. %+v, Total Difference %+v\n", i, tClone/3000, tCopy/3000, (tClone - tCopy))
}

I get following output:

Total items 3000, Clone avg. 30.883µs, DeepCopy avg. 6.747µs, Total Difference 72.409084ms
Uncleanly answered 17/10, 2017 at 12:31 Comment(6)
You are recreating your gob en/decoder each time. Also: Both ways of deep-copying are awful. And: Use standard testing benchmarks. Last: Your struct doesn't need any fancy deep copy mechanism.Fahey
If you're going to benchmark code, start by using a proper benchmark. That will give you more accurate times, and will also provide memory allocation stats, which will highlight your problems here.Danner
@Fahey - Which way do you suggest then? The struct in question is just a test code but my actual struct is complex and nested.Uncleanly
@Rohanil: If all your fields are public, you can live with any of the type restrictions they may impose, and you don't mind paying the price for reflection, then gob or json will work fine. You can always do better with a custom DeepCopy method which doesn't use reflection at all.Danner
"Complex and nested" struct does not mean a simple a := b won't produce a deep copy. If your actual code contains pointers, slices, maps, functions or channels things become more complicated and you should test this. If you struct contains e.g. closures over functions you will have a hard time making a deep copy. If your struct contains loops you are out of luck with JSON. If your struct contains just some slices and maps then the simplest might be an initial flat copy and manually copying the few slices/maps. If it's []map[string][][]map[int][]string: redesign.Fahey
My struct contains pointers, slices, maps. It also contains fields with custom structs. Those sub structs also have pointers, maps, slices.Uncleanly

JSON vs gob difference

The encoding/gob package needs to transmit type definitions:

The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

When you "first" serialize a value of a type, the definition of the type also has to be included / transmitted, so the decoder can properly interpret and decode the stream:

A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This is explained in great detail here: Efficient Go serialization of struct to disk

So in your case, since a new gob encoder and decoder are created each time, the type information is transmitted with every single value; that is the bottleneck, the part that makes it slow. When encoding to / decoding from the JSON format, no type description is included in the representation.

To prove it, make this simple change:

type Test struct {
    Prop1 [1000]int
    Prop2 [1000]string
}

What we did here is make the field types arrays, "multiplying" the values a thousand times while the type information effectively remains the same (all elements in an array have the same type). Then we create values of them like this:

r := Test{Prop1: [1000]int{}, Prop2: [1000]string{}}

Now running your test program, the output on my machine:

Original:

2017/10/17 14:55:53 Total items 3000, Clone avg. 33.63µs, DeepCopy avg. 2.326µs, Total Difference 93.910918ms

Modified version:

2017/10/17 14:56:38 Total items 3000, Clone avg. 119.899µs, DeepCopy avg. 462.608µs, Total Difference -1.02812648s

As you can see, in the original version JSON is faster, but in the modified version gob became faster, as the cost of transmitting the type info was amortized.
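To see the amortization directly, here is a minimal sketch (separate from your program, with made-up values) that reuses a single Encoder for a stream of values: the first Encode also writes the type definition, so it emits noticeably more bytes than the subsequent ones:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Test struct {
    Prop1 int
    Prop2 string
}

func main() {
    var buf bytes.Buffer
    enc := gob.NewEncoder(&buf) // one encoder, reused for the whole stream

    prev := 0
    for i := 0; i < 3; i++ {
        if err := enc.Encode(Test{Prop1: i, Prop2: "x"}); err != nil {
            panic(err)
        }
        // Only the first Encode includes the type definition, so it
        // writes noticeably more bytes than the following ones.
        fmt.Printf("value %d: %d bytes\n", i, buf.Len()-prev)
        prev = buf.Len()
    }
}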

Testing / benching method

Now on to your testing method. Measuring performance this way is unreliable and can yield quite inaccurate results. Instead, you should use Go's built-in testing and benchmarking tools. For details, read Order of the code and performance.
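As a sketch of what that looks like, assuming the benchmarks sit in a file such as clone_test.go in the same package as your code above, run with go test -bench . -benchmem (the -benchmem flag also reports the allocation stats mentioned in the comments):

package main

import "testing"

func BenchmarkClone(b *testing.B) {
    r := Test{Prop1: 1, Prop2: "1"}
    for i := 0; i < b.N; i++ {
        var rNew Test
        Clone(r, &rNew)
    }
}

func BenchmarkDeepCopy(b *testing.B) {
    r := Test{Prop1: 1, Prop2: "1"}
    for i := 0; i < b.N; i++ {
        var rNew Test
        DeepCopy(&r, &rNew)
    }
}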

Caveats of these cloning methods

These methods work with reflection and thus can only "clone" fields that are accessible via reflection, that is: exported ones. They also don't preserve pointer identity: if you have 2 pointer fields in a struct, both pointing to the same object (pointers being equal), then after marshaling and unmarshaling you'll get 2 different pointers pointing to 2 different values. This may even cause problems in certain situations. Nor do they handle self-referencing structures, which at best causes an error, and at worst an infinite loop or exceeding the goroutine stack.
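Here is a small sketch of the pointer-identity caveat, using a made-up Node type: both fields start out pointing to the same int, but after a JSON round trip they point to two independent copies:

package main

import (
    "encoding/json"
    "fmt"
)

type Node struct {
    A, B *int
}

func main() {
    n := 1
    src := Node{A: &n, B: &n} // both fields point to the same int

    byt, _ := json.Marshal(src)
    var dst Node
    json.Unmarshal(byt, &dst)

    fmt.Println(src.A == src.B) // true: one shared value
    fmt.Println(dst.A == dst.B) // false: two independent copies
}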

The "proper" way of cloning

Considering the caveats mentioned above, often the proper way of cloning needs help from the "inside". That is, cloning a specific type is often only possible if that type (or the package of that type) provides this functionality.

Yes, providing "manual" cloning functionality is not convenient, but on the other hand it will outperform the above methods (maybe even by orders of magnitude), and it requires the least amount of "working" memory for the cloning process.
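For illustration, a minimal sketch of such a hand-written clone, using a hypothetical Person type with a pointer, a slice and a map field:

package main

import "fmt"

type Person struct {
    Name    string
    Age     *int
    Tags    []string
    Attribs map[string]string
}

// Clone returns a deep copy of p: the flat copy covers the plain value
// fields, then each pointer/slice/map field is copied explicitly.
func (p Person) Clone() Person {
    c := p
    if p.Age != nil {
        age := *p.Age
        c.Age = &age
    }
    if p.Tags != nil {
        c.Tags = append([]string(nil), p.Tags...)
    }
    if p.Attribs != nil {
        c.Attribs = make(map[string]string, len(p.Attribs))
        for k, v := range p.Attribs {
            c.Attribs[k] = v
        }
    }
    return c
}

func main() {
    age := 30
    a := Person{Name: "Ann", Age: &age, Tags: []string{"go"}}
    b := a.Clone()
    *b.Age = 31                    // does not affect a
    b.Tags[0] = "json"             // does not affect a
    fmt.Println(*a.Age, a.Tags[0]) // prints: 30 go
}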

Meakem answered 17/10, 2017 at 14:48 Comment(0)
