Is encoding/gob deterministic?

Can we expect that for two Go objects x and y such that x is equal to y (assuming no trickiness with interfaces and maps, just structs and arrays), the output of gob_encode(x) and gob_encode(y) will always be the same?

edit (Jun 8 2018):

gob encoding is non-deterministic when maps are involved. This is due to the random iteration order of maps, which causes their entries to be serialised in a random order.

Bluebeard answered 20/10, 2015 at 5:31 Comment(0)

You shouldn't really care as long as it "gets the job done". That said, the current encoding/gob implementation is deterministic, with some caveats (keep reading).

The package documentation says:

A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This means if you encode a value of a type for the first time, type information will be sent. If you encode another value of the same type, the type description will not be transmitted again, just a reference to its previous spec. So even if you encode the same value twice, it will produce different byte sequences as the first will contain type spec and the value, the second will contain only a type ref (e.g. type id) and the value.

See this example:

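package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Int struct{ X int }

func main() {
    b := &bytes.Buffer{}
    e := gob.NewEncoder(b) // one encoder, reused for all three values

    e.Encode(Int{1}) // first value of this type: type spec + value
    fmt.Println(b.Bytes())

    e.Encode(Int{1}) // same type again: type id + value only
    fmt.Println(b.Bytes())

    e.Encode(Int{1})
    fmt.Println(b.Bytes())
}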

Output:

[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0 5 255 130 1 2 0]

As you can see, the first Encode() call writes the type specification followed by the encoded Int value, [5 255 130 1 2 0]; the second and third calls append only that same [5 255 130 1 2 0] sequence.

But if you create 2 different gob.Encoders and write the same values to them in the same order, they will produce exactly the same results.

Note that "same order" is also important in the previous statement. Because a type's specification is transmitted when the first value of that type is sent, sending values of different types in a different order will transmit the type specs in a different order too. The references/identifiers of the types may therefore differ, which means that a value of such a type may be encoded with a different type reference/id.

Also note that the implementation of the gob package may change from release to release. These changes will be backward compatible (they must explicitly state if for some reason they would make backward incompatible changes), but being backward compatible does not mean the output is the same. So different Go versions may produce different results (though the output of each can be decoded by all compatible versions).

Open answered 20/10, 2015 at 5:45 Comment(0)

It should probably be noted that the accepted answer is not correct: encoding/gob doesn't order map elements in a deterministic way: https://play.golang.org/p/Hh3_5Kb3Znn

I've forked encoding/gob and added some code to order maps by key before writing them to the stream. This will affect performance, but my particular application doesn't need high performance. Remember custom marshalers can break this, so use with care: https://github.com/dave/stablegob

Leesen answered 7/6, 2018 at 4:59 Comment(4)
Is this a recent change or has it always been like this? - Bluebeard
It seems like it was always like this? - Bluebeard
Map iteration order (and I guess map item order in encoding/gob) has been fully non-deterministic since at least 2014 (Go 1.3): github.com/golang/go/issues/6719 I assume the accepted answer didn't test using maps? - Leesen
I posted an answer in 2015 mentioning the possibility of map iteration order, only to have it pointed out to me that the question specifically asks about structs and arrays and no maps, so I deleted my answer. I don't mind that, but it's worth pointing out that that's why the accepted answer isn't incorrect. - Sharpfreeze

It also isn't deterministic if you use different types and different encoders.

Example:

package main

import (
    "bytes"
    "crypto/sha1"
    "encoding/gob"
    "encoding/hex"
    "log"
)

func main() {
    // Each helper uses its own buffer and encoder; only encstring prints its output.
    encint()
    encint64()
    encstring()
}

func encint() {
    s1 := []int{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encint64() {
    s1 := []int64{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encstring() {
    s1 := []string{"a", "b", "c", "d"}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
    log.Println(buf2.Bytes())

    hash := sha1.New()
    hash.Write(buf2.Bytes())
    ret := hash.Sum(nil)
    log.Println(hex.EncodeToString(ret))
}


Notice that if you comment out encint() or encint64(), encstring() produces different bytes and therefore a different hash.

This happens even though each function uses its own buffer and its own encoder.
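One plausible explanation, consistent with the behaviour described here, is that gob hands out type ids from a counter shared by the whole process rather than one per encoder. The sketch below probes that assumption. It reads the type id from the first message of a fresh stream, relying on the integer encoding described in the encoding/gob package documentation. The types First and Second and the helpers readUint and firstTypeID are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// First and Second are hypothetical types with identical layouts.
type First struct{ X int }
type Second struct{ X int }

// readUint decodes one gob unsigned integer from the start of b and
// returns the value and the number of bytes consumed. Values below 128
// take one byte; otherwise the first byte holds the negated byte count
// and the value follows in big-endian order.
func readUint(b []byte) (uint64, int) {
	if b[0] < 128 {
		return uint64(b[0]), 1
	}
	n := 256 - int(b[0])
	var u uint64
	for _, c := range b[1 : 1+n] {
		u = u<<8 | uint64(c)
	}
	return u, 1 + n
}

// firstTypeID encodes v with a fresh encoder and returns the type id
// announced by the first message of the stream.
func firstTypeID(v any) (int64, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return 0, err
	}
	b := buf.Bytes()
	_, n := readUint(b)     // skip the message length
	u, _ := readUint(b[n:]) // signed id, stored with the sign in bit 0
	id := int64(u >> 1)
	if u&1 == 1 {
		id = ^id
	}
	if id < 0 {
		id = -id // type definitions carry the negated id
	}
	return id, nil
}

func main() {
	a, err := firstTypeID(First{1})
	if err != nil {
		panic(err)
	}
	b, err := firstTypeID(Second{1})
	if err != nil {
		panic(err)
	}
	fmt.Println(a, b, a != b)
}
```

If ids were assigned per encoder, both fresh encoders would announce the same id for the first type they see; here the two ids differ. That would also explain why encoding other types earlier in the program shifts the id, and therefore the bytes, of a later type.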

Sicilia answered 26/10, 2020 at 11:23 Comment(0)