Use Gob to write logs to a file in an append style

Asked 4/4, 2017 at 5:44 Answered 24/3, 2020 at 8:38

Solved serialization go encoding binary gob

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:

extra data in buffer

So I wonder if that's possible in the first place or whether I should use something like JSON to append JSON documents on a per line basis instead. Because the alternative would be to serialize a slice, but then again reading it as a whole would defeat the purpose of append.

Doc answered 4/4, 2017 at 5:44 Comment(1)

If you absolutely need to use gob, I would suggest using boltdb. Write a hook for your logger and log the entries into boltdb. Moving a boltdb file around is also easy since it's just a file, and you can create new boltdb files based on size, time or level. If you absolutely need plain files, I suggest using lumberjack and writing a hook for your logger that transforms entries to base64. In any case, using gob makes analysis and monitoring hard, so InfluxDB or Prometheus are also valid options. – Surcharge 4/4, 2017 at 13:4

The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.

The reason is that the gob package not only serializes the values you pass to it, it also transmits data describing their types:

A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This type information is state held by the encoder / decoder (which types have been transmitted, and how). A subsequent new encoder / decoder will not (and cannot) analyze the preceding stream to reconstruct that state and continue where a previous encoder / decoder left off.

Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.

Also you can create a gob.Encoder and write to a file, and then later create a new gob.Encoder, and append to the same file, but you must use 2 gob.Decoders to read those values, exactly matching the encoding process.

As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:

type Point struct {
    X, Y int
}

For short, compact code, I use this "error handler" function:

func he(err error) {
    if err != nil {
        panic(err)
    }
}

And now the code:

const n, m = 3, 2
buf := &bytes.Buffer{}

e := gob.NewEncoder(buf)
for i := 0; i < n; i++ {
    he(e.Encode(&Point{X: i, Y: i * 2}))
}

e = gob.NewEncoder(buf)
for i := 0; i < m; i++ {
    he(e.Encode(&Point{X: i, Y: 10 + i}))
}

d := gob.NewDecoder(buf)
for i := 0; i < n; i++ {
    var p *Point
    he(d.Decode(&p))
    fmt.Println(p)
}

d = gob.NewDecoder(buf)
for i := 0; i < m; i++ {
    var p *Point
    he(d.Decode(&p))
    fmt.Println(p)
}

Output (try it on the Go Playground):

&{0 0}
&{1 2}
&{2 4}
&{0 10}
&{1 11}

Note that if we used only 1 decoder to read all the values (looping until i < n + m), we'd get the same error message you posted in your question when the iteration reaches value n + 1, because the subsequent data is not a serialized Point, but the start of a new gob stream.
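To make the failure concrete, here is a minimal sketch that writes with two encoders and then reads with a single decoder. The helper name decodeWithOneDecoder is made up for this illustration, and the exact error text may vary between Go versions, so the code only reports that an error occurred and how many values were decoded before it.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Point struct{ X, Y int }

// decodeWithOneDecoder (hypothetical helper) writes n values with one encoder
// and m values with a second encoder, then tries to read all n+m values with
// a single decoder. It returns how many values decoded before the first error.
func decodeWithOneDecoder(n, m int) (int, error) {
	buf := &bytes.Buffer{}
	for _, count := range []int{n, m} {
		e := gob.NewEncoder(buf) // a new encoder starts a new gob stream
		for i := 0; i < count; i++ {
			if err := e.Encode(Point{X: i, Y: i * 2}); err != nil {
				return 0, err
			}
		}
	}

	decoded := 0
	d := gob.NewDecoder(buf)
	for i := 0; i < n+m; i++ {
		var p Point
		if err := d.Decode(&p); err != nil {
			return decoded, err
		}
		decoded++
	}
	return decoded, nil
}

func main() {
	decoded, err := decodeWithOneDecoder(3, 2)
	fmt.Println("decoded:", decoded, "error:", err)
}
```

The first n values decode fine; the error appears only when the decoder reaches the data written by the second encoder.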

So if you want to stick with the gob package for doing what you want to do, you have to slightly modify, enhance your encoding / decoding process. You have to somehow mark the boundaries when a new encoder is used (so when decoding, you'll know you have to create a new decoder to read subsequent values).

You may use different techniques to achieve this:

You may write out a number, a count before you proceed to write values, and this number would tell how many values were written using the current encoder.
If you don't want to or can't tell how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value when you don't write more values with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
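As a rough sketch of the first technique (a count written before the values), assuming each append is done as one batch: writeBatch and readAll are names invented for this example, and the reader is a bytes.Buffer, which implements io.ByteReader, so no decoder reads ahead into the next batch.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

type Point struct{ X, Y int }

// writeBatch (hypothetical helper) starts a fresh gob stream on w:
// first the number of values, then the values themselves.
func writeBatch(w io.Writer, pts []Point) error {
	enc := gob.NewEncoder(w)
	if err := enc.Encode(len(pts)); err != nil {
		return err
	}
	for _, p := range pts {
		if err := enc.Encode(p); err != nil {
			return err
		}
	}
	return nil
}

// readAll (hypothetical helper) creates one decoder per batch. The count
// tells it when the current stream ends and a new decoder is needed.
func readAll(buf *bytes.Buffer) ([]Point, error) {
	var all []Point
	for buf.Len() > 0 {
		dec := gob.NewDecoder(buf)
		var n int
		if err := dec.Decode(&n); err != nil {
			return nil, err
		}
		for i := 0; i < n; i++ {
			var p Point
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			all = append(all, p)
		}
	}
	return all, nil
}

// demoBatches appends two batches and reads everything back.
func demoBatches() ([]Point, error) {
	buf := &bytes.Buffer{}
	if err := writeBatch(buf, []Point{{1, 2}, {3, 4}, {5, 6}}); err != nil {
		return nil, err
	}
	if err := writeBatch(buf, []Point{{7, 8}, {9, 10}}); err != nil {
		return nil, err
	}
	return readAll(buf)
}

func main() {
	pts, err := demoBatches()
	if err != nil {
		panic(err)
	}
	fmt.Println(pts)
}
```

The end-of-encoder marker variant would look similar, with a sentinel value checked after every Decode instead of a count read up front.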

Some things to note here:

The gob package is most efficient, most compact if only a single encoder is used, because each time you create and use a new encoder, the type specifications will have to be re-transmitted, causing more overhead, and making the encoding / decoding process slower.
You can't seek in the data stream, you can only decode any value if you read the whole file from the beginning up until the value you want. Note that this somewhat applies even if you use other formats (such as JSON or XML).

If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
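A minimal sketch of such an index, kept in memory here rather than in a separate file: writeBatches and firstOfBatch are hypothetical names, and bytes.Reader stands in for a file because it implements both io.Seeker and io.ByteReader.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

type Point struct{ X, Y int }

// writeBatches (hypothetical helper) writes each batch with its own encoder
// and records the byte offset at which every new gob stream starts.
func writeBatches(batches [][]Point) ([]byte, []int64, error) {
	buf := &bytes.Buffer{}
	var index []int64
	for _, batch := range batches {
		index = append(index, int64(buf.Len()))
		enc := gob.NewEncoder(buf)
		for _, p := range batch {
			if err := enc.Encode(p); err != nil {
				return nil, nil, err
			}
		}
	}
	return buf.Bytes(), index, nil
}

// firstOfBatch (hypothetical helper) seeks to the start of batch i, creates
// a new decoder there, and decodes the first value of that batch.
func firstOfBatch(data []byte, index []int64, i int) (Point, error) {
	r := bytes.NewReader(data)
	if _, err := r.Seek(index[i], io.SeekStart); err != nil {
		return Point{}, err
	}
	var p Point
	err := gob.NewDecoder(r).Decode(&p)
	return p, err
}

// demoSeek jumps straight to the third batch without decoding the first two.
func demoSeek() (Point, error) {
	data, index, err := writeBatches([][]Point{
		{{1, 1}, {2, 2}},
		{{3, 3}},
		{{4, 4}, {5, 5}},
	})
	if err != nil {
		return Point{}, err
	}
	return firstOfBatch(data, index, 2)
}

func main() {
	p, err := demoSeek()
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```

Seeking is only possible to the start of a stream; values in the middle of a batch still have to be reached by decoding from that batch's beginning.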

Warning

gob.NewDecoder() documents that:

If r does not also implement io.ByteReader, it will be wrapped in a bufio.Reader.

This means that if you use os.File for example (it does not implement io.ByteReader), the internally used bufio.Reader might read more data from the passed reader than what gob.Decoder actually uses (as its name says, it does buffered IO). So using multiple decoders on the same input reader might result in decoding errors, as the internally used bufio.Reader of a previous decoder might read data that will not be used and passed on to the next decoder.

A solution / workaround to this is to explicitly pass a reader that implements io.ByteReader that does not read a buffer "ahead". For example:

type byteReader struct {
    io.Reader
    buf []byte
}

func (br byteReader) ReadByte() (byte, error) {
    if _, err := io.ReadFull(br, br.buf); err != nil {
        return 0, err
    }
    return br.buf[0], nil
}

func newByteReader(r io.Reader) byteReader {
    return byteReader{r, make([]byte, 1)}
}

See a faulty example without this wrapper: https://go.dev/play/p/dp1a4dMDmNc

And see how the above wrapper fixes the problem: https://go.dev/play/p/iw528FTFxmU
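For reference, a self-contained sketch of how the wrapper might be used with a real file. It repeats the byteReader type from above; fileRoundTrip is a name invented for this example, and the temporary file is only a stand-in for an actual log file.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"io"
	"os"
)

type Point struct{ X, Y int }

// byteReader is the wrapper from the answer: it implements io.ByteReader
// by reading exactly one byte at a time, so nothing is read ahead.
type byteReader struct {
	io.Reader
	buf []byte
}

func (br byteReader) ReadByte() (byte, error) {
	if _, err := io.ReadFull(br, br.buf); err != nil {
		return 0, err
	}
	return br.buf[0], nil
}

func newByteReader(r io.Reader) byteReader {
	return byteReader{r, make([]byte, 1)}
}

// fileRoundTrip (hypothetical helper) writes n values with one encoder and
// m values with a second one to a temp file, then reads them back with two
// decoders that share the same unbuffered reader.
func fileRoundTrip(n, m int) ([]Point, error) {
	f, err := os.CreateTemp("", "points-*.gob")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	for _, count := range []int{n, m} {
		enc := gob.NewEncoder(f)
		for i := 0; i < count; i++ {
			if err := enc.Encode(Point{X: i + 1, Y: count}); err != nil {
				return nil, err
			}
		}
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}

	br := newByteReader(f) // wrap once, share between decoders
	var out []Point
	for _, count := range []int{n, m} {
		dec := gob.NewDecoder(br)
		for i := 0; i < count; i++ {
			var p Point
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	pts, err := fileRoundTrip(3, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(pts)
}
```

Reading in such small pieces costs more system calls than buffered IO, so this trades some speed for correctness when several decoders share one file.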

Check a related question: Efficient Go serialization of struct to disk

Pedicular answered 4/4, 2017 at 6:30 Comment(4)

If the data is homogeneous, you can use my approach: keep the header separate from the file and encode through an intermediate structure – Excursive 24/3, 2020 at 12:55

The example doesn't work every time. See this example: go.dev/play/p/dp1a4dMDmNc There, although several encoders are used in sequence, decoding only works if a single decoder is created. What's more, things work if the underlying writer/reader is a bytes buffer, but stop working on files :( – Correy 17/5, 2023 at 9:36

@Correy gob.NewDecoder() documents that: "If r does not also implement io.ByteReader, it will be wrapped in a bufio.Reader." This means the used bufio.Reader might read more data from the passed reader than what gob.Decoder actually uses. So using multiple decoders on the same reader might result in the error you experienced. In this case you must pass a reader that implements io.ByteReader which does not read unneeded data, and it will work, see here: go.dev/play/p/i6RZLI9q3Wx Edited the answer to write about this, thanks. – Pedicular 17/5, 2023 at 11:24

Omg, thank you @Icza, I was suspecting something like that (read-ahead by some buffered reader), but I couldn't find it in the documentation -- I read all the preamble documentation for the package and for the Decoder object, but I didn't look into the docs of NewDecoder since it was only one line. Such an important piece of information -- that Decoder will perform buffering/read-ahead on the reader -- about something relatively magical (the input is cast to something else, a ByteReader) should be documented with more emphasis (everywhere in the package). Alas ... – Correy 17/5, 2023 at 14:35

In addition to the above, I suggest using an intermediate structure to exclude the gob header:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "io"
    "log"
)

type Point struct {
    X, Y int
}

func main() {
    buf := new(bytes.Buffer)
    enc, _, err := NewEncoderWithoutHeader(buf, new(Point))
    if err != nil {
        log.Fatal(err)
    }
    if err := enc.Encode(&Point{10, 10}); err != nil {
        log.Fatal(err)
    }
    fmt.Println(buf.Bytes())
}


type HeaderSkiper struct {
    src io.Reader
    dst io.Writer
}

func (hs *HeaderSkiper) Read(p []byte) (int, error) {
    return hs.src.Read(p)
}

func (hs *HeaderSkiper) Write(p []byte) (int, error) {
    return hs.dst.Write(p)
}

func NewEncoderWithoutHeader(w io.Writer, sample interface{}) (*gob.Encoder, *bytes.Buffer, error) {
    hs := new(HeaderSkiper)
    hdr := new(bytes.Buffer)
    hs.dst = hdr

    enc := gob.NewEncoder(hs)
    // Write sample with header info
    if err := enc.Encode(sample); err != nil {
        return nil, nil, err
    }
    // Change writer
    hs.dst = w
    return enc, hdr, nil
}

func NewDecoderWithoutHeader(r io.Reader, hdr *bytes.Buffer, dummy interface{}) (*gob.Decoder, error) {
    hs := new(HeaderSkiper)
    hs.src = hdr

    dec := gob.NewDecoder(hs)
    if err := dec.Decode(dummy); err != nil {
        return nil, err
    }

    hs.src = r
    return dec, nil
}
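A possible round trip with this idea, as a condensed sketch: newEncoder, newDecoder and roundTrip are names made up here (shortened variants of the functions above, fixed to Point), the header is passed around as a byte slice so it can be reused, and a bytes.Buffer stands in for a file opened in append mode. Two separate encoders append values, and one decoder primed with the header reads them all.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

type Point struct{ X, Y int }

// HeaderSkiper lets the underlying reader / writer be swapped after the
// type header has been produced or consumed (same idea as above).
type HeaderSkiper struct {
	src io.Reader
	dst io.Writer
}

func (hs *HeaderSkiper) Read(p []byte) (int, error)  { return hs.src.Read(p) }
func (hs *HeaderSkiper) Write(p []byte) (int, error) { return hs.dst.Write(p) }

// newEncoder returns an encoder whose type header went to a side buffer
// instead of w, together with the header bytes.
func newEncoder(w io.Writer) (*gob.Encoder, []byte, error) {
	hdr := new(bytes.Buffer)
	hs := &HeaderSkiper{dst: hdr}
	enc := gob.NewEncoder(hs)
	if err := enc.Encode(new(Point)); err != nil { // sample carries the header
		return nil, nil, err
	}
	hs.dst = w // from now on, values go to the real destination
	return enc, hdr.Bytes(), nil
}

// newDecoder primes a decoder with the header, then points it at r.
func newDecoder(r io.Reader, hdr []byte) (*gob.Decoder, error) {
	hs := &HeaderSkiper{src: bytes.NewReader(hdr)}
	dec := gob.NewDecoder(hs)
	if err := dec.Decode(new(Point)); err != nil { // consume header + sample
		return nil, err
	}
	hs.src = r
	return dec, nil
}

// roundTrip simulates two append sessions followed by a single read.
func roundTrip() ([]Point, error) {
	file := new(bytes.Buffer) // stand-in for a file opened with O_APPEND
	var hdr []byte
	for session := 1; session <= 2; session++ {
		enc, h, err := newEncoder(file)
		if err != nil {
			return nil, err
		}
		hdr = h
		for i := 1; i <= 2; i++ {
			if err := enc.Encode(Point{X: session, Y: i}); err != nil {
				return nil, err
			}
		}
	}

	dec, err := newDecoder(file, hdr)
	if err != nil {
		return nil, err
	}
	var out []Point
	for {
		var p Point
		err := dec.Decode(&p)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	pts, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(pts)
}
```

In a real program the header bytes would have to be stored somewhere (or regenerated from the same sample type) so that the reader can prime its decoder.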

Excursive answered 24/3, 2020 at 8:38 Comment(0)

In addition to icza's great answer, you can use the following trick to append to a gob file that already contains data: on each append, first encode a dummy value into an intermediate writer and discard it, so that the type headers are not written to the file a second time:

Create the file
Encode gob as usual (the first encode writes the headers)
Close the file
Open the file for append
Using an intermediate writer, encode a dummy struct (which writes the headers)
Reset the writer
Encode gob as usual (writes no headers)

Example:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "os"
)

type Record struct {
    ID   int
    Body string
}

func main() {
    r1 := Record{ID: 1, Body: "abc"}
    r2 := Record{ID: 2, Body: "def"}

    // encode r1
    var buf1 bytes.Buffer
    enc := gob.NewEncoder(&buf1)
    err := enc.Encode(r1)
    if err != nil {
        log.Fatal(err)
    }

    // write to file
    err = ioutil.WriteFile("/tmp/log.gob", buf1.Bytes(), 0600)
    if err != nil {
        log.Fatal(err)
    }

    // encode dummy (which write headers)
    var buf2 bytes.Buffer
    enc = gob.NewEncoder(&buf2)
    err = enc.Encode(Record{})
    if err != nil {
        log.Fatal(err)
    }

    // remove dummy
    buf2.Reset()

    // encode r2
    err = enc.Encode(r2)
    if err != nil {
        log.Fatal(err)
    }

    // open file
    f, err := os.OpenFile("/tmp/log.gob", os.O_WRONLY|os.O_APPEND, 0600)
    if err != nil {
        log.Fatal(err)
    }

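    // write r2, then close the file so the data is flushed before reading
    _, err = f.Write(buf2.Bytes())
    if err != nil {
        log.Fatal(err)
    }
    if err = f.Close(); err != nil {
        log.Fatal(err)
    }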

    // decode file
    data, err := ioutil.ReadFile("/tmp/log.gob")
    if err != nil {
        log.Fatal(err)
    }

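    dec := gob.NewDecoder(bytes.NewReader(data))
    for {
        // Use a fresh value per iteration: gob omits zero-valued fields,
        // so a reused struct could keep stale data from the previous record.
        var r Record
        err = dec.Decode(&r)
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(r)
    }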
}

Pleochroism answered 19/12, 2019 at 12:24 Comment(1)

I have slightly reduced your option in my answer using a separate structure – Excursive 24/3, 2020 at 8:41
