Avro serialization: which parts are and aren't thread-safe?

I am seeing conflicting information about this in different places online, so I would appreciate an authoritative answer from someone who actually knows.

Suppose I am serializing some data to Avro:

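    import java.io.IOException;
    import java.io.OutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;

    class StuffToAvro {
       private final Schema schema;
       StuffToAvro(Schema schema) { this.schema = schema; }

       void apply(GenericRecord stuff, OutputStream out) throws IOException {
         final Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
         final GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
         writer.write(stuff, encoder);
         encoder.flush(); // binaryEncoder buffers, so flush to push the bytes to out
       }
    }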

The question is whether I can/should optimize it by reusing the encoder and writer, and, if I should, what is the right way to do it: can I just initialize the writer upfront and make it final for example, or does it need to be a ThreadLocal?

A similar question about the encoder: should I keep the previous instance and pass it to binaryEncoder for reuse, or does that need to be a ThreadLocal as well?

In each case, if the answer is ThreadLocal, I'd also like to know whether such optimization is worth the complexity: is it actually expensive to create a brand new writer and/or encoder every time rather than reusing them?

Also, I assume that whatever answers I get here apply to reading/decoding as well. Is that right?

Appreciate any pointers.

Thank you!

Subcartilaginous answered 3/5, 2017 at 12:55

Per this post

Yes, a DatumReader instance may be used in multiple threads. Encoder and Decoder are not thread-safe, but DatumReader and DatumWriter are.

Writers are thread-safe too.

Yes, re-using a single GenericDatumWriter to write multiple objects should improve performance.
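Putting the two quotes together, one possible arrangement is a single shared writer plus a per-thread encoder. The following is only a sketch under those assumptions: it relies on the quoted claim that DatumWriter is thread-safe, and the `encoders` field name is made up for illustration. Whether the ThreadLocal is worth the extra complexity is not something the quotes settle; measuring in your own workload is the only reliable answer.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

class StuffToAvro {
    // Shared by all threads, relying on the quoted claim that DatumWriter is thread-safe.
    private final DatumWriter<GenericRecord> writer;

    // One encoder per thread, because Encoder is not thread-safe.
    // (Hypothetical field name; starts out null for each thread.)
    private final ThreadLocal<BinaryEncoder> encoders = new ThreadLocal<>();

    StuffToAvro(Schema schema) {
        this.writer = new GenericDatumWriter<>(schema);
    }

    void apply(GenericRecord stuff, OutputStream out) throws IOException {
        // binaryEncoder(out, reuse) tries to reconfigure `reuse` for the new stream;
        // if `reuse` is null or cannot be reused, it returns a new encoder instead.
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, encoders.get());
        encoders.set(encoder);
        writer.write(stuff, encoder);
        encoder.flush(); // the encoder buffers, so flush before returning
    }
}
```

The reading side can be arranged the same way, with a shared DatumReader and a per-thread decoder obtained from `DecoderFactory.get().binaryDecoder(in, reuse)`.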

Cache answered 23/2, 2019 at 15:41
