How to serialize a Date using AVRO in Java

I'm trying to serialize objects containing dates with Avro, but the deserialized date doesn't match the expected value (tested with Avro 1.7.1 and 1.7.2). Here's the class I'm serializing:

import java.text.SimpleDateFormat;
import java.util.Date;

public class Dummy {
    private Date date;
    private SimpleDateFormat df = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");

    public Dummy() {
    }

    public void setDate(Date date) {
        this.date = date;
    }

    public Date getDate() {
        return date;
    }

    @Override
    public String toString() {
        return df.format(date);
    }
}

The code used to serialize and deserialize:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Date;

import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroSerialization {

    public static void main(String[] args) {
        Dummy expected = new Dummy();
        expected.setDate(new Date());
        System.out.println("EXPECTED: " + expected);
        Schema schema = ReflectData.get().getSchema(Dummy.class);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
        DatumWriter<Dummy> writer = new ReflectDatumWriter<Dummy>(schema);
        try {
            writer.write(expected, encoder);
            encoder.flush();
            Decoder decoder = DecoderFactory.get().binaryDecoder(baos.toByteArray(), null);
            DatumReader<Dummy> reader = new ReflectDatumReader<Dummy>(schema);
            Dummy actual = reader.read(null, decoder);
            System.out.println("ACTUAL: " + actual);
        } catch (IOException e) {
            System.err.println("IOException: " + e.getMessage());
        }
    }
}

And the output:

EXPECTED: 06/11/2012 05:43:29.188
ACTUAL: 06/11/2012 05:43:29.387

Is this a known bug, or is it caused by the way I'm serializing the object?

Peterpeterborough answered 6/11, 2012 at 16:47 Comment(2)
I know I'm not answering your question, but I wouldn't use a static SimpleDateFormat. It's not a thread-safe class and consequently will give you unreliable results in a threaded environment. - Noseband
Thank you for the comment. This is actually not production code, only a test class I wrote to expose my problem. Anyway you're right, so I removed the static modifier ;) - Peterpeterborough
D
7

I think Avro doesn't serialize Date at this point. What I would do is wrap it in another class and store it as a long (date.getTime()) until the Avro developers add this feature. The reason you see different Date values is that every time you (or Avro) create a Date object with the no-argument constructor, it is initialized with the current system time, so the Date created during deserialization holds the time of deserialization rather than the original value.
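
A minimal sketch of that workaround, assuming a hypothetical wrapper class named DummyWithMillis (not part of the original code). It only shows the field layout: the value is kept as a primitive long, which Avro's reflect API can map to an Avro long, and the Date is rebuilt from that value on read. The Avro writer and reader calls from the question are left out here.

```java
import java.util.Date;

// Hypothetical wrapper: stores the instant as a long instead of a Date field.
class DummyWithMillis {
    // Milliseconds since the Unix epoch, as returned by Date.getTime().
    private long dateMillis;

    public DummyWithMillis() {
    }

    public void setDate(Date date) {
        this.dateMillis = date.getTime();
    }

    public Date getDate() {
        // Rebuild the Date from the stored value, not from the current time.
        return new Date(dateMillis);
    }

    public static void main(String[] args) {
        Date original = new Date();
        DummyWithMillis dummy = new DummyWithMillis();
        dummy.setDate(original);
        // Date.equals compares the millisecond values, so this prints "true".
        System.out.println(original.equals(dummy.getDate()));
    }
}
```

Because only the long is a field, the value that comes back after a round trip no longer depends on when the object was instantiated.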

Digamma answered 20/11, 2012 at 0:37 Comment(1)
Thank you amas, it looks like Dates are indeed not supported, as stated in your answer, and that the Date is initialized with the current system time. - Peterpeterborough
W
33

Avro 1.8 now has a date "logicalType", which annotates int. For example:

{"name": "date", "type": {"type": "int", "logicalType": "date"}}

Quoting the spec:

A date logical type annotates an Avro int, where the int stores the number of days from the unix epoch, 1 January 1970 (ISO calendar).
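
To make the "days from the unix epoch" encoding concrete, here is a small sketch that uses only java.time and not the Avro API itself. The class and method names (EpochDays, toEpochDays, fromEpochDays) are made up for illustration.

```java
import java.time.LocalDate;

// Hypothetical helper showing the int value a "date" logical type would carry.
class EpochDays {
    // Days between 1970-01-01 and the given date; throws if it does not fit in an int.
    static int toEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Inverse conversion: rebuild the calendar date from the day count.
    static LocalDate fromEpochDays(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        int days = toEpochDays(LocalDate.of(2012, 11, 6));
        System.out.println(days);                // 15650
        System.out.println(fromEpochDays(days)); // 2012-11-06
    }
}
```

Note that this representation carries no time of day, so it suits calendar dates rather than timestamps.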

Wardieu answered 10/10, 2016 at 18:7 Comment(5)
Why int and not long? - Gerber
@AlessandroDionisi The spec calls for int because this particular type is the number of DAYS since the epoch. There is another logicalType, "timestamp-millis", for the more common number of milliseconds, and it is a long. - Aconite
That's a drawback of Avro: after deserialisation, if you have your date in the JSON, it would appear as { date: 2020-08-10 }, whereas it is illegal in JSON to store a date with hyphens without quotes. The expected value would be {date: "2020-08-10"}, which means the type should be string, not int. - Glandule
Who uses the number of days since the unix epoch? Why not millis as the default? - Arlynearlynne
@Arlynearlynne Because you're storing a date, not a datetime. If you use the number of millis from the epoch, you have 86,400,000 values that lie within each day, i.e. your representation of the date is not unique. It's also wasteful: if you record the number of millis, you can hold a far smaller date range before you have to use longs for storage. - Autotoxin
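
The range argument in the last comment can be checked with a little arithmetic. This is only an illustrative sketch using java.time; the class and method names are hypothetical.

```java
import java.time.LocalDate;

// Hypothetical helper comparing what a 32-bit int can cover as days vs. as milliseconds.
class DateRangeInInt {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86,400,000

    // Latest calendar date reachable when an int counts days since the epoch.
    static LocalDate maxDateWithIntDays() {
        return LocalDate.ofEpochDay(Integer.MAX_VALUE);
    }

    // Number of whole days covered when an int counts milliseconds since the epoch.
    static long wholeDaysWithIntMillis() {
        return Integer.MAX_VALUE / MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        // A date millions of years in the future.
        System.out.println(maxDateWithIntDays());
        System.out.println(wholeDaysWithIntMillis()); // 24
    }
}
```

So an int of days spans millions of years, while an int of milliseconds overflows after less than a month, which is why millisecond timestamps need a long.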
R
1

I resolved this issue on my end. I was facing the same problem with serialization of a date field. Avro stores the Date type as an int, but once you declare the field this way

{ "name": "DateEffective", "type": [ "null", { "type": "int", "logicalType": "date" } ] }

and run the avrogen -s command, it generates a C#/Java model class with a date field.

Rudolfrudolfo answered 14/7, 2022 at 8:25 Comment(0)
