ISO 8601 dates in Avro schema

Is it possible to use date-time fields such as "2019-08-24T14:15:22.000Z" in Avro?

The docs say that dates and timestamps must use type int/long with a logical type. But in that case, the date has to be stored as an epoch timestamp.

I'm looking for something like this:

{
    "name": "myDateField",
    "type": "string",
    "logicalType": "timestamp-micros"
}

but it seems that logicalType is ignored in this case, so any arbitrary string can be set in that field.

Lederer answered 19/11, 2021 at 13:20 Comment(2)
What are you trying to do? Which language are you using? Of course you can put any value in a string; the only option is to check during (de-)serialisation and reject invalid values. If you don't need the time zone, why is an epoch timestamp not sufficient? - Newark
@Newark I don't think it's really related, but I need a) to send messages to the topic from the Java/Spring app and b) to be able to send a message manually to the broker (e.g. via Conduktor). In both cases, the message should be validated. Of course my consumer app rejects incorrect data too, but it's better to reject it earlier, when it is sent to the topic. I've been wondering whether Avro supports checking these kinds of date strings (ISO 8601 is obviously the most popular format nowadays). And of course I need to preserve time zones too (I just have Z in the example). - Lederer

The idea of the logical types is that the library you are using will do the conversion for you.

Assume you had a schema like this:

{
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "mydate",
            "type": {
                "type": "int",
                "logicalType": "date"
            }
        }
    ]
}

If you wanted to use this schema in Python (for example), you would create a record like so:

from datetime import date
record = {"mydate": date(2021, 11, 19)}

The Avro library you are using is responsible for taking the date object, converting it to the underlying int representation (for the date logical type, the number of days since the Unix epoch), and then serializing it as an int.

Likewise, when reading that record back out, the library is responsible for first converting the underlying int back into the date object. From a user perspective, you don't have to worry about the conversion and simply get to use higher level types.
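
To make that conversion concrete, here is a minimal sketch of what a library does for the date logical type, using only the Python standard library. The helper names date_to_avro_int and avro_int_to_date are hypothetical and are not part of any Avro package; the only assumption taken from the Avro specification is that a date is stored as the number of days since 1970-01-01.

```python
from datetime import date, timedelta

# The Avro "date" logical type counts days from the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_avro_int(value: date) -> int:
    """Hypothetical helper: turn a date into the int that gets serialized."""
    return (value - EPOCH).days


def avro_int_to_date(days: int) -> date:
    """Hypothetical helper: turn the stored int back into a date."""
    return EPOCH + timedelta(days=days)


encoded = date_to_avro_int(date(2021, 11, 19))
decoded = avro_int_to_date(encoded)
print(encoded, decoded)
```

A real Avro library performs the equivalent of these two steps internally, which is why the record can hold a date object directly.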

Cysteine answered 19/11, 2021 at 17:48 Comment(2)
What if I receive an ISO date in a request or fetch it from a database? I think these scenarios are quite common in enterprise applications. - Lederer
Sorry, I'm not sure how those situations are handled, as I have not done that. - Cysteine

Assuming you have a simple POJO:

public class AvroEvent {
  public ZonedDateTime time;
}

You could use an Avro logical type conversion:

import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import org.apache.avro.Conversion;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;

public class ZonedDateTimeConversion extends Conversion<ZonedDateTime> {
  @Override
  public Class<ZonedDateTime> getConvertedType() {
    return ZonedDateTime.class;
  }

  @Override
  public String getLogicalTypeName() {
    return "zoneddatetime-string";
  }

  @Override
  public Schema getRecommendedSchema() {
    return new ZonedDateTimeString().addToSchema(Schema.create(Schema.Type.STRING));
  }

  // Called when reading: parsing throws DateTimeParseException for an invalid string.
  @Override
  public ZonedDateTime fromCharSequence(CharSequence value, Schema schema, LogicalType type) {
    return ZonedDateTime.parse(value, DateTimeFormatter.ISO_ZONED_DATE_TIME);
  }

  // Called when writing: the string is always produced from a valid ZonedDateTime.
  @Override
  public CharSequence toCharSequence(ZonedDateTime value, Schema schema, LogicalType type) {
    return value.format(DateTimeFormatter.ISO_ZONED_DATE_TIME);
  }

  public static class ZonedDateTimeString extends LogicalType {
    private ZonedDateTimeString() {
      super("zoneddatetime-string");
    }

    // The logical type only makes sense on top of an Avro string.
    @Override
    public void validate(Schema schema) {
      super.validate(schema);
      if (schema.getType() != Schema.Type.STRING) {
        throw new IllegalArgumentException(
            "ZonedDateTime (string) can only be used with an underlying string type");
      }
    }
  }
}

Then add the conversion to an Avro model and use it for serializing and deserializing your POJO:

    var model = new ReflectData();
    model.addLogicalTypeConversion(new ZonedDateTimeConversion());

    var schema = model.getSchema(AvroEvent.class);

    var encoder = new BinaryMessageEncoder<AvroEvent>(model, schema);
    
    var data = encoder.encode(...);

This way, only valid values can be written during serialization, and an exception is thrown when deserializing an invalid time.
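
The same write-side and read-side checks can be sketched outside Avro, here in plain Python to match the earlier Python example. This is only an illustration of the idea, not the Avro API: to_wire and from_wire are hypothetical names, and the rule that a value must carry a UTC offset is an assumption based on the requirement to preserve time zones.

```python
from datetime import datetime, timezone


def to_wire(value: datetime) -> str:
    """Hypothetical writer: only an offset-aware datetime can become a string."""
    if value.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return value.isoformat()


def from_wire(text: str) -> datetime:
    """Hypothetical reader: reject anything that is not an ISO 8601 date-time."""
    # Older Python versions do not accept a trailing "Z", so normalise it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)  # raises ValueError on bad input
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return parsed


parsed = from_wire("2019-08-24T14:15:22.000Z")
print(to_wire(parsed))

try:
    from_wire("any random string")
except ValueError as error:
    print("rejected:", error)
```

As in the Java conversion, the stored value stays a string, and validation happens only where the string is produced or parsed.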

See https://github.com/fillmore-labs/avro-logical-type-conversion for a running example.

Newark answered 23/11, 2021 at 22:5 Comment(0)
