Is there an API implementation of Avro's "duration" logical type?

The current Apache Avro (1.8.2) documentation mentions a "duration" logical type:

A duration logical type annotates Avro fixed type of size 12, which stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds.

While this all makes sense, I can't find an actual implementation in either the .Net or Java libraries. The documentation for logical types clearly lists every logical type except duration (date, time-millis, time-micros, timestamp-millis and timestamp-micros).

The "duration" is defined in my Avro schema accordingly:

{
    "type": "record",
    "name": "DataBlock",
    "fields": [
    {
        "name": "duration",
        "type": {
            "type": "fixed",
            "name": "DataBlockDuration",
            "size": 12
        }
    }]
}

In .Net (excuse the VB), I have to manually serialise durations:

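' Pack months, days and milliseconds into the 12-byte fixed value (4 bytes each).
' Note: BitConverter.GetBytes uses the machine's byte order (little-endian on x86/x64).
Dim ret(11) As Byte
Dim months = BitConverter.GetBytes(duration.Months)
Dim days = BitConverter.GetBytes(duration.Days)
Dim milliseconds = BitConverter.GetBytes(duration.Milliseconds)

Array.Copy(months, 0, ret, 0, 4)        ' bytes 0-3: months
Array.Copy(days, 0, ret, 4, 4)          ' bytes 4-7: days
Array.Copy(milliseconds, 0, ret, 8, 4)  ' bytes 8-11: milliseconds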

When deserialising in Java, I have to convert to org.joda.time.Period by doing this:

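import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import org.joda.time.Period;

// View the 12-byte fixed value as three little-endian 32-bit integers.
IntBuffer buf = ByteBuffer
                  .wrap(dataBlock.getDuration().bytes())
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asIntBuffer();

// Note: IntBuffer.get returns signed ints, so values above Integer.MAX_VALUE would appear negative.
Period period = Period
                  .months(buf.get(0))
                  .withDays(buf.get(1))
                  .withMillis(buf.get(2));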

Am I missing something, or did the Avro team write a spec and forget to implement it? It seems that this data type in particular has to be implemented without any help from the Avro API at all.

Catacomb answered 24/4, 2018 at 1:42 Comment(2)
Can you share your schema and runnable code? Hydrophilous
I've added the relevant section of my schema. I would have added "logicalType: duration" to it, but the .NET Avro API fails to serialise the schema correctly if I do ("duration" doesn't appear wrapped in quotes). avro-tools appears to generate the same Java class whether logicalType is specified in the avsc or not. Catacomb

Joda-Time

The Joda-Time project is now in maintenance mode, with the team advising migration to the java.time classes. Concepts are similar, as both projects were led by the same man, Stephen Colebourne.

java.time

The java.time framework offers two separate classes to represent a span of time unattached to the timeline:

  • Period
    A number of years, months, and days.
  • Duration
    A number of days (generic 24-hour chunks of time unrelated to the calendar), hours, minutes, seconds, and a fractional second (nanoseconds).

You could use your first two numbers as a Period, and the third number for a Duration.

Period p = Period.ofMonths( months ).plusDays( days ) ;
Duration d = Duration.ofMillis( millis ) ;

You might want to normalize the years & months of the Period object. For example, a period of "15 months" will be normalized to "1 year and 3 months".

Period p = Period.ofMonths( months ).plusDays( days ).normalized() ;
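
As a rough sketch of that mapping, assuming the 12-byte fixed value is already available as a plain byte array: the class name AvroDurationSketch and its Decoded holder are hypothetical names chosen for illustration, not part of Avro or java.time. Because the Avro fields are unsigned and Java's int is signed, each field is widened to a long before use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Duration;
import java.time.Period;

public class AvroDurationSketch {

    /** Hypothetical holder for the two java.time values; not an Avro class. */
    public static final class Decoded {
        public final Period period;
        public final Duration duration;

        Decoded(Period period, Duration duration) {
            this.period = period;
            this.duration = duration;
        }
    }

    /** Decodes months, days and milliseconds from a 12-byte little-endian value. */
    public static Decoded decode(byte[] fixed12) {
        if (fixed12.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + fixed12.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(fixed12).order(ByteOrder.LITTLE_ENDIAN);

        // Widen each unsigned 32-bit field to a long so large values stay positive.
        long months = Integer.toUnsignedLong(buf.getInt());
        long days = Integer.toUnsignedLong(buf.getInt());
        long millis = Integer.toUnsignedLong(buf.getInt());

        // Period stores signed ints, so toIntExact rejects values it cannot hold.
        Period p = Period.ofMonths(Math.toIntExact(months))
                         .plusDays(Math.toIntExact(days))
                         .normalized();
        Duration d = Duration.ofMillis(millis);
        return new Decoded(p, d);
    }

    public static void main(String[] args) {
        // Build a sample value: 15 months, 4 days, 90000 milliseconds.
        byte[] bytes = ByteBuffer.allocate(12)
                                 .order(ByteOrder.LITTLE_ENDIAN)
                                 .putInt(15).putInt(4).putInt(90_000)
                                 .array();
        Decoded decoded = decode(bytes);
        System.out.println(decoded.period + " " + decoded.duration); // P1Y3M4D PT1M30S
    }
}
```

Math.toIntExact throws an ArithmeticException for month or day counts above Integer.MAX_VALUE, which a Period cannot represent anyway.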

ISO 8601

The java.time classes use the standard ISO 8601 formats when parsing/generating strings.

For a period or duration, that means using the PnYnMnDTnHnMnS format. The P marks the beginning, and the T separates any years-months-days from any hours-minutes-seconds. For example, "P3Y6M4DT12H30M5S" represents a duration of "three years, six months, four days, twelve hours, thirty minutes, and five seconds".

To generate such a string, simply call toString on a Period or Duration. To parse, call parse.
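
A small sketch of that round trip, using the values from the example above. The class name Iso8601Sketch is just a placeholder. Note that the date part and the time part are handled by separate classes here, so each class produces and parses only its own half of the format.

```java
import java.time.Duration;
import java.time.Period;

public class Iso8601Sketch {
    public static void main(String[] args) {
        // Three years, six months, four days.
        Period p = Period.of(3, 6, 4);
        // Twelve hours, thirty minutes, five seconds.
        Duration d = Duration.ofHours(12).plusMinutes(30).plusSeconds(5);

        // toString generates the ISO 8601 text.
        String periodText = p.toString();     // P3Y6M4D
        String durationText = d.toString();   // PT12H30M5S

        // parse reads the same text back into an equal object.
        Period parsedPeriod = Period.parse(periodText);
        Duration parsedDuration = Duration.parse(durationText);

        System.out.println(periodText + " " + durationText);
        System.out.println(parsedPeriod.equals(p) && parsedDuration.equals(d)); // true
    }
}
```

Period.parse does not accept a string with a time part such as "P3Y6M4DT12H30M5S"; it throws a DateTimeParseException, so a combined value has to be split or handled by another class.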

Odd concepts in Avro

That Avro concept of duration (months + days + milliseconds) seems quite odd to me. The biggest problem is that mixing years-months-days with hours-minutes-seconds rarely makes any practical sense (think about it). And tracking months but not years is surprising.

org.threeten.extra.PeriodDuration

If you insist on wanting to merge the years-months-days with hours-minutes-seconds, consider adding the ThreeTen-Extra library to your project. It offers a PeriodDuration class.

PeriodDuration pd = PeriodDuration.of( p , d ) ;  // Pass `Period` and `Duration` objects as covered above.

Again, you will likely want to call normalizedStandardDays and normalizedYears.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

Entomologize answered 24/4, 2018 at 3:53 Comment(2)
Thanks for that. I noticed that java.time was two separate classes, but I did not realise that joda-time is effectively deprecated. I somewhat liked the single "Period" class, as it offers a complete mapping to the ISO8601 duration. en.wikipedia.org/wiki/ISO_8601#Durations I believe tracking months but not years makes sense, as a month can have between 28 and 31 days. Years are always 12 months. Days can be between 23 and 25 hours (thanks, DST). "Months+Days+Seconds" scales to represent any variable period. Catacomb
You should be aware that all of these period/duration classes represent a value not attached to the timeline or calendar. So "3 months" is three generic months of some arbitrary length – not something like “28, 31, 30”. And days are generic 24-hour chunks of time. By the way, days can be other lengths too, in real life, not just 23, 24, and 25 hours because of reasons beyond DST such as historic transitions to modern time-tracking as well as meddlesome politicians in contemporary times. Entomologize

According to the Apache issue tracker AVRO-2123, the logical duration type has been specified but not yet implemented.

So yes, the Apache team has written the spec but has not yet implemented this part of it.

I also searched the unzipped Avro 1.8.2 jar file for any import of the Joda library and found only the class org.apache.avro.data.TimeConversions, which contains conversions for other logical types such as "date" (mapped to org.joda.time.LocalDate), but none for the Joda class Period.

Your workaround of using Joda's Period class seems sound because:

  • Avro still uses Joda-Time (although the latter is in maintenance mode),
  • the Period class can completely map the Avro spec for duration in months, days and milliseconds (and using unsigned ints, as the Avro spec requires for an always-positive duration, also avoids odd periods with mixed signs).

Possible alternatives to Joda-Time that I am aware of:

  • ThreeTen-Extra class PeriodDuration (see the answer of Basil Bourque)
  • Time4J class net.time4j.Duration (my lib)

The ThreeTen-Extra class has fewer features (no localization at all, reduced ISO 8601 compliance, etc.) than the Joda class but might still be enough for your Avro-related scenario, while the Time4J class offers even more features than Joda (in the areas of ISO compliance, formatting, parsing, normalizing, etc.).

Mister answered 24/4, 2018 at 11:32 Comment(0)
