Avro field default values
M

4

75

I am running into some issues setting up default values for Avro fields. I have a simple schema as given below:

data.avsc:

{
 "namespace":"test",
 "type":"record",
 "name":"Data",
 "fields":[
    { "name": "id", "type": [ "long", "null" ] },
    { "name": "value", "type": [ "string", "null" ] },
    { "name": "raw", "type": [ "bytes", "null" ] }
 ]
}

I am using the avro-maven-plugin v1.7.6 to generate the Java model.

When I create an instance of the model using: Data data = Data.newBuilder().build();, it fails with an exception:

org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field id type:UNION pos:0 not set and has no default value.

But if I specify the "default" property,

{ "name": "id", "type": [ "long", "null" ], "default": "null" },

I do not get this error. I read in the documentation that the first schema in the union becomes the default schema. So my question is: why do I still need to specify the "default" property? How else do I make a field optional?

And if I do need to specify the default values, how does that work for a union; do I need to specify default values for each schema in the union and how does that work in terms of order/syntax?

Thanks.

Multiplechoice answered 8/4, 2014 at 13:10 Comment(0)
G
106

The default value of a union corresponds to the first schema of the union (Source). Your union is defined as ["long", "null"], therefore the default value must be a long number. null is not a long number, which is why you are getting an error.

If you still want to define null as a default value then put null schema first, i.e. change the union to ["null", "long"] instead.
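As a rough sketch of how the question's schema might look with that change applied: the unions below list null first, and each field also carries an explicit "default": null, since the comments on this answer report that reordering alone was not enough on Avro 1.7.5. Only the union order and the default attribute differ from the original data.avsc.

```json
{
  "namespace": "test",
  "type": "record",
  "name": "Data",
  "fields": [
    { "name": "id",    "type": [ "null", "long" ],   "default": null },
    { "name": "value", "type": [ "null", "string" ], "default": null },
    { "name": "raw",   "type": [ "null", "bytes" ],  "default": null }
  ]
}
```

With a schema along these lines, Data.newBuilder().build() should no longer complain about unset fields, because every field has a default the builder can fall back on. This is an illustration based on the thread, not something verified against every Avro version.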

Gratify answered 30/4, 2014 at 12:2 Comment(4)
Simply putting null first in the type union does not make it optional it seems - that is how I had it and still got the error. Adding default null is required at least on Avro 1.7.5 which I'm using.Hutchings
Check this: bytepadding.com/big-data/spark/avro/…. Has a complete schema defined and field rules table. Covers every complex type being defaulted. Only missing 'record' field type, but applies the same: {"type": ["null", {"name": "", "type":"record", "fields":[]}]}Baxter
This is a very surprising behavior from Avro. I wish the error was more explicit about thisPuck
"Simply putting null first in the type union does not make it optional it seems - that is how I had it and still got the error. Adding default null is required at least on Avro 1.7.5 which I'm using." This is correct, and by design. A union {null, long} myField is still a mandatory field, meaning that it needs to be set explicitly. The only difference from a long myField is that the latter can never hold the value null.Martz
C
53

It's a bug on Avro's end that is marked as Not A Problem. You need to add a default attribute to specify the default value.

{"name": "xxx", "type": ["null", "boolean"], "default": null}

Please refer to AVRO-1803.

Catfall answered 12/7, 2017 at 10:6 Comment(3)
For numeric values the default value must be null, but for a string type it must be "null", i.e. "default": null and "default": "null".Cinchona
what is the default value for bytes?Cruce
I would not call this a bug in Avro. You could argue that it is bad design, and it's definitely not intuitive to some people, but it's an intentional feature which is consistent with other rules in Avro. There is a conceptual and byte level difference between "this can be null" and "this can be null, and can be entirely missing, in which case it is assumed to be null". Avro's standard just requires that difference to be explicitly expressed in the schema instead of defining implicit defaults.Modiolus
P
22

You must provide "default": null, not "default": "null", in the schema to get the builder method working.
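To make the difference concrete, here is a small sketch. The record and field names are made up for illustration. In JSON, null without quotes is the null value, while "null" is a four-character string, and by the rule that a union's default corresponds to its first schema, each form only fits a union whose first branch matches it.

```json
{
  "type": "record",
  "name": "DefaultsExample",
  "fields": [
    { "name": "optionalText", "type": [ "null", "string" ], "default": null },
    { "name": "literalText",  "type": [ "string", "null" ], "default": "null" }
  ]
}
```

Here optionalText defaults to an actual null, whereas literalText defaults to the text "null", which is probably not what anyone wants from an optional field.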

Pavla answered 19/7, 2018 at 20:58 Comment(1)
That's not the case for all types, such as string.Alignment
T
4

I think the problem is that you are using a builder.

According to the documentation of the Java API:

using a builder requires setting all fields, even if they are null

Theoretical answered 18/3, 2021 at 17:20 Comment(1)
In my opinion this is what really answers the OP's question. The default value is ignored by Avro when encoding. What's happening is that the builder is allowing you to skip setting fields that have a default value. The default value is of no use to make a field optional. The following is enough: { "name": "id", "type": [ "long", "null" ]}Gummite
