Use of "default" in Avro schema

As per the definition of "default" attribute in Avro docs: "A default value for this field, used when reading instances that lack this field (optional)."

This means that if the corresponding field is missing, the default value is taken.

But this does not seem to be the case. Consider the following student schema:

{
  "type": "record",
  "namespace": "com.example",
  "name": "Student",
  "fields": [
    {
      "name": "age",
      "type": "int",
      "default": -1
    },
    {
      "name": "name",
      "type": "string",
      "default": "null"
    }
  ]
}

The schema says that if the "age" field is missing, its value should be taken as -1. Likewise, the "name" field should fall back to its default.

Now, if I try to construct a Student model from the following JSON:

{"age":70}

I get this exception:

org.apache.avro.AvroTypeException: Expected string. Got END_OBJECT

    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698)
    at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:227)

It looks like the default is NOT working as expected. So what exactly is the role of default here?

This is the code used to generate Student model:

Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, studentJson);
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
return datumReader.read(null, decoder);

(The Student class is auto-generated by the Avro compiler from the student schema.)

Flitting answered 26/2, 2018 at 9:59 Comment(2)
Possible duplicate of Avro field default values - Obstinate
@Generic There is a small difference. There, the model is built using the builder, and the default works. It fails only when parsing a JSON string. A few articles pointed out that fields cannot be missing, which seems unjustified to me. If the field has to be present anyway, I do not understand how the default attribute helps. - Flitting

I think there is some misunderstanding around default values, so hopefully my explanation will help other people as well. The default supplies a value when the field is not present, but that happens when you are instantiating an Avro object (in your case, when calling datumReader.read). On its own it does not let you read data that was written with a different schema, which is why the concept of a "schema registry" is useful in this kind of situation.

The following code works and reads your data:

Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, "{\"age\":70}");
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);

Schema expected = new Schema.Parser().parse("{\n" +
        "  \"type\": \"record\",\n" +
        "  \"namespace\": \"com.example\",\n" +
        "  \"name\": \"Student\",\n" +
        "  \"fields\": [{\n" +
        "    \"name\": \"age\",\n" +
        "    \"type\": \"int\",\n" +
        "    \"default\": -1\n" +
        "  }\n" +
        "  ]\n" +
        "}");

// setSchema() sets the writer schema, i.e. the schema the input was written with
datumReader.setSchema(expected);
System.out.println(datumReader.read(null, decoder));

As you can see, I am specifying the schema used to "write" the JSON input, which does not contain the field "name". However, because your schema contains a default value for it, when you print the record you will see the name with your default value:

{"age": 70, "name": "null"}

In case you do not already know: "null" here is not really a null value. It is a string with the value "null".
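
The behaviour described above can be pictured with a toy model in plain Java. This is only a sketch and not Avro's implementation: the class DefaultResolutionSketch and the method resolve are hypothetical names, and schemas are reduced to sets and maps of field names. It assumes the usual resolution rule, namely that a default is applied only to a field the reader schema declares and the writer schema does not.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of reader/writer schema resolution. This is NOT Avro's API;
// every name here is hypothetical and exists only for illustration.
final class DefaultResolutionSketch {

    // writerFields:   names declared by the schema the data was written with
    // readerDefaults: reader field name -> default (every reader field has one here)
    // datum:          the values actually present in the input
    static Map<String, Object> resolve(Set<String> writerFields,
                                       Map<String, Object> readerDefaults,
                                       Map<String, Object> datum) {
        // The data must contain every field its own (writer) schema declares.
        for (String field : writerFields) {
            if (!datum.containsKey(field)) {
                throw new IllegalStateException(
                        "Data lacks field '" + field + "' declared by the writer schema");
            }
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : readerDefaults.entrySet()) {
            String field = entry.getKey();
            if (writerFields.contains(field)) {
                result.put(field, datum.get(field)); // written field: take it from the data
            } else {
                result.put(field, entry.getValue()); // unknown to the writer: use the default
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("age", -1);
        readerDefaults.put("name", "null"); // the string "null", as in the question's schema

        Map<String, Object> datum = new LinkedHashMap<>();
        datum.put("age", 70);

        Set<String> ageOnly = new LinkedHashSet<>(Arrays.asList("age"));
        Set<String> ageAndName = new LinkedHashSet<>(Arrays.asList("age", "name"));

        // Writer schema without "name": the default fills the gap.
        System.out.println(resolve(ageOnly, readerDefaults, datum));

        // Writer schema with "name", data without it: rejected, as in the question.
        try {
            resolve(ageAndName, readerDefaults, datum);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this model, a "name" value present in the input is also ignored when the writer schema does not declare it, which matches the behaviour reported in the comment below.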

Vivianna answered 26/2, 2018 at 16:9 Comment(2)
datumReader.setSchema(expected) works for the missing field. Unfortunately, when the input JSON does contain the "name" field, the value is still set to "null". That is, if the input is {"age": 70, "name": "john"}, I get a Student model with name set to "null", whereas I expect it to be set to "john". Is there no other way to work around these missing fields? - Flitting
Two options: send the writer schema as part of the message (expensive), or use a schema registry. - Vivianna

Just to add to what is already said in the answer above: for a field to be null when it is not present, union its type with null. Otherwise, what you get is just a string spelled "null". Example schema:

{
  "name": "name",
  "type": [
    "null",
    "string"
  ],
  "default": null
}

Then, if you supply {"age":70} and retrieve the record, you will get the following:

{"age":70,"name":null}
Tiling answered 2/8, 2021 at 7:18 Comment(0)

Default values are for the reader. The writer must supply all fields.

From the documentation:

default: A default value for this field, only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time.
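
The encoding side of that statement can be sketched in plain Java as well. Again, this is a hypothetical toy and not Avro's API: EncodeTimeSketch and encode are made-up names, and the "encoding" is just a comma-separated string. The point it illustrates is that defaults are not consulted at all when writing, so a record that lacks a field is rejected even if the schema declares a default for it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the writer side. NOT Avro's API; names are hypothetical.
final class EncodeTimeSketch {

    // Encodes the record field by field. Defaults are deliberately not a
    // parameter: in this model they play no part in encoding.
    static String encode(List<String> schemaFields, Map<String, Object> record) {
        StringBuilder out = new StringBuilder();
        for (String field : schemaFields) {
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException(
                        "Field '" + field + "' must be supplied when encoding");
            }
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(field).append('=').append(record.get(field));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> schemaFields = Arrays.asList("age", "name");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("age", 70);

        // "name" is missing, so encoding fails regardless of any default.
        try {
            encode(schemaFields, record);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        record.put("name", "john");
        System.out.println(encode(schemaFields, record));
    }
}
```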

Umpire answered 17/4 at 22:53 Comment(0)
