Is it possible to have an optional field in an Avro schema (i.e. the field does not appear at all in the .json file)?

In my Avro schema, I have two fields:

{"name": "author", "type": ["null", "string"], "default": null},
{"name": "importance", "type": ["null", "string"], "default": null},

In my JSON files, those two fields may or may not be present.

However, when they do not exist, I receive an error (e.g. when I test such a JSON file using avro-tools command line client):

Expected field name not found: author

I understand that as long as the field name exists in the JSON, its value can be null or a string. What I'm trying to express is something like "this JSON is valid if those field names do not exist, OR if they exist and their values are null or a string".

Is this possible to express in an Avro schema? If so, how?
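The rule described above ("absent, or present and null or string") can be approximated outside Avro by pre-processing each JSON document so that missing fields are filled in from the schema's `default` values before the document is handed to a tool such as avro-tools. This is a minimal sketch under that assumption, not something avro-tools does itself; the helper names `fillDefaults` and `isNullOrString` are hypothetical.

```javascript
// Hypothetical pre-processing step: copy schema defaults into a JSON
// document for any field that is absent, then check the "null or string" rule.
const fields = [
  { name: "author", type: ["null", "string"], default: null },
  { name: "importance", type: ["null", "string"], default: null },
];

// Return a new object where every missing field that declares a default
// is set to that default. Fields that are already present are left untouched.
function fillDefaults(fields, doc) {
  const out = { ...doc };
  for (const field of fields) {
    if (!(field.name in out) && "default" in field) {
      out[field.name] = field.default;
    }
  }
  return out;
}

// The rule from the question: a value is acceptable if it is null or a string.
function isNullOrString(value) {
  return value === null || typeof value === "string";
}

const filled = fillDefaults(fields, { author: "Homer" });
console.log(filled); // { author: 'Homer', importance: null }
console.log(fields.every((f) => isNullOrString(filled[f.name]))); // true
```

Note that this only makes missing fields explicit; it does not change how the Avro JSON decoder itself treats absent fields.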

Cresset answered 27/3, 2015 at 11:25 Comment(4)
I faced the same problem. Have you found a solution? – Formenti
@Formenti unfortunately no. I still can't express a totally optional JSON field using an Avro schema. – Maxama
I've struggled with this too. See #45194229 – Dy
You may want to add some code showing how you are parsing from JSON to Avro. – Negrito

You can give the field a default value, for example "undefined", so that the field can be skipped:

{ 
   "name": "first_name",
   "type": "string",
   "default": "undefined"
},

Also, all fields are mandatory in Avro. If you want a field to be optional, make its type a union with null, for example:

{
    "name": "username",
    "type": [
      "null",
      "string"
    ],
    "default": null
},
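When the input is JSON, the union also affects how values are written: in Avro's JSON encoding (as described in the specification), a union value is written as JSON `null` when the null branch is chosen, and otherwise as an object with a single key naming the branch, e.g. `{"string": "jdoe"}`. The helper below is a hypothetical sketch that only handles this two-branch `["null", "string"]` case.

```javascript
// Hypothetical helper: encode a value of type ["null", "string"] the way
// Avro's JSON encoding represents union values.
function encodeNullableString(value) {
  // Treat an absent (undefined) value like null, mirroring "default": null.
  if (value === null || value === undefined) {
    return null; // null branch: plain JSON null
  }
  if (typeof value === "string") {
    return { string: value }; // non-null branch: tagged with the branch name
  }
  throw new TypeError(`expected null or string, got ${typeof value}`);
}

console.log(JSON.stringify({ username: encodeNullableString("jdoe") }));
// {"username":{"string":"jdoe"}}
console.log(JSON.stringify({ username: encodeNullableString(null) }));
// {"username":null}
```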
Sigismond answered 21/8, 2019 at 6:17 Comment(1)
Isn't the OP using exactly the same thing as in this answer? – Hesiod

According to the Avro specification, this is possible using the default attribute.

See https://avro.apache.org/docs/1.8.2/spec.html

default: A default value for this field, used when reading instances that lack this field (optional). Permitted values depend on the field's schema type, according to the table below. Default values for union fields correspond to the first schema in the union.

In the example you gave, you do set the default attribute to null, so this should work. However, support for this also depends on the library you use to read the Avro message (there are libraries for C, C++, Python, Java, C#, Ruby, etc.). Maybe (probably) the library you use lacks this feature.
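To illustrate the spec's wording, here is a toy model (not any library's actual code) of how a reader fills in a field that the written data lacks: for each field in the reader's schema, take the written value if present, otherwise the reader's default, and fail if there is neither. The function `resolveRecord` and the `title` field are hypothetical; real implementations perform this step while decoding binary data against the writer's schema.

```javascript
// Toy model of Avro record resolution: reader fields missing from the
// written data take the reader's default; without a default it is an error.
// Fields present only in the written data are ignored.
function resolveRecord(readerFields, writtenRecord) {
  const result = {};
  for (const field of readerFields) {
    if (Object.prototype.hasOwnProperty.call(writtenRecord, field.name)) {
      result[field.name] = writtenRecord[field.name];
    } else if ("default" in field) {
      result[field.name] = field.default;
    } else {
      throw new Error(`no value or default for field "${field.name}"`);
    }
  }
  return result;
}

const readerFields = [
  { name: "title", type: "string" },
  { name: "author", type: ["null", "string"], default: null },
];

console.log(resolveRecord(readerFields, { title: "Odyssey" }));
// { title: 'Odyssey', author: null }
```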

Celia answered 5/9, 2018 at 20:2 Comment(0)

Are you providing the type ("null" or "string") as a key in the object to be serialized, or just trying to serialize a bare object?

Avro implements tagged unions and will not perform type inference to decide which type an object is. This means that you have to provide a type tag.

I am testing with Node and avro-js. The following works:

const avro = require( "avro-js" );
const schema = {
    type: "record", name: "test", fields: [
        {
            "name": "author", "type": ["null", "string"],
            "default": null
        },
        {
            "name": "importance", "type": ["null", "string"],
            "default": null
        },
    ]
};
const s = avro.parse( schema );
s.toBuffer( {
    author: { null: null },
    importance: { null: null }
} ).toString();
// '\x00\x00'
s.toBuffer( {
    author: { string: 'Homer' },
    importance: { string: '1' }
} ).toString();
// '\x02\nHomer\x02\x021'

I find that I can serialize an empty object because default values are provided:

s.toBuffer( {} ).toString();
// '\x00\x00'

However, this may be implementation-specific. Can you provide reproduction instructions so we can help further?

Tiannatiara answered 10/10, 2023 at 16:45 Comment(1)
Thank you! author: { string: 'Homer' } solved my issue. It seems that when you declare the union type with avro-js, you have to pass in an object with a key corresponding to the type. The OP might have tried to serialize author: 'Homer', which leads to an issue. – Tortoiseshell

Use the default attribute with a null value, together with the union type ["null", original_type].

  • A default value of undefined is not supported (see the docs).
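The specification quoted in an earlier answer says a union field's default corresponds to the first schema in the union, which is why "null" has to come first when the default is null. A rough check could look like the sketch below; `defaultMatchesFirstBranch` is a hypothetical helper that only covers a few primitive types.

```javascript
// Hypothetical check: does a field's default match the first branch of its
// union? Only a handful of primitive types are covered (ranges not checked).
const primitiveChecks = {
  null: (v) => v === null,
  boolean: (v) => typeof v === "boolean",
  int: (v) => Number.isInteger(v),
  long: (v) => Number.isInteger(v),
  string: (v) => typeof v === "string",
};

function defaultMatchesFirstBranch(field) {
  if (!Array.isArray(field.type) || !("default" in field)) {
    return true; // not a union with a default: nothing to check here
  }
  const check = primitiveChecks[field.type[0]];
  // Unknown first branches (e.g. a record) are reported as false.
  return check !== undefined && check(field.default);
}

console.log(defaultMatchesFirstBranch({ name: "a", type: ["null", "string"], default: null })); // true
console.log(defaultMatchesFirstBranch({ name: "b", type: ["string", "null"], default: null })); // false
console.log(defaultMatchesFirstBranch({ name: "c", type: ["null", "string"], default: undefined })); // false
```

A JavaScript `undefined` default cannot be stored in a schema file anyway: `JSON.stringify` drops properties whose value is `undefined`.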

For an object (nested record) field, it should look like this:

const avro = require('avsc');

const yourSchema = avro.Type.forSchema({
  type: 'record',
  name: 'parent_record',
  fields: [
    { name: 'field_1', type: ['null', 'string'], default: null },
    { 
      name: 'optional_object_type', 
      type: ['null', {
        type: 'record',
        name: 'optional_record',
        fields: [{ name: 'sub_field', type: 'string' }]
      }],
      default:null
    }
  ]
});
Halves answered 29/11, 2023 at 10:6 Comment(0)
