The Avro specification (https://avro.apache.org/docs/current/spec.html) says a schema must be one of:
- A JSON string, naming a defined type.
- A JSON object, of the form:
{"type": "typeName" ...attributes...}
where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
- A JSON array, representing a union of embedded types.
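To make the three forms concrete for myself, here is a minimal Python sketch that classifies a parsed JSON value by a literal reading of the rules quoted above. The function name `schema_form` is hypothetical, and this is not how any Avro implementation parses schemas (a real parser may be stricter or more lenient); it only mirrors the quoted wording.

```python
import json


def schema_form(schema):
    """Classify a parsed JSON value by a literal reading of the quoted rules.

    Hypothetical helper for illustration only; not part of any Avro library.
    """
    if isinstance(schema, str):
        return "name"    # a JSON string, naming a defined type
    if isinstance(schema, list):
        return "union"   # a JSON array, representing a union
    if isinstance(schema, dict) and isinstance(schema.get("type"), str):
        return "object"  # a JSON object whose "type" is a type name
    return "none"        # matches none of the three quoted forms


print(schema_form(json.loads('"long"')))                       # name
print(schema_form(json.loads('["null", "long"]')))             # union
print(schema_form(json.loads('{"type": "record", "name": "Point", "fields": []}')))  # object
print(schema_form(json.loads('{"type": ["null", "long"]}')))   # none
```

Under this reading, an object whose "type" is itself an array does not fit any of the three forms, because an array is not a type name.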
I want a schema that describes a tree, using the recursive definition that a tree is either:
- A node with a value (say, integer) and a list of trees (the children)
- A leaf with a value
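For reference, this is the shape I have in mind, sketched as plain Python dataclasses rather than Avro. The names `Node`, `Leaf`, and `Tree` match the schema I am trying to write; the `total` function is just a hypothetical helper to show the recursion.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    value: int
    # Children are themselves trees, which is what makes the type recursive.
    children: List["Tree"] = field(default_factory=list)


# A tree is either a Node or a Leaf: a union of two record-like types.
Tree = Union[Node, Leaf]


def total(tree: Tree) -> int:
    """Sum every value in the tree (illustrative helper)."""
    if isinstance(tree, Leaf):
        return tree.value
    return tree.value + sum(total(child) for child in tree.children)


example = Node(1, [Leaf(2), Node(3, [Leaf(4)])])
print(total(example))  # 10
```

Note that `Tree` here is only an alias for the union, not a record of its own, which is exactly what I would like to express in Avro.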
My initial attempt looked like:
{
  "name": "Tree",
  "type": [
    {
      "name": "Node",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        },
        {
          "name": "children",
          "type": { "type": "array", "items": "Tree" }
        }
      ]
    },
    {
      "name": "Leaf",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        }
      ]
    }
  ]
}
But the Avro compiler rejects this, complaining there is nothing of type {"name":"Tree","type":[{"name":"Node"...
It seems Avro doesn't like the union type at the top level. I'm guessing this falls under the aforementioned rule "a schema must be one of ... a JSON object ... where typeName is either a primitive or derived type name." I am not sure what a "derived type name" is, though. At first I thought it was the same as a "complex type", but that includes union types.
Anyway, changing it to this more convoluted definition:
{
  "name": "Tree",
  "type": "record",
  "fields": [{
    "name": "ctors",
    "type": [
      {
        "name": "Node",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          },
          {
            "name": "children",
            "type": { "type": "array", "items": "Tree" }
          }
        ]
      },
      {
        "name": "Leaf",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          }
        ]
      }
    ]
  }]
}
works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
Is this the only way to get what I want in Avro or is there a better way?
Thanks!