Recursive data type like a tree as Avro schema

According to the Avro specification (https://avro.apache.org/docs/current/spec.html), a schema must be one of:

  • A JSON string, naming a defined type.
  • A JSON object, of the form: {"type": "typeName" ...attributes...} where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
  • A JSON array, representing a union of embedded types.

I want a schema that describes a tree, using the recursive definition that a tree is either:

  • A node with a value (say, integer) and a list of trees (the children)
  • A leaf with a value
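
For concreteness, here is roughly the data type I have in mind, sketched in Python. The names Leaf, Node, Tree, and total are purely illustrative and not part of any Avro API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """A leaf carries only a value."""
    value: int


@dataclass
class Node:
    """A node carries a value and a list of child trees."""
    value: int
    children: List["Tree"] = field(default_factory=list)


# A tree is either a Node or a Leaf (this is the union I want to express).
Tree = Union[Node, Leaf]


def total(tree: Tree) -> int:
    """Sum every value in the tree, recursing only into the children of a Node."""
    if isinstance(tree, Leaf):
        return tree.value
    return tree.value + sum(total(child) for child in tree.children)


example = Node(1, [Leaf(2), Node(3, [Leaf(4)])])
print(total(example))  # 10
```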

My initial attempt looked like:

{
  "name": "Tree",
  "type": [
    {
      "name": "Node",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        },
        {
          "name": "children",
          "type": { "type": "array", "items": "Tree" }
        }
      ]
    },
    {
      "name": "Leaf",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        }
      ]
    }
  ]
}

But the Avro compiler rejects this, complaining that there is nothing of type {"name":"Tree","type":[{"name":"Node".... It seems Avro doesn't like a union type at the top level. I'm guessing this falls under the aforementioned rule that "a schema must be one of ... a JSON object ... where typeName is either a primitive or derived type name." I am not sure what a "derived type name" is, though. At first I thought it meant the same thing as a "complex type", but that category includes union types.

Anyway, changing it to this more convoluted definition:

{
  "name": "Tree",
  "type": "record",
  "fields": [{
    "name": "ctors",
    "type": [
      {
        "name": "Node",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          },
          {
            "name": "children",
            "type": { "type": "array", "items": "Tree" }
          }
        ]
      },
      {
        "name": "Leaf",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          }
        ]
      }
    ]
  }]
}

works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
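
My understanding of the difference between the two schemas is that only records, enums, and fixed types are named types in Avro, so a "name" attached to a union is never registered and "Tree" cannot be referenced. The sketch below is a deliberately simplified model of that name resolution, not Avro's actual parser: unresolved_names is a hypothetical helper, and it ignores namespaces, aliases, and schema validation.

```python
NAMED_KINDS = {"record", "enum", "fixed"}
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def unresolved_names(schema, known=None):
    """Return the type names that are referenced without having been defined."""
    known = set() if known is None else known
    missing = []
    if isinstance(schema, str):
        # A bare string must be a primitive or an already registered named type.
        if schema not in PRIMITIVES and schema not in known:
            missing.append(schema)
    elif isinstance(schema, list):
        # A union has no name of its own; only its branches are inspected.
        for branch in schema:
            missing += unresolved_names(branch, known)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if isinstance(kind, str) and kind in NAMED_KINDS:
            # Register the name before visiting fields so self-reference works.
            known.add(schema["name"])
            for f in schema.get("fields", []):
                missing += unresolved_names(f["type"], known)
        elif kind == "array":
            missing += unresolved_names(schema["items"], known)
        elif kind == "map":
            missing += unresolved_names(schema["values"], known)
        else:
            # "type" is itself a schema; a "name" next to it is not registered.
            missing += unresolved_names(kind, known)
    return missing


node = {"name": "Node", "type": "record", "fields": [
    {"name": "value", "type": "long"},
    {"name": "children", "type": {"type": "array", "items": "Tree"}}]}
leaf = {"name": "Leaf", "type": "record", "fields": [
    {"name": "value", "type": "long"}]}

first_attempt = {"name": "Tree", "type": [node, leaf]}
wrapped = {"name": "Tree", "type": "record",
           "fields": [{"name": "ctors", "type": [node, leaf]}]}

print(unresolved_names(first_attempt))  # ['Tree']
print(unresolved_names(wrapped))        # []
```

Under this model, the first schema leaves "Tree" dangling because the union never registers a name, while the wrapper record registers "Tree" before its fields are visited.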

Is this the only way to get what I want in Avro or is there a better way?

Thanks!

Overact answered 19/10, 2017 at 21:51

While this is not an answer to the actual question about representing a recursive named union (which isn't possible as of late 2022), it is possible to work around this for a tree-like data structure.

If you represent a Tree as a node, and a Leaf as a node with an empty list of children, then one recursive type is sufficient:

{
  "type": "record",
  "name": "TreeNode",
  "fields": [
    {
      "name": "value",
      "type": "long"
    },
    {
      "name": "children",
      "type": { "type": "array", "items": "TreeNode" }
    }
  ]
}

Now, your three types Tree, Node, and Leaf are unified into one type TreeNode, and there is no union of Node and Leaf necessary.
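
As a rough illustration of working with data in this shape, here is a minimal Python sketch that uses plain dicts mirroring the fields of TreeNode. No Avro library is involved, and the helper names tree_node, is_leaf, and leaf_values are made up for this example.

```python
def tree_node(value, children=()):
    """Build a datum shaped like the TreeNode record: a value plus a list of children."""
    return {"value": value, "children": list(children)}


def is_leaf(node):
    """A leaf is simply a node whose children list is empty."""
    return not node["children"]


def leaf_values(node):
    """Collect the values of all leaves, left to right."""
    if is_leaf(node):
        return [node["value"]]
    values = []
    for child in node["children"]:
        values.extend(leaf_values(child))
    return values


tree = tree_node(1, [tree_node(2), tree_node(3, [tree_node(4)])])
print(leaf_values(tree))  # [2, 4]
```

One trade-off of this design: a Leaf and a Node that happens to have no children become indistinguishable, which is fine for most trees but loses information if that distinction matters.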

Chevalier answered 9/5, 2019 at 18:45

I just stumbled upon the same problem while trying to define a recursive union. I'm quite pessimistic about finding a cleaner solution than your convoluted one, because there is currently no way to name a union, and hence no way to refer to it recursively while constructing it; see this open ticket.

Magnetic answered 3/1, 2018 at 11:20
