Nesting Avro schemas

According to this question on nesting Avro schemas, the right way to nest a record schema is as follows:

{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": {
                "type": "record",
                "name": "AddressUSRecord",
                "fields": [
                    {"name": "streetaddress", "type": "string"},
                    {"name": "city", "type": "string"}
                ]
            }
        }
    ]
}

I don't like giving the field the name address and having to give a different name (AddressUSRecord) to the field's schema. Can I give the field and schema the same name, address?

What if I want to use the AddressUSRecord schema in multiple other schemas, not just person? If I want to use AddressUSRecord in another schema, let's say business, do I have to name it something else?

Ideally, I'd like to define AddressUSRecord in a separate schema and then let the type of address reference AddressUSRecord. However, it's not clear that Avro 1.8.1 supports this out of the box. This 2014 article shows that sub-schemas need to be handled with custom code. What's the best way to define reusable schemas in Avro 1.8.1?

Note: I'd like a solution that works with Confluent Inc.'s Schema Registry. There's a Google Groups thread that seems to suggest that Schema Registry does not play nice with schema references.

Rawlinson asked 28/11/2016 at 22:19

Can I give the field and schema the same name, address?

Yes, you can name the record with the same name as the field name.

What if I want to use the AddressUSRecord schema in multiple other schemas, not just person?

You can use multiple schemas via a couple of techniques. The Avro schema parser clients (JVM and others) let you specify multiple schemas, usually through the names parameter (the Java Schema$Parser/parse method accepts multiple schema String arguments).

You can then specify dependent schemas as a named type:

{
  "type": "record",
  "name": "Address",
  "fields": [
    {
      "name": "streetaddress",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}

And run this through the parser before the parent schema:

{
  "name": "person",
  "type": "record",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "Address"
    }
  ]
}

Incidentally, this allows you to parse from separate files.
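Why the order matters can be modeled in a few lines. The following is a toy Python sketch, not the Avro library: `NameRegistry` and its methods are hypothetical names, namespaces are ignored, and only record types are tracked. In the Java library the corresponding behaviour comes from reusing one `Schema.Parser` instance for several `parse` calls, because the parser keeps the named types it has already seen.

```python
import json

# Avro primitive type names; any other bare string used as a type must name a
# type that has already been defined.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


class NameRegistry:
    """Toy model of a schema parser that remembers named records across calls.

    Simplified (hypothetical): ignores namespaces and only registers records;
    enums, fixed, arrays and maps are not handled.
    """

    def __init__(self):
        self.types = {}  # record name -> parsed definition

    def parse(self, text):
        schema = json.loads(text)
        self._resolve(schema)
        return schema

    def _resolve(self, schema):
        if isinstance(schema, str):
            # A bare string is either a primitive or a reference by name.
            if schema not in PRIMITIVES and schema not in self.types:
                raise ValueError(f'Undefined name: "{schema}"')
        elif isinstance(schema, list):
            # A union: resolve every branch, in order.
            for branch in schema:
                self._resolve(branch)
        elif isinstance(schema, dict) and schema.get("type") == "record":
            # Register the record before its fields so it can refer to itself.
            self.types[schema["name"]] = schema
            for field in schema["fields"]:
                self._resolve(field["type"])


ADDRESS = json.dumps({
    "type": "record", "name": "Address",
    "fields": [{"name": "streetaddress", "type": "string"},
               {"name": "city", "type": "string"}],
})
PERSON = json.dumps({
    "type": "record", "name": "person",
    "fields": [{"name": "firstname", "type": "string"},
               {"name": "lastname", "type": "string"},
               {"name": "address", "type": "Address"}],
})

registry = NameRegistry()
registry.parse(ADDRESS)          # defines Address
person = registry.parse(PERSON)  # resolves: Address is already known

try:
    NameRegistry().parse(PERSON)  # fresh registry: Address was never parsed
except ValueError as err:
    print(err)  # Undefined name: "Address"
```

The message imitates the `Undefined name: "Address"` error the Java tooling reports when the dependency has not been parsed first.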

Alternatively, you can parse a single union schema that references the named types in the same way:

[
  {
    "type": "record",
    "name": "Address",
    "fields": [
      {
        "name": "streetaddress",
        "type": "string"
      },
      {
        "name": "city",
        "type": "string"
      }
    ]
  },
  {
    "type": "record",
    "name": "person",
    "fields": [
      {
        "name": "firstname",
        "type": "string"
      },
      {
        "name": "lastname",
        "type": "string"
      },
      {
        "name": "address",
        "type": "Address"
      }
    ]
  }
]
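If the definitions live in separate files but a single schema has to be submitted, the union can be assembled mechanically. A hypothetical Python sketch, assuming each file has already been loaded into a dict; `bundle_as_union` is an illustrative helper, not part of Avro or the Schema Registry. It only enforces the rule used above: a record must appear before any record that refers to it by name.

```python
import json

# Avro primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# Hypothetical contents of two separately maintained .avsc files.
ADDRESS = {
    "type": "record", "name": "Address",
    "fields": [{"name": "streetaddress", "type": "string"},
               {"name": "city", "type": "string"}],
}
PERSON = {
    "type": "record", "name": "person",
    "fields": [{"name": "firstname", "type": "string"},
               {"name": "lastname", "type": "string"},
               {"name": "address", "type": "Address"}],
}


def bundle_as_union(*records):
    """Join record definitions into one top-level union (a JSON array).

    Checks that every by-name field reference points at a record listed
    earlier (or at the record itself). Only top-level field types are
    checked; references nested inside inline types are not inspected.
    """
    defined = set()
    for record in records:
        defined.add(record["name"])
        for field in record["fields"]:
            ftype = field["type"]
            if isinstance(ftype, str) and ftype not in PRIMITIVES and ftype not in defined:
                raise ValueError(
                    f'{record["name"]}.{field["name"]} refers to "{ftype}" before it is defined')
    return json.dumps(list(records), indent=2)


print(bundle_as_union(ADDRESS, PERSON))  # dependencies first, so this succeeds
```

Calling `bundle_as_union(PERSON, ADDRESS)` instead raises, because `person.address` would refer to `Address` before its definition.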

I'd like a solution that works with Confluent Inc.'s Schema Registry.

The Schema Registry does not support parsing schemas separately, but it does support the latter example of parsing into a union type.

Sandblind answered 29/11/2016 at 11:58
Hi Niel, I am trying exactly the same schema on Confluent Cloud, but when I use this schema in the producer it fails:

ccloud kafka topic produce sometopic --delimiter ":" --value-format "avro" --schema "./above_schema.avsc" --sr-endpoint "xxxx.eu-central-1.aws.confluent.cloud" --api-key "xx" --api-secret "xxxxx"
Starting Kafka Producer. ^C or ^D to exit
{"firstname": "Joe", "lastname": "Doe", "address": {"streetaddress": "somestreet", "city": "somecity"}}
Error: cannot decode textual union: cannot decode textual map: cannot determine codec: "firstname"

(Monafo)

Your schema, which is a JSON array, does pass Avro validation, but how can it be used? The producer syntax that works with a JSON object schema does not work with your proposal. Thanks. (Monafo)
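The error above reads as if the producer treated "firstname" as the name of a union branch. In Avro's JSON encoding, a non-null value of a union type is written as a single-key object whose key is the branch's type name, so with the top-level union schema a person record would presumably have to be sent as {"person": {...}}. A small Python sketch of that wrapping, assuming the CLI follows the Avro JSON encoding (not verified against ccloud); `union_branch_names` and `wrap_union_value` are hypothetical helpers.

```python
import json

# Stand-in for the union schema above: the field lists are trimmed because
# only the branch names matter when wrapping a value.
UNION_SCHEMA = [
    {"type": "record", "name": "Address", "fields": []},
    {"type": "record", "name": "person", "fields": []},
]


def union_branch_names(union_schema):
    """Branch names of a union: the name of a named type, else its type name."""
    return [b if isinstance(b, str) else b.get("name", b.get("type"))
            for b in union_schema]


def wrap_union_value(union_schema, branch, value):
    """Encode a value for one branch of a union using Avro's JSON encoding."""
    if branch not in union_branch_names(union_schema):
        raise ValueError(f'"{branch}" is not a branch of this union')
    if branch == "null":
        return None  # null is written as a plain JSON null
    return {branch: value}  # otherwise: a one-key object naming the branch


record = {"firstname": "Joe", "lastname": "Doe",
          "address": {"streetaddress": "somestreet", "city": "somecity"}}
print(json.dumps(wrap_union_value(UNION_SCHEMA, "person", record)))
```

This prints the original input line wrapped in a {"person": ...} object. The nested address stays unwrapped because, inside the person record, address is a plain record field rather than a union.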

You can set a namespace on the record type and then, in subsequent fields, use {namespace}.{name} as the type. Unfortunately, there is currently no way to reference types from other schema files.
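How a name like com.nesting.Address is formed and resolved can be sketched in Python. Per the Avro specification, a name containing a dot is already a full name; otherwise it takes the namespace of the record's namespace attribute or of the enclosing definition. So inside com.nesting, a bare Address would resolve as well. `full_name` and `defined_names` are hypothetical helpers, a simplified sketch rather than a real parser.

```python
# Avro primitive type names are never namespaced.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(name, namespace=None):
    """Resolve a type name against the enclosing namespace (simplified)."""
    if name in PRIMITIVES or "." in name or not namespace:
        return name  # primitive, already fully qualified, or no namespace
    return f"{namespace}.{name}"


def defined_names(record, namespace=None):
    """Full names of a record and of any records defined inline in its fields."""
    name = record["name"]
    # A dotted name carries its own namespace; otherwise use the record's
    # "namespace" attribute, falling back to the enclosing one.
    ns = name.rsplit(".", 1)[0] if "." in name else record.get("namespace", namespace)
    names = {full_name(name, ns)}
    for field in record["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            names |= defined_names(ftype, ns)
    return names


PERSON = {
    "namespace": "com.nesting", "type": "record", "name": "Person",
    "fields": [
        # Address is defined inline once and inherits the com.nesting namespace...
        {"name": "home_address", "type": {
            "type": "record", "name": "Address",
            "fields": [{"name": "city", "type": "string"}]}},
        # ...so a later field can refer to it by its full name.
        {"name": "work_address", "type": "com.nesting.Address"},
    ],
}

print(sorted(defined_names(PERSON)))  # ['com.nesting.Address', 'com.nesting.Person']
```

The field names home_address and work_address are illustrative only.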

Devoe answered 29/11/2016 at 7:49
I tried nesting an Avro schema as a type, but it doesn't find the type defined in another schema file, even in 2023 with avro-maven-plugin 1.11.1. Is this still unsolved? I'd appreciate a comment if someone has solved this problem. (Hookah)
@Hookah From what I know it's still unsolved, and to be honest I wouldn't expect it to be solved in the future either. Maybe if you use .avdl files, the required types will be nested in the generated .avsc. From what I understand, .avsc schemas have to be self-contained because they are included in the header of .avro files. (Devoe)
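If self-contained schemas are required, one workaround is to keep shared types in separate files and expand the references before use. This is a hypothetical Python sketch of such an inlining step, not the code from the 2014 article mentioned in the question and not an Avro API. Because an Avro schema may define each name only once, only the first reference is replaced by the definition and later references stay as names. Namespaces are ignored.

```python
import json

# Avro primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def inline_references(schema, library, seen=None):
    """Return a copy of `schema` with the first reference to each named type
    from `library` replaced by its definition (hypothetical helper)."""
    if seen is None:
        seen = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in seen:
            return schema  # primitive, or a name already defined earlier
        if schema not in library:
            raise ValueError(f'Undefined name: "{schema}"')
        return inline_references(library[schema], library, seen)  # first use: inline
    if isinstance(schema, list):  # union: expand each branch in order
        return [inline_references(branch, library, seen) for branch in schema]
    if isinstance(schema, dict) and schema.get("type") == "record":
        seen.add(schema["name"])
        return {**schema, "fields": [
            {**field, "type": inline_references(field["type"], library, seen)}
            for field in schema["fields"]]}
    return schema  # other complex types are left untouched in this sketch


# Hypothetical shared definitions, e.g. loaded from Address.avsc.
LIBRARY = {"Address": {
    "type": "record", "name": "Address",
    "fields": [{"name": "streetaddress", "type": "string"},
               {"name": "city", "type": "string"}]}}

PERSON = {"type": "record", "name": "Person", "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "home_address", "type": "Address"},  # expanded to the full record
    {"name": "work_address", "type": "Address"},  # stays a by-name reference
]}

print(json.dumps(inline_references(PERSON, LIBRARY), indent=2))
```

The input dicts are not modified. The result is a single schema that a plain parser can read without knowing about Address.avsc.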

As @Niel Drummond rightly said, you can nest Avro schemas defined in separate files.

Using his example, assume these two schemas are defined in separate Avro files.

Let the file path be ${project.basedir}/src/main/avro/schemas/Address.avsc:
{
  "namespace": "com.nesting",
  "type": "record",
  "name": "Address",
  "fields": [
    {
      "name": "streetaddress",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}

This schema references the Address schema in the "address" field.

Let the file path be ${project.basedir}/src/main/avro/schemas/Person.avsc:
{
  "namespace": "com.nesting",
  "name": "Person",
  "type": "record",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "Address"
    }
  ]
}

If you get the error Undefined name: "Address" during compilation, check that your Avro plugin is configured to import the Address schema, as shown below.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.1</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/schemas/</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
        <imports>
          <import>${project.basedir}/src/main/avro/schemas/Address.avsc</import>
          <import>${project.basedir}/src/main/avro/schemas/Person.avsc</import>
        </imports>
        <includes>
          <include>*.avsc</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>

And your code should work with the above configuration.

NOTE: The Avro plugin version used here is 1.11.1

Provided answered 8/12/2023 at 13:29
