How to nest records in an Avro schema?

2

45

I'm trying to get Python to parse Avro schemas such as the following...

from avro import schema

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": "record",
            "fields": [
                {"name": "streetaddress", "type": "string"},
                {"name": "city", "type": "string"}
            ]
        }
    ]
}"""

parsedSchema = schema.parse(mySchema)

...and I get the following exception:

avro.schema.SchemaParseException: Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record.

What am I doing wrong?

Emileemilee answered 1/8, 2012 at 17:16 Comment(0)
65

Based on other sources on the web, I would rewrite the nested address definition so that the field's "type" is itself a complete record schema with its own name:

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": {
                        "type" : "record",
                        "name" : "AddressUSRecord",
                        "fields" : [
                            {"name": "streetaddress", "type": "string"},
                            {"name": "city", "type": "string"}
                        ]
                    }
        }
    ]
}"""
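
To illustrate why the original schema is rejected while the rewritten one parses, here is a minimal sketch using only the standard library. It is not the avro package's actual parser: `check_schema` and `PRIMITIVES` are hypothetical names, and the check is deliberately simplified (no namespaces, no duplicate-name detection). The point it shows is that the value of a field's "type" is read as a schema in its own right, so the bare string "record" is treated as a reference to a type called "record", not as the start of a record definition.

```python
import json

# Avro primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_schema(node, known_names=None):
    """Simplified, illustrative well-formedness check (not the avro library's algorithm)."""
    if known_names is None:
        known_names = set()
    if isinstance(node, str):
        # A bare string must name a primitive or an already defined named type.
        if node not in PRIMITIVES and node not in known_names:
            raise ValueError(f'"{node}" is not a known type name')
    elif isinstance(node, list):
        # A JSON array is a union; every branch is itself a schema.
        for branch in node:
            check_schema(branch, known_names)
    elif isinstance(node, dict):
        kind = node["type"]
        if kind == "record":
            known_names.add(node["name"])
            for field in node["fields"]:
                # The field's "type" value is checked as a schema of its own.
                check_schema(field["type"], known_names)
        elif kind in ("enum", "fixed"):
            known_names.add(node["name"])
        elif kind == "array":
            check_schema(node["items"], known_names)
        elif kind == "map":
            check_schema(node["values"], known_names)
        else:
            # e.g. {"type": "string"} or {"type": {...nested schema...}}
            check_schema(kind, known_names)
    else:
        raise ValueError(f"unexpected schema node: {node!r}")
    return known_names


# Field as written in the question: "type" is the bare string "record".
broken = json.loads("""
{"name": "person", "type": "record", "fields": [
    {"name": "address", "type": "record", "fields": [
        {"name": "city", "type": "string"}]}
]}""")

# Field as written in this answer: "type" is a full, named record schema.
fixed = json.loads("""
{"name": "person", "type": "record", "fields": [
    {"name": "address", "type": {
        "type": "record", "name": "AddressUSRecord", "fields": [
            {"name": "city", "type": "string"}]}}
]}""")

print(sorted(check_schema(fixed)))
try:
    check_schema(broken)
except ValueError as err:
    print("rejected:", err)
```

In this sketch the top-level "person" object is a schema, so "type": "record" begins a definition there, whereas the "address" object is a field, and a field's "type" has to hold (or name) a schema.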
Hypesthesia answered 1/8, 2012 at 18:10 Comment(5)
Thanks, Marco, that worked. The second declaration of the address name (the one where you wrote "AddressUSRecord") seems to be necessary to parse the schema, but is ignored when working with data that adheres to the schema. - Emileemilee
This makes little sense. Why can person have a type of record, but address cannot? - Twibill
Where in the Avro spec does it allow a type to be expanded like this? - Substantial
Check out the Parsing Canonical Form part of the spec: avro.apache.org/docs/current/… As far as I understand it, ALL types are expanded, even primitives, and the single word we usually see is the Parsing Canonical Form of the schema. So when we write {"type": "string"}, it's the same as writing {"type": {"type": "string"}}. - Gaitan
This answer would have saved me a day's worth of debugging if I had found it earlier. - Guidebook
7

Whenever a field's type is a named type (record, enum, or fixed), the definition has to be nested as an object under the field's "type" key:

"name": "some_name",
"type": {
    "name": "CodeClassName",
    "type": "record/enum/fixed"
}

However, if the field's type is a union, no extra wrapper object is needed; the JSON array of schemas is used directly as the "type" value:

"name": "some_name",
"type": [
    {
        "name": "CodeClassName1",
        "type": "record",
        "fields": ...
    },
    {
        "name": "CodeClassName2",
        "type": "record",
        "fields": ...
    }
]
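
As a rough, standard-library-only sketch of the union form above: the snippet below loads a complete field definition and reports how its type is written. The record fields "code" and "value" are made-up placeholders (the original elides them), and `describe_field_type` and `branch_name` are hypothetical helpers, not part of any Avro library.

```python
import json

# A complete version of the union field; the inner record fields are
# illustrative assumptions, since the original elides them.
field_json = """
{
    "name": "some_name",
    "type": [
        {"name": "CodeClassName1", "type": "record",
         "fields": [{"name": "code", "type": "string"}]},
        {"name": "CodeClassName2", "type": "record",
         "fields": [{"name": "value", "type": "int"}]}
    ]
}
"""


def branch_name(schema):
    """Return the name of a named type, or the type keyword otherwise."""
    if isinstance(schema, str):
        return schema
    if isinstance(schema, list):
        return "union"
    return schema.get("name") or branch_name(schema["type"])


def describe_field_type(field):
    """Describe how a field's "type" value is written."""
    field_type = field["type"]
    if isinstance(field_type, list):
        # The union is the JSON array itself; no wrapping object is involved.
        return "union of " + ", ".join(branch_name(b) for b in field_type)
    return branch_name(field_type)


field = json.loads(field_json)
print(describe_field_type(field))
```

Each branch of the union is still a full named record definition; only the outer wrapper object is absent.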

Hope this clarifies further!

Stellate answered 30/7, 2016 at 14:44 Comment(0)
