How to define an Avro schema for a complex JSON document?

I have a JSON document that I would like to convert to Avro, and I need to specify a schema for that purpose. Here is the JSON document for which I would like to define the Avro schema:

{
 "uid": 29153333,
 "somefield": "somevalue",
 "options": [
   {
     "item1_lvl2": "a",
     "item2_lvl2": [
       {
         "item1_lvl3": "x1",
         "item2_lvl3": "y1"
       },
       {
         "item1_lvl3": "x2",
         "item2_lvl3": "y2"
       }
     ]
   }
 ]
}

I'm able to define the schema for the primitive fields, but not for the complex "options" field:

{
  "namespace" : "my.com.ns",
  "type" :  "record",
  "fields" : [
     {"name": "uid", "type": "int"},
     {"name": "somefield", "type": "string"},
     {"name": "options", "type": .....}
  ]
}

Thanks for the help!

Scan answered 27/1, 2015 at 4:24

You need to use Avro's complex types, specifically arrays and records, and nest them inside each other:

{
  "namespace" : "my.com.ns",
  "name": "myrecord",
  "type" :  "record",
  "fields" : [
     {"name": "uid", "type": "int"},
     {"name": "somefield", "type": "string"},
     {"name": "options", "type": {
        "type": "array",
        "items": {
            "type": "record",
            "name": "lvl2_record",
            "fields": [
                {"name": "item1_lvl2", "type": "string"},
                {"name": "item2_lvl2", "type": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "lvl3_record",
                        "fields": [
                            {"name": "item1_lvl3", "type": "string"},
                            {"name": "item2_lvl3", "type": "string"}
                        ]
                    }
                }}
            ]
        }
     }}
  ]
}
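To see how the nested arrays and records line up with the data, here is a minimal Python sketch (standard library only) that hand-encodes the question's document into Avro's binary encoding using the schema above. It assumes the schema uses only int, long, string, array and inline record types; named-type references, unions, maps and the remaining primitives are deliberately not handled, and `zigzag_varint`, `encode` and `to_avro_bytes` are hypothetical helper names, not part of any Avro library.

```python
import json

def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag map to unsigned, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # small magnitudes (positive or negative) stay small
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, high bit = "more bytes follow"
        z >>= 7
    out.append(z)
    return bytes(out)

def encode(schema, datum, out: bytearray) -> None:
    """Append the binary encoding of `datum` to `out` (a subset of Avro only)."""
    t = schema["type"] if isinstance(schema, dict) else schema
    if t in ("int", "long"):
        out.extend(zigzag_varint(datum))
    elif t == "string":
        raw = datum.encode("utf-8")
        out.extend(zigzag_varint(len(raw)))  # length prefix, then UTF-8 bytes
        out.extend(raw)
    elif t == "array":
        if datum:
            out.extend(zigzag_varint(len(datum)))  # one block holding every item
            for item in datum:
                encode(schema["items"], item, out)
        out.extend(zigzag_varint(0))  # a zero-count block ends the array
    elif t == "record":
        # No field names are written: values follow the schema's field order.
        for field in schema["fields"]:
            encode(field["type"], datum[field["name"]], out)
    else:
        raise NotImplementedError(f"type {t!r} is not covered by this sketch")

def to_avro_bytes(schema, datum) -> bytes:
    out = bytearray()
    encode(schema, datum, out)
    return bytes(out)

# The answer's schema, transcribed as a Python dict.
SCHEMA = {
    "namespace": "my.com.ns", "name": "myrecord", "type": "record",
    "fields": [
        {"name": "uid", "type": "int"},
        {"name": "somefield", "type": "string"},
        {"name": "options", "type": {"type": "array", "items": {
            "type": "record", "name": "lvl2_record", "fields": [
                {"name": "item1_lvl2", "type": "string"},
                {"name": "item2_lvl2", "type": {"type": "array", "items": {
                    "type": "record", "name": "lvl3_record", "fields": [
                        {"name": "item1_lvl3", "type": "string"},
                        {"name": "item2_lvl3", "type": "string"},
                    ]}}},
            ]}}},
    ],
}

# The question's JSON document.
DOC = json.loads('''{"uid": 29153333, "somefield": "somevalue", "options": [
  {"item1_lvl2": "a", "item2_lvl2": [
    {"item1_lvl3": "x1", "item2_lvl3": "y1"},
    {"item1_lvl3": "x2", "item2_lvl3": "y2"}]}]}''')

print(to_avro_bytes(SCHEMA, DOC).hex())
```

The result is a bare binary datum, not an Avro Object Container File (`.avro`), which additionally stores the writer schema in a header and groups records into blocks separated by sync markers. For real conversions, a maintained library such as the official `avro` Python package or `fastavro` is the safer choice.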

Also, to improve readability, you can split the schema into multiple files.
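One way to picture such a split, as a minimal Python sketch: each named record is kept in its own `.avsc`-style JSON text (simulated here by a hypothetical `FILES` dict instead of real files) and refers to the other records only by their short name, and a hypothetical `load_split_schema` helper stitches them back into one schema. It relies on the Avro rule that a named type is defined once and can then be referenced by name, so only the first reference gets expanded. How a particular toolchain resolves separate schema files differs per library, so check its documentation.

```python
import json

# Hypothetical split: each named record lives in its own .avsc-style JSON text.
FILES = {
    "lvl3_record.avsc": '''{"type": "record", "name": "lvl3_record", "fields": [
        {"name": "item1_lvl3", "type": "string"},
        {"name": "item2_lvl3", "type": "string"}]}''',
    "lvl2_record.avsc": '''{"type": "record", "name": "lvl2_record", "fields": [
        {"name": "item1_lvl2", "type": "string"},
        {"name": "item2_lvl2", "type": {"type": "array", "items": "lvl3_record"}}]}''',
    "myrecord.avsc": '''{"namespace": "my.com.ns", "name": "myrecord", "type": "record",
        "fields": [
        {"name": "uid", "type": "int"},
        {"name": "somefield", "type": "string"},
        {"name": "options", "type": {"type": "array", "items": "lvl2_record"}}]}''',
}

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def inline(schema, named, seen):
    """Expand references to named types (assumes short, unqualified names).

    A named type may only be defined once, so the first reference is
    replaced by its definition and later ones stay as plain name strings.
    """
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in seen:
            return schema
        seen.add(schema)
        return inline(named[schema], named, seen)
    if isinstance(schema, list):  # a union: resolve each branch
        return [inline(s, named, seen) for s in schema]
    result = dict(schema)
    if result.get("type") == "record":
        seen.add(result["name"])
        result["fields"] = [dict(f, type=inline(f["type"], named, seen))
                            for f in result["fields"]]
    elif result.get("type") == "array":
        result["items"] = inline(result["items"], named, seen)
    elif result.get("type") == "map":
        result["values"] = inline(result["values"], named, seen)
    return result

def load_split_schema(files, top):
    """Parse every file, index the definitions by name, then inline from `top`."""
    named = {}
    for text in files.values():
        definition = json.loads(text)
        named[definition["name"]] = definition
    return inline(named[top], named, set())

print(json.dumps(load_split_schema(FILES, "myrecord"), indent=2))
```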

Bautista answered 27/1, 2015 at 21:44
> in correct order
On one level of nesting, Avro doesn't care about field ordering. The fields are accessed by name during deserialization, based on the schema the reader knows. - Nympho
By "in correct order" I meant in corresponding hierarchical order. I removed that misleading phrase. - Bautista
Record and enum names: upper CamelCase, please. - Rolo
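The point about field ordering can be illustrated with a minimal Python sketch of record schema resolution (the `resolve_record` helper is hypothetical, not a library API). Binary Avro data carries no field names, so a decoder first reads the values in the writer's field order; only afterwards are they matched to the reader's fields by name. Following the Avro specification, writer-only fields are dropped and reader-only fields take their declared `default`; type promotion and nested records are left out of this sketch.

```python
def resolve_record(writer_fields, values, reader_fields):
    """Match values decoded in writer order to the reader's fields by name.

    writer_fields: the writer schema's field list (this fixes the byte order).
    values:        decoded values, in exactly that writer order.
    reader_fields: the reader schema's field list, in any order.
    """
    by_name = {f["name"]: v for f, v in zip(writer_fields, values)}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in by_name:
            result[name] = by_name[name]
        elif "default" in field:
            result[name] = field["default"]  # reader-only field
        else:
            raise ValueError(f"no value and no default for reader field {name!r}")
    return result  # writer-only fields are simply dropped

writer = [{"name": "uid", "type": "int"}, {"name": "somefield", "type": "string"}]
reader = [
    {"name": "somefield", "type": "string"},  # declared in a different order
    {"name": "uid", "type": "int"},
    {"name": "note", "type": "string", "default": "n/a"},  # hypothetical new field
]
print(resolve_record(writer, [29153333, "somevalue"], reader))
# {'somefield': 'somevalue', 'uid': 29153333, 'note': 'n/a'}
```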

This online tool (http://avro4s-ui.landoop.com/) is very practical: you can generate an Avro schema from a given valid JSON document.

Enidenigma answered 13/3, 2018 at 17:22
This is fantastic. I had a rather complex JSON format that I needed an Avro schema for in order to convert it into Parquet, and this tool did the trick. - Prebo
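For a rough idea of what such a JSON-to-schema generator has to do, here is a minimal Python sketch (the hypothetical `infer_schema` is not the avro4s tool's actual algorithm). It walks one sample document, turning objects into records, lists into arrays and scalars into primitives, and builds upper CamelCase record names from the JSON path so nested record names cannot collide. It assumes JSON keys are already valid Avro names and that every array element has the same shape as the first one; it also cannot see optional fields or nulls absent from the sample, so a generated schema should still be reviewed by hand.

```python
import json

def infer_schema(value, name="Root"):
    """Guess an Avro schema from a single sample JSON value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        if not value:
            raise ValueError(f"cannot infer the item type of empty array {name}")
        # Assumption: every element has the same shape as the first one.
        return {"type": "array", "items": infer_schema(value[0], name + "Item")}
    if isinstance(value, dict):
        fields = []
        for key, child in value.items():
            # Path-based upper CamelCase names keep nested record names unique.
            child_name = name + key[:1].upper() + key[1:]
            fields.append({"name": key, "type": infer_schema(child, child_name)})
        return {"type": "record", "name": name, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")

doc = json.loads('{"uid": 29153333, "somefield": "somevalue", "options": '
                 '[{"item1_lvl2": "a", "item2_lvl2": '
                 '[{"item1_lvl3": "x1", "item2_lvl3": "y1"}]}]}')
print(json.dumps(infer_schema(doc, "MyRecord"), indent=2))
```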

© 2022 - 2024 — McMap. All rights reserved.