Avro schema definition nesting types

I am fairly new to Avro and going through documentation for nested types. I have the example below working nicely but many different types within the model will have addresses. Is it possible to define an address.avsc file and reference that as a nested type? If that is possible, can you also take it a step further and have a list of Addresses for a Customer? Thanks in advance.

{"namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": "string"},
    {"name": "address", "type":
      {"type": "record",
       "name": "AddressRecord",
       "fields": [
         {"name": "streetaddress", "type": "string"},
         {"name": "city", "type": "string"},
         {"name": "state", "type": "string"},
         {"name": "zip", "type": "string"}
       ]}
    }
  ]
}
Frankhouse answered 26/3, 2015 at 14:8 Comment(0)

There are four possible ways:

  1. Include the shared schema as an import in the pom file, as mentioned in this ticket.
  2. Declare all your types in a single avsc file.
  3. Use a single static parser that first parses all the imports and then parses the actual data types.
  4. (This is a hack) Use an avdl file with imports, as described at https://avro.apache.org/docs/1.7.7/idl.html#imports . Note, though, that IDL is intended for RPC calls.

Example for 2 (declare all your types in a single avsc file). It also shows how to declare an array of addresses.

[
{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer2",
    "fields": [
        {
            "name": "x",
            "type": "string"
        },
        {
            "name": "y",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
]

Example for 3 (using a single static parser):

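// Parser here is org.apache.avro.Schema.Parser
Parser parser = new Parser(); // Make this static and reuse
// Parse address.avsc first so that the later files can refer to AddressRecord by name
parser.parse(<location of address.avsc file>);
parser.parse(<location of customer.avsc file>);
parser.parse(<location of customer2.avsc file>);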

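If we want to get hold of a Schema, for example to create new records, we can either call the parser's getTypes() method (https://avro.apache.org/docs/1.5.4/api/java/org/apache/avro/Schema.Parser.html#getTypes()), which returns the named types the parser knows about, or keep the Schema that each parse call returns: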

Parser parser = new Parser(); // Make this static and reuse
Schema addressSchema = parser.parse(<location of address.avsc file>);
Schema customerSchema = parser.parse(<location of customer.avsc file>);
Schema customer2Schema = parser.parse(<location of customer2.avsc file>);
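
For the parser approach, the schema files could be split roughly as follows. This is only a sketch: it assumes address.avsc contains just the AddressRecord record from the question, and that customer.avsc (the file name used in the parser calls here) refers to that record by its full name instead of redefining it.

```json
{
  "namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": "string"},
    {"name": "address", "type": {"type": "array", "items": "com.company.model.AddressRecord"}}
  ]
}
```

Parsed on its own, this file would fail because AddressRecord is unknown; it only resolves when the same parser has already parsed address.avsc. For a single address instead of a list, the field type would be the full name itself: {"name": "address", "type": "com.company.model.AddressRecord"}.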
Selfstarter answered 7/4, 2015 at 12:17 Comment(4)
Not clear on how to use the parser in example #3. Once it is created, how does one go about creating a record (a blank record, not a deserialized one)? - Extracanonical
@Extracanonical I have edited my answer to clarify your doubt. Hope it is clear now. - Selfstarter
In #2, your root type is a UNION, right? So that would allow users to serialize any of those top-level types as the root object? That's a little unfortunate because if you only want to serialize Customer objects at the top level, you can't really get it to work that way. - Crus
I'm trying Option 2, but when I use it I get an error. Can you confirm it should look like {"name": "address", "type": "com.company.model.AddressRecord"}? - Precedency

Just to add to @Princey James's answer: the nested type must be defined before it is used.

Gland answered 12/12, 2018 at 20:42 Comment(2)
You should leave that message as a comment under Princey James's answer, since it is not a complete answer to the question. - Urita
I don't have the 50 reputation needed to add a comment to that answer. - Gland

Another addition to @Princey James's answer.

With the example for 2 (declare all your types in a single avsc file), serializing and deserializing with code generation works, but serializing and deserializing without code generation does not. Parsing that file returns the top-level union rather than a record schema, so creating a generic record from it fails with:

org.apache.avro.AvroRuntimeException: Not a record schema: [{"type":" ...

Working example with code generation:

  @Test
  public void avroWithCode() throws IOException {

    UserPerso UserPerso3 = UserPerso.newBuilder()
                                    .setName("Charlie")
                                    .setFavoriteColor("blue")
                                    .setFavoriteNumber(null)
                                    .build();

    AddressRecord adress = AddressRecord.newBuilder()
                                        .setStreetaddress("mo")
                                        .setCity("Paris")
                                        .setState("IDF")
                                        .setZip("75")
                                        .build();

    ArrayList<AddressRecord> li = new ArrayList<>();
    li.add(adress);

    Customer cust = Customer.newBuilder()
                            .setUser(UserPerso3)
                            .setPhone("0101010101")
                            .setAddress(li)
                            .build();

    String fileName = "cust.avro";

    File a = new File(fileName);

    DatumWriter<Customer> customerDatumWriter = new SpecificDatumWriter<>(Customer.class);
    DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(customerDatumWriter);
    dataFileWriter.create(cust.getSchema(), new File(fileName));
    dataFileWriter.append(cust);
    dataFileWriter.close();

    DatumReader<Customer> custDatumReader = new SpecificDatumReader<>(Customer.class);
    DataFileReader<Customer> dataFileReader = new DataFileReader<>(a, custDatumReader);
    Customer cust2 = null;
    while (dataFileReader.hasNext()) {
      cust2 = dataFileReader.next(cust2);
      System.out.println(cust2);
    }
  }

Failing example without code generation:

  @Test
  public void avroWithoutCode() throws IOException {

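    // Each parse of the multi-type file returns the top-level union, not an individual record schema,
    // which is why new GenericData.Record(...) below throws "Not a record schema"
    Schema schemaUserPerso = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaAdress = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaCustomer = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));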

    System.out.println(schemaUserPerso);

    GenericRecord UserPerso3 = new GenericData.Record(schemaUserPerso);
    UserPerso3.put("name", "Charlie");
    UserPerso3.put("favorite_color", "blue");
    UserPerso3.put("favorite_number", null);

    GenericRecord adress = new GenericData.Record(schemaAdress);

    adress.put("streetaddress", "mo");
    adress.put("city", "Paris");
    adress.put("state", "IDF");
    adress.put("zip", "75");

    ArrayList<GenericRecord> li = new ArrayList<>();
    li.add(adress);

    GenericRecord cust = new GenericData.Record(schemaCustomer);

    cust.put("user", UserPerso3);
    cust.put("phone", "0101010101");
    cust.put("address", li);

    String fileName = "cust.avro";

    File file = new File(fileName);

    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schemaCustomer);
    DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
    dataFileWriter.create(schemaCustomer, file);
    dataFileWriter.append(cust);
    dataFileWriter.close();

    File a = new File(fileName);

    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schemaCustomer);
    DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(a, datumReader);
    GenericRecord cust2 = null;
    while (dataFileReader.hasNext()) {
      cust2 = dataFileReader.next(cust2);
      System.out.println(cust2);

    }
  }
Leporide answered 1/3, 2019 at 13:42 Comment(0)
