Generate an Avro schema from a given Java object

4

35

Apache Avro provides a compact, fast binary data format and rich data structures for serialization. However, it requires the user to define a schema (in JSON) for every object that needs to be serialized.

In some cases this is not possible (e.g. the class of the Java object has members whose types come from external libraries). Hence, I wonder whether there is a tool that can read the information from an object's .class file and generate the Avro schema for it (the way Gson uses an object's class information to convert it to a JSON string).

Keown answered 9/4, 2014 at 6:18 Comment(3)
Interesting question. There is a tool that can generate JSON Schemas from Java classes (jsonschema2pojo), and I have a tool that can generate Avro schemas from JSON Schemas (json-schema-avro). However, the former can only generate JSON Schema v3, and my tool expects JSON Schema v4 as input... - Chuck
Thank you for your answer. Do you mean that you wrote a tool that can generate an Avro schema from a JSON Schema? - Keown
Yes, that is what I mean: github.com/fge/json-schema-avro - Chuck
43

Take a look at Avro's Java reflection API (the org.apache.avro.reflect package).

Getting a schema looks like:

// T stands for the class you want a schema for
Schema schema = ReflectData.get().getSchema(T.class);

See the example from Doug on another question for a working example.

Credits of this answer belong to Sean Busby.
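
To illustrate the general idea behind reflection-based schema generation, here is a minimal sketch that uses only java.lang.reflect from the JDK. It is not Avro's implementation and it covers only a handful of primitive types; the class names SimpleSchemaSketch and ExamplePojo are hypothetical and exist only for this example. For real use, ReflectData from the Avro library is the right tool.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class SimpleSchemaSketch {

    // Hypothetical example class, not taken from the question.
    public static class ExamplePojo {
        private String body;
        private int count;
        private double offset;
    }

    // Maps a few Java types to Avro primitive type names (assumption: this
    // sketch deliberately ignores collections, enums, nested records, etc.).
    public static String avroType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == Integer.class) return "int";
        if (type == long.class || type == Long.class) return "long";
        if (type == float.class || type == Float.class) return "float";
        if (type == double.class || type == Double.class) return "double";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        throw new IllegalArgumentException("Unsupported type in this sketch: " + type);
    }

    // Builds the JSON text of an Avro record schema from the declared fields.
    public static String schemaFor(Class<?> clazz) {
        List<String> fields = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            int mods = f.getModifiers();
            // Skip static, transient and compiler-generated fields.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                continue;
            }
            fields.add("{\"name\":\"" + f.getName() + "\",\"type\":\""
                    + avroType(f.getType()) + "\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + clazz.getSimpleName()
                + "\",\"fields\":[" + String.join(",", fields) + "]}";
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(ExamplePojo.class));
    }
}
```

The point of the sketch is that everything a schema needs (record name, field names, field types) is already available from the Class object, which is why no hand-written JSON schema is required.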

Tammeratammi answered 1/7, 2014 at 7:38 Comment(3)
This works well with non-nullable columns, but I have some fields that are nullable. Is there a way to make those fields nullable in the Avro schema? Otherwise it throws an exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: in models.RawData in double null of double in field offset of models.RawData - Gerger
For nullable fields, use the AllowNull subclass: Schema schema = ReflectData.AllowNull.get().getSchema(T.class); - Sabbat
Support for java.time, UUID, etc. is not straightforward; better to go with Jackson for that. - Abdias
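
The nullable-field discussion in the comments comes down to how a field's type is written in the schema: a plain "double" cannot hold null, while the union ["null", "double"] can. The following JDK-only sketch shows that rule for a double-valued field. It is an illustration, not Avro code; the class NullableFieldSketch and its RawData stand-in are hypothetical. As far as I know, ReflectData.AllowNull applies a similar rule by turning non-primitive fields into unions with null, but check the Avro documentation for the exact behaviour.

```java
import java.lang.reflect.Field;

public class NullableFieldSketch {

    // Hypothetical stand-in for the RawData class mentioned in the comment.
    public static class RawData {
        public double required; // primitive: can never be null
        public Double offset;   // boxed: may be null at runtime
    }

    // Returns the Avro type JSON for a double-valued field:
    // primitives stay "double", boxed fields become a union with null.
    public static String fieldType(Field f) {
        Class<?> type = f.getType();
        if (type == double.class) {
            return "\"double\"";
        }
        if (type == Double.class) {
            return "[\"null\",\"double\"]";
        }
        throw new IllegalArgumentException("This sketch only handles double/Double");
    }

    // Convenience lookup that wraps the checked reflection exception.
    public static String fieldTypeOf(Class<?> clazz, String fieldName) {
        try {
            return fieldType(clazz.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("required: " + fieldTypeOf(RawData.class, "required"));
        System.out.println("offset:   " + fieldTypeOf(RawData.class, "offset"));
    }
}
```

With a schema that declares offset as a plain "double", writing a record whose offset is null fails, which matches the NullPointerException quoted above.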
14

Here's how to generate an Avro schema from a POJO definition with Jackson (jackson-dataformat-avro):

ObjectMapper mapper = new ObjectMapper(new AvroFactory());
AvroSchemaGenerator gen = new AvroSchemaGenerator();
mapper.acceptJsonFormatVisitor(RootType.class, gen);
AvroSchema schemaWrapper = gen.getGeneratedSchema();
org.apache.avro.Schema avroSchema = schemaWrapper.getAvroSchema();
String asJson = avroSchema.toString(true);
Costa answered 10/1, 2018 at 14:13 Comment(1)
Since Jackson 2.5, the dedicated AvroMapper makes this much simpler and cleaner. See the separate answer. - Abdias
5

Example

POJO class

public class ExportData implements Serializable {
    private String body;
    // ... constructors (including a no-arg one, which Avro reflection typically needs when reading), getters and setters
}

Serialize

File file = new File(fileName);
DatumWriter<ExportData> writer = new ReflectDatumWriter<>(ExportData.class);
DataFileWriter<ExportData> dataFileWriter = new DataFileWriter<>(writer);
Schema schema = ReflectData.get().getSchema(ExportData.class);
dataFileWriter.create(schema, file);
for (Row row : resultSet) {
    String rec = row.getString(0);
    dataFileWriter.append(new ExportData(rec));
}
dataFileWriter.close();

Deserialize

File file = new File(avroFilePath);
DatumReader<ExportData> datumReader = new ReflectDatumReader<>(ExportData.class);
DataFileReader<ExportData> dataFileReader = new DataFileReader<>(file, datumReader);
ExportData record = null;
while (dataFileReader.hasNext()){
    record = dataFileReader.next(record);
    // process record
}
Cummings answered 23/4, 2019 at 10:16 Comment(1)
This works well with non-nullable columns, but I have some fields that are nullable. Is there a way to make those fields nullable in the Avro schema? Otherwise it throws an exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: in models.RawData in double null of double in field offset of models.RawData - Gerger
0

AvroMapper from jackson-dataformat-avro 2.5+ makes it really easy (Kotlin):

AvroMapper().schemaFor(MyClass::class.java).avroSchema

It is a subclass of Jackson's ObjectMapper, so all the regular Jackson configuration and extensions apply, e.g.:

val mapper = AvroMapper().apply { registerModule(JavaTimeModule()) }
Abdias answered 25/3, 2024 at 11:0 Comment(0)
