How to extract schema from an avro file in Java
E

3

28

How do you extract first the schema and then the data from an Avro file in Java? This is identical to this question, except in Java.

I've seen examples of how to get the schema from an avsc file but not an avro file. What direction should I be looking in?

Schema schema = new Schema.Parser().parse(
    new File("/home/Hadoop/Avro/schema/emp.avsc")
);
Englishism answered 4/8, 2017 at 1:9 Comment(0)
K
44

If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:

// No schema is passed to the reader: it uses the writer's schema stored in the file header
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema);

And then you can read the data inside the file:

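GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Passing the previous record lets Avro reuse the object instead of allocating a new one
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close(); // release the underlying file handle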
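For anyone curious where getSchema() gets its answer: an Avro object container file starts with the magic bytes "Obj" followed by 1, then a metadata map whose "avro.schema" entry holds the schema as JSON. The following is a minimal sketch that reads only that header with the plain JDK, no Avro jar needed. The class and method names (AvroHeaderPeek, extractSchemaJson, schemaJsonOf) are made up for illustration, and it returns the raw JSON text rather than a Schema object. In real code the DataFileReader approach above is the better choice.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Avro library.
class AvroHeaderPeek {

    // Reads one variable-length, zigzag-encoded long (Avro's binary encoding for int/long).
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // high bit clear: last byte of this number
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);     // undo zigzag encoding
    }

    // Reads a length-prefixed byte sequence (used for both map keys and map values).
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length: " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Returns the JSON stored under "avro.schema" in the file header, or null if the key is absent.
    static String extractSchemaJson(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        String schema = null;
        // The metadata map is written as blocks of key/value pairs; a block count of 0 ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in);                // negative count: a block byte size follows, skip it
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }

    // Convenience overload for content that is already in memory.
    static String schemaJsonOf(byte[] fileBytes) {
        try {
            return extractSchemaJson(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AvroHeaderPeek.java <file.avro>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractSchemaJson(in));
        }
    }
}
```

The same header map also carries the "avro.codec" entry, so a variation of this loop can report the compression codec. The sketch stops before the 16-byte sync marker and never touches the data blocks.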
Kisser answered 12/8, 2017 at 8:22 Comment(2)
For those using the Apache Avro C# library, the utility function DataFileReader<GenericRecord>.OpenReader(filename); can be used to instantiate the dataFileReader. Once instantiated, the dataFileReader is used just like in Java. - Demodulator
I am trying to read the schema and data from a byte array (containing both schema and payload) instead of a File. How can I do it? - Radom
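On the byte-array question: one option is Avro's DataFileStream, which takes any InputStream, so the array can be wrapped in a ByteArrayInputStream. A minimal sketch, assuming the Avro jar is on the classpath and that the bytes are a complete container file (header plus data blocks). The class name AvroFromBytes and the method readAll are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Hypothetical helper; only DataFileStream and GenericDatumReader come from Avro.
class AvroFromBytes {

    // Prints the embedded schema, then returns every record found in the byte array.
    static List<GenericRecord> readAll(byte[] avroBytes) throws IOException {
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream = new DataFileStream<>(
                new ByteArrayInputStream(avroBytes),
                new GenericDatumReader<GenericRecord>())) {
            // The schema is read from the header as soon as the stream is opened
            Schema schema = stream.getSchema();
            System.out.println(schema.toString(true));
            while (stream.hasNext()) {
                // No reuse argument here, so each record in the list is a distinct object
                records.add(stream.next());
            }
        }
        return records;
    }
}
```

DataFileStream reads sequentially and does not support seeking. If random access is needed, DataFileReader can be constructed over a SeekableByteArrayInput instead of a File.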
S
2

Thanks to @Helder Pereira for the answer. As a complement, the schema can also be fetched by calling getSchema() on a GenericRecord instance.
Here is a live demo of it; the linked page shows how to get the data and schema in Java for the Parquet, ORC, and Avro data formats.

Shana answered 12/2, 2020 at 16:21 Comment(0)
C
1

You can use the Databricks spark-avro library, as shown at https://github.com/databricks/spark-avro, which loads the Avro file into a DataFrame (Dataset<Row>).

Once you have a Dataset<Row>, you can get the schema directly with df.schema().

Charlenecharleroi answered 9/8, 2017 at 21:36 Comment(2)
Apologies, I just realized you weren't actually using Spark to begin with. If you're not already using Spark, then my solution is more trouble than it's worth. I'll leave the answer, though, in case someone coming from the Spark perspective has the same question. - Charlenecharleroi
I am not using Spark, just the plain vanilla avro-tools jar, but thank you. - Englishism
