How to extract the schema from an Avro file in Python
I am trying to use the Python Avro library (https://pypi.python.org/pypi/avro) to read an Avro file generated by Java. Since the schema is already embedded in the Avro file, why do I need to specify a schema file? Is there a way to extract it automatically?

I found another package, fastavro (https://pypi.python.org/pypi/fastavro), that can extract the Avro schema. Is having to specify the schema file manually in the Python avro package by design? Thank you very much.

Giannini answered 29/7, 2014 at 0:6 Comment(0)
14

I use Python 3.4 and the Avro package, version 1.7.7.

To get the schema from the file, use:

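import avro.datafile
import avro.io
# reader.meta holds the file's header metadata; the schema JSON is stored under 'avro.schema'
reader = avro.datafile.DataFileReader(open('file_name.avro', 'rb'), avro.io.DatumReader())
schema = reader.meta
print(schema)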
Notions answered 15/7, 2015 at 9:45 Comment(1)
This worked well with Python 2.7 as well. My import statements are as follows (not sure how many of them you need):
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader
Decanal
10

A direct examination of /usr/local/lib/python2.7/site-packages/avro/datafile.py reveals the answer:

# `input` is the Avro file opened in binary mode, e.g. open('file_name.avro', 'rb')
reader = avro.datafile.DataFileReader(input, avro.io.DatumReader())
schema = reader.datum_reader.writers_schema
print(schema)

Curiously, in Java there is a special method for that: reader.getSchema().

Manymanya answered 5/11, 2014 at 17:50 Comment(0)
2

In my case, to get the schema as a "consumable" Python dictionary containing useful information such as the schema name, I did the following:

reader: DataFileReader = DataFileReader(open(avro_file, 'rb'), DatumReader())
schema: dict = json.loads(reader.meta.get('avro.schema').decode('utf-8'))

The reader.meta dictionary is not very useful "as is": it contains two keys, avro.codec and avro.schema, whose values are both bytes objects, so I had to decode and parse avro.schema in order to access its properties.
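
To illustrate why the metadata values come back as bytes, here is a minimal standard-library sketch that reads the header of an Avro container file directly, with no avro package involved. It assumes the object container file layout described in the Avro specification: the 4-byte magic `Obj\x01` followed by a map of string keys to bytes values. The helper names (`read_long`, `read_bytes`, `read_header_metadata`, `extract_schema`) are made up for this example and are not part of any library.

```python
import json

MAGIC = b"Obj\x01"  # first 4 bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long: a zigzag value stored as a base-128 varint."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of file while reading a long")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo zigzag encoding (0 -> 0, 1 -> -1, 2 -> 1, ...)
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (used for map keys and values)."""
    length = read_long(stream)
    return stream.read(length)


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:
            # A negative count is followed by the block size in bytes; skip it.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def extract_schema(stream):
    """Parse the embedded writer schema JSON into a Python object."""
    meta = read_header_metadata(stream)
    return json.loads(meta["avro.schema"].decode("utf-8"))
```

For a real file, something like `with open('file_name.avro', 'rb') as f: schema = extract_schema(f)` should yield the same dictionary as the json.loads call above. For anything beyond inspecting the header, a maintained library is the safer choice.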

Physostomous answered 12/7, 2017 at 10:30 Comment(0)