Is there a way to compile Avro schema(s) to Python classes?

4

12

I am curious whether there is a way to work with Avro in Python the same way as in the Java or C++ implementations.

According to the official Avro Python documentation, I have to provide an Avro schema at runtime to encode/decode data. Is there a way to use a code generator, as is done in Java/C++?

Transcript answered 6/10, 2015 at 16:44 Comment(0)
3

Update: My coworker put together a pretty good library for doing this, avro-to-python. We have been using it in production for over a year now on some pretty complex schemas.

I had to implement something like this for PHP: avro-to-php

Ethridge answered 30/7, 2019 at 20:57 Comment(0)
2

pyschema is a pretty good start, but the documentation is poor. You'll need to look at the source code to see how it all works. You can use it to read Avro schemas and generate Python source code. It adds another layer of abstraction and as such slows things down a bit more.

Cenobite answered 16/10, 2015 at 22:15 Comment(0)
2

I've asked this question a couple of times recently in the Pulsar slack channel and my belief is that no tool currently exists that can convert an Avro schema to a Python class that is compatible with the Pulsar Python client library.

The Pulsar Python client library expects the Python class to inherit from the Record class (https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar/schema/definition.py#L57), and for every field in the Python class to inherit from the Field class (https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar/schema/definition.py#L141), both of which are defined in the Pulsar Python client library.

So, an Avro to Python converter would have to import the Record class and Field class from the Python client library, and so if such a converter exists, someone in the Pulsar Slack community really should know about it.

Further, the Pulsar Python client library is missing support for Avro keywords like "doc" and "namespace", and for null default values. So even if an Avro to Python converter existed for Pulsar, the generated Python class likely could not be properly consumed by the Pulsar Python client library.
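
To make the requirement concrete, here is a rough sketch of the kind of source text such a converter might emit. It is not an existing tool. The function name `pulsar_record_source` and the type mapping are hypothetical, the field class names (`String`, `Integer`, and so on) are assumed to be the ones exported by `pulsar.schema`, and only flat records with primitive field types are handled. The sketch only builds a string, so it runs without the Pulsar client installed.

```python
import json

# Assumed mapping from Avro primitive types to pulsar.schema field classes.
PULSAR_FIELD_FOR_AVRO = {
    "string": "String",
    "int": "Integer",
    "long": "Long",
    "float": "Float",
    "double": "Double",
    "boolean": "Boolean",
    "bytes": "Bytes",
}


def pulsar_record_source(schema_json):
    """Return Python source for a Record subclass (hypothetical helper)."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("this sketch only handles record schemas")

    body = []
    used = {"Record"}
    for field in schema["fields"]:
        avro_type = field["type"]
        # Unions, nested records, arrays, and maps are out of scope here.
        if not isinstance(avro_type, str) or avro_type not in PULSAR_FIELD_FOR_AVRO:
            raise ValueError(f"unsupported field type: {avro_type!r}")
        field_class = PULSAR_FIELD_FOR_AVRO[avro_type]
        used.add(field_class)
        body.append(f"    {field['name']} = {field_class}()")

    lines = [
        f"from pulsar.schema import {', '.join(sorted(used))}",
        "",
        "",
        f"class {schema['name']}(Record):",
    ]
    lines += body or ["    pass"]
    return "\n".join(lines) + "\n"


USER_SCHEMA = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})

print(pulsar_record_source(USER_SCHEMA))
```

Note that "doc", "namespace", and defaults are simply dropped by this sketch, which is the same gap described above.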

Cassis answered 29/4, 2021 at 23:46 Comment(0)
1

I don't see any indication of an existing Avro schema -> Python class code generator in the docs (which explicitly mention code generation for the Java case) for arbitrary Python interpreters. If you're using Jython, you could use the Java code generator to make a class that you access in your Jython code.

Unlike Java and C++, lacking code generation doesn't affect Python performance much (in the CPython case anyway), since class instances are implemented in terms of dicts anyway (there are exceptions to this rule in a sense, but they mostly change memory usage, not the fact that dict lookup is always involved). That makes code generation largely "nice to have" syntactic sugar, not a necessary feature for development; with some effort, you could always implement a converter that writes out a class definition and execs it in Python to get a similar effect (this is how collections.namedtuple classes are defined).
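
A minimal sketch of that idea, using only the standard library: build the class source as a string from the schema, then `exec` it. The helper names `generate_class_source` and `compile_record_class` are made up for illustration, only flat record schemas are handled, and no Avro type checking or serialization is attempted.

```python
import json
import keyword


def generate_class_source(schema_json):
    """Return Python source for a plain class mirroring an Avro record."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("this sketch only handles record schemas")

    class_name = schema["name"]
    fields = [field["name"] for field in schema["fields"]]
    # The names end up in exec'd source, so reject anything that is not
    # a plain, non-keyword identifier.
    for name in [class_name] + fields:
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"not a usable Python identifier: {name!r}")

    params = "".join(f", {name}" for name in fields)
    lines = [
        f"class {class_name}:",
        f"    __slots__ = {tuple(fields)!r}",
        "",
        f"    def __init__(self{params}):",
    ]
    lines += [f"        self.{name} = {name}" for name in fields] or ["        pass"]
    lines += [
        "",
        "    def to_dict(self):",
        "        return {name: getattr(self, name) for name in self.__slots__}",
    ]
    return "\n".join(lines) + "\n"


def compile_record_class(schema_json):
    """Exec the generated source and return the resulting class object."""
    namespace = {"__name__": "generated_avro"}
    exec(generate_class_source(schema_json), namespace)
    return namespace[json.loads(schema_json)["name"]]


USER_SCHEMA = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})

User = compile_record_class(USER_SCHEMA)
user = User("alice", 30)
print(user.to_dict())
```

The same `generate_class_source` output could be written to a `.py` file at build time instead of being exec'd, which is closer to what the Java/C++ generators do.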

Puca answered 6/10, 2015 at 16:57 Comment(2)
Thanks for the help, but it is really sad. We are going to implement a Python SDK for the Kaa IoT platform. By design, the Kaa SDK is configured by several Avro schemas, from which the corresponding Java/C++ classes are generated. For the C SDK we implemented our own Avro generator so that we can use plain C structs instead of Avro datums. With this approach, the Avro schemas are needed only at the SDK generation stage. After that, a developer works only with pre-compiled classes, without needing to know anything about the schema structure. - Transcript
I don't see why this approach couldn't work in Python. The current Python implementation is terribly inefficient, and code generation could speed things up considerably. - Cenobite
