Encode an object with Avro to a byte array in Python
In Python 2.7, using Avro, I'd like to encode an object to a byte array.

All examples I've found write to a file.

I've tried using io.BytesIO() but this gives:

AttributeError: '_io.BytesIO' object has no attribute 'write_long'

Sample using io.BytesIO

import io
from avro.io import DatumWriter

def avro_encode(raw, schema):
    writer = DatumWriter(schema)
    avro_buffer = io.BytesIO()
    writer.write(raw, avro_buffer)  # AttributeError: BytesIO has no write_long
    return avro_buffer.getvalue()
Recursion answered 12/5, 2014 at 16:48 Comment(0)
Your question helped me figure things out, so thanks. Here's a simple Python example based on the one in the docs:

import io
import avro.schema
import avro.io

test_schema = '''
{
"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
'''

schema = avro.schema.parse(test_schema)
writer = avro.io.DatumWriter(schema)

bytes_writer = io.BytesIO()
encoder = avro.io.BinaryEncoder(bytes_writer)
writer.write({"name": "Alyssa", "favorite_number": 256}, encoder)
writer.write({"name": "Ben", "favorite_number": 7, "favorite_color": "red"}, encoder)

raw_bytes = bytes_writer.getvalue()
print(len(raw_bytes))
print(type(raw_bytes))

bytes_reader = io.BytesIO(raw_bytes)
decoder = avro.io.BinaryDecoder(bytes_reader)
reader = avro.io.DatumReader(schema)
user1 = reader.read(decoder)
user2 = reader.read(decoder)

print(user1)
print(user2)
Galvin answered 5/8, 2014 at 3:27 Comment(5)
If you want to run this under Python 3, change "schema = avro.schema.parse(test_schema)" to "schema = avro.schema.Parse(test_schema)" – Ringtailed
Quick question: when I write this stream of bytes into a file and save it on HDFS, the hdfs dfs -text command is not able to convert it back to a string; apparently I am missing a step before writing the stream to the file. – Herald
Is there any way to write bytes_writer as an Avro file to an S3 bucket? – Waves
client.upload_fileobj(Bucket=aws.s3_bucket_name, Key=f'{s3_key}/{file_name}', Fileobj=bytes_writer) This way it creates a file, but the content is empty. – Waves
and follow authentise.com/post/getting-started-with-avro-and-python-3 – Compressibility

I could not get the avro library to write the Avro file with the schema.

To overcome this problem I used fastavro, e.g.:

import io
import fastavro

data = [{"name": "Shravan", "favorite_number": 256},
        {"name": "Ram", "favorite_number": 7, "favorite_color": "red"}]
bytes_writer = io.BytesIO()
# get_avro_schema() is the author's helper returning the schema dict
fastavro.writer(bytes_writer, get_avro_schema(), data)
print(bytes_writer.getvalue())  # BytesIO has getvalue(), not get_value()
Waves answered 1/10, 2019 at 10:58 Comment(4)
Why can't we write the Avro file with the schema? – Dowel
I tried using import avro but I was not able to create the Avro file, so I used the fastavro library – Waves
But take a look at another answer to this question (https://mcmap.net/q/689694/-encode-an-object-with-avro-to-a-byte-array-in-python). There is a schema used; in particular, the lines: schema = avro.schema.parse(test_schema) and writer = avro.io.DatumWriter(schema) – Dowel
I mean your answer might work as a fastavro alternative, but the part about the schema doesn't look true – Dowel
