How to read Avro files in Python 3.5.2
I am trying to read Avro files using Python.

I installed Apache Avro following the instructions here (I think the installation succeeded, because I can run "import avro" in the Python shell):

https://avro.apache.org/docs/1.8.1/gettingstartedpython.html

However, when I try to read Avro files using the code from those instructions, I keep getting errors as soon as I import the Avro modules:

>>> import avro.schema
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
import avro.schema
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 896, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1139, in find_spec
File "<frozen importlib._bootstrap_external>", line 1115, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1096, in _legacy_get_spec
File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
File "<frozen importlib._bootstrap_external>", line 533, in spec_from_file_location
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\schema.py", line 340
except Exception, e:
                ^
SyntaxError: invalid syntax


>>> from avro.datafile import DataFileReader, DataFileWriter
Traceback (most recent call last):
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\datafile.py", line 21, in <module>
from cStringIO import StringIO
ImportError: No module named 'cStringIO'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
from avro.datafile import DataFileReader, DataFileWriter
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\datafile.py", line 23, in <module>
from StringIO import StringIO
ImportError: No module named 'StringIO'


>>> from avro.io import DatumReader, DatumWriter
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
from avro.io import DatumReader, DatumWriter
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 896, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1139, in find_spec
File "<frozen importlib._bootstrap_external>", line 1115, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1096, in _legacy_get_spec
File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
File "<frozen importlib._bootstrap_external>", line 533, in spec_from_file_location
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\io.py", line 200
bits = (((ord(self.read(1)) & 0xffL)) |
                                  ^
SyntaxError: invalid syntax

So did I install Avro successfully? Why am I getting these errors? I am using Python 3.5.2 on Windows 7.

Edit: I fixed the issue following the suggestion by Stephane Martin. Then I tried to read the Avro files into Python. I have a bunch of .avro files in a directory that is already set as the working directory in Python. Here is my code:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
for user in reader:
   print (user)
reader.close()

It raises this error:

Traceback (most recent call last):
File "I:\DJ data\read avro.py", line 5, in <module>
reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\datafile.py", line 349, in __init__
self._read_header()
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\datafile.py", line 459, in _read_header
META_SCHEMA, META_SCHEMA, self.raw_decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 525, in read_data
return self.read_record(writer_schema, reader_schema, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 725, in read_record
field_val = self.read_data(field.type, readers_field.type, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 515, in read_data
return self.read_fixed(writer_schema, reader_schema, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 568, in read_fixed
return decoder.read(writer_schema.size)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 170, in read
input_bytes = self.reader.read(n)
File "I:\Program Files\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 863: character maps to <undefined>

I am aware that the example in the instructions creates a schema first. But what is an .avsc file? How should I create it, and the corresponding schema, in my case?

Iodize asked 22/11, 2016 at 1:35
Also, open the file with open("part-00000-of-01733.avro", "rb"), not "r", because Avro is a binary format. - Livialivid
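A minimal sketch of the comment's point. Every Avro object container file starts with the four magic bytes `Obj` followed by `0x01`; opening the file with "rb" returns those raw bytes unchanged, whereas "r" first decodes them with the platform text codec (cp1252 on the asker's Windows 7 machine, which has no mapping for byte 0x90, hence the UnicodeDecodeError above). The helper names `has_avro_magic` and `read_records` are hypothetical, and `read_records` assumes a Python 3 build of the avro package is installed.

```python
AVRO_MAGIC = b"Obj\x01"  # header bytes of every Avro object container file


def has_avro_magic(path):
    """Return True if the file at `path` starts with the Avro container magic."""
    # "rb" yields raw bytes; "r" would decode them with the locale encoding first.
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


def read_records(path):
    """Return every datum in an Avro container file as a list (records come back as dicts).

    Hypothetical helper; needs the avro package, imported here so the rest of
    this snippet still runs without it.
    """
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # Binary mode is the actual fix for the UnicodeDecodeError in the question.
    reader = DataFileReader(open(path, "rb"), DatumReader())
    try:
        return list(reader)
    finally:
        reader.close()  # also closes the underlying file object
```

With avro installed, the question's loop becomes `for user in read_records("part-00000-of-01733.avro"): print(user)`.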

With recent versions of the avro package, this should no longer be an issue.


Original answer:

When installing through pip or a similar package manager, install the avro-python3 package instead of plain avro.
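For context on why the plain avro 1.8.1 package failed under Python 3.5: the lines in the tracebacks are Python 2 only. `except Exception, e:` must be written `except Exception as e:` in Python 3, the `L` long-integer suffix in `0xffL` no longer exists, and the `cStringIO`/`StringIO` modules were folded into `io`. The sketch below is not avro's code; it is a hypothetical Python 3 rewrite of the kind of expression at io.py line 200, which assembles a little-endian integer from single bytes (assumed here to be Avro's 4-byte float decoding), using `io.BytesIO` in place of `cStringIO`.

```python
import io
import struct


def read_float(stream):
    """Decode one Avro float (4 bytes, little-endian IEEE 754) from a binary stream."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("need 4 bytes for an Avro float")
    # Python 3: indexing bytes already gives ints, so no ord() and no 0xffL suffix.
    bits = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def read_float_or_none(stream):
    """Like read_float, but returns None on truncated input."""
    try:
        return read_float(stream)
    except EOFError as e:  # Python 2 wrote this as `except EOFError, e:`
        print("truncated input:", e)
        return None


# io.BytesIO replaces cStringIO.StringIO for in-memory binary data.
print(read_float(io.BytesIO(struct.pack("<f", 1.5))))  # 1.5
```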

Ravelment answered 17/11, 2017 at 12:52
avro-python3 is deprecated; go back to using plain avro (pypi.org/project/avro-python3). - Sunset
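On the .avsc question in the post: an .avsc file is simply an Avro schema written as JSON. Reading an existing container file does not need one, because the writer's schema is stored in the file header (that is what `_read_header` in the traceback is parsing); a schema is only required when writing. A minimal sketch, assuming the current avro package (1.10 or later, where the parser is `avro.schema.parse`; avro-python3 spelled it `Parse`). The `User` schema mirrors the one in the Apache Avro getting-started guide; `save_schema`, `write_users` and the file names are hypothetical.

```python
import json

# An .avsc file is plain JSON describing the record layout.
USER_SCHEMA = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]},
    ],
}


def save_schema(path, schema):
    """Write `schema` to `path` as an .avsc (JSON) file."""
    with open(path, "w") as f:
        json.dump(schema, f, indent=2)


def write_users(avro_path, avsc_path, users):
    """Hypothetical helper: write `users` (list of dicts) to an Avro container file.

    Requires the avro package (>= 1.10 assumed for avro.schema.parse).
    """
    import avro.schema
    from avro.datafile import DataFileWriter
    from avro.io import DatumWriter

    with open(avsc_path) as f:
        schema = avro.schema.parse(f.read())
    writer = DataFileWriter(open(avro_path, "wb"), DatumWriter(), schema)
    try:
        for user in users:
            writer.append(user)
    finally:
        writer.close()  # flushes the last block and closes the file
```

With avro installed, `save_schema("user.avsc", USER_SCHEMA)` followed by `write_users("users.avro", "user.avsc", [{"name": "Alyssa", "favorite_number": 256, "favorite_color": None}])` should produce a file that `DataFileReader(open("users.avro", "rb"), DatumReader())` reads back without any .avsc.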
