I have a large .xz file (a few gigabytes) full of plain text. I want to process the text to create a custom dataset. Because the file is too big to load into memory at once, I want to read it line by line. Does anyone have an idea how to do this?
I already tried the approach from "How to open and read LZMA file in-memory", but it is not working.
EDIT: I got this error:
'ascii' codec can't decode byte 0xfd in position 0: ordinal not in range(128)
It is raised on the line "for line in uncompressed:" in the code from the link.
EDIT2: My code (using Python 3.5):

    import lzma

    # filename is the path to the .xz file
    with open(filename) as compressed:
        with lzma.LZMAFile(compressed) as uncompressed:
            for line in uncompressed:
                print(line)
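
For reference, here is a minimal sketch of the behaviour I am after. It rests on two assumptions that are not confirmed above: that the text inside the archive is UTF-8, and that the error comes from open(filename) opening the file in text mode, so the compressed bytes get decoded as ASCII before lzma ever sees them (0xfd is the first byte of the .xz header). The helper name iter_xz_lines is hypothetical. It uses lzma.open with mode="rt", which handles the binary file itself and decodes the decompressed stream lazily.

```python
import lzma


def iter_xz_lines(path, encoding="utf-8"):
    """Yield decoded lines from an .xz file one at a time.

    Assumption: the archive contains text in the given encoding
    (UTF-8 by default); adjust `encoding` if that is not the case.
    """
    # mode="rt" makes lzma.open read the compressed file as bytes and
    # wrap the decompressed stream in a text layer, so lines are decoded
    # as they are read instead of loading the whole file into memory.
    with lzma.open(path, mode="rt", encoding=encoding) as uncompressed:
        for line in uncompressed:
            # Drop only the trailing newline, keep the rest of the line.
            yield line.rstrip("\n")
```

It would be used as "for line in iter_xz_lines(filename): ...". Keeping my original structure and changing only open(filename) to open(filename, "rb") should also avoid the decode error, but the lines would then come back as bytes rather than str.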