How to open and read LZMA file in-memory

I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.

How can I open and parse the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. If I simply read it, will the system start paging, or will Python run out of memory?

Nellie asked 22/2, 2015 at 2:38

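You can iterate over an LZMAFile object line by line. The data is decompressed in chunks as you go, so memory use stays small regardless of the file's size: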

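import lzma  # Python 3; in Python 2, try lzmaffi
with open('one-csv-file.xz', 'rb') as compressed:  # compressed data must be read in binary mode
    with lzma.LZMAFile(compressed) as uncompressed:
        for line in uncompressed:  # each line is a bytes object, decompressed on demand
            do_stuff_with(line)  # placeholder for your own processing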
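Since the file is a CSV, you may prefer to read it as text and let the csv module split the fields. Here is a minimal sketch, assuming the file is UTF-8 (pass another encoding if yours differs); iter_csv_rows is a hypothetical helper name. It relies on lzma.open in "rt" mode, which wraps the decompressed stream in a text reader, so rows are still produced one at a time rather than loaded all at once.

```python
import csv
import lzma
from typing import Iterator, List


def iter_csv_rows(path: str, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield the rows of an XZ-compressed CSV file one at a time.

    iter_csv_rows is a hypothetical helper; UTF-8 is an assumed default.
    """
    # "rt" decompresses and decodes incrementally; newline="" is what the
    # csv module expects for file objects it reads from.
    with lzma.open(path, mode="rt", encoding=encoding, newline="") as f:
        yield from csv.reader(f)
```

For example, `for row in iter_csv_rows('one-csv-file.xz'): ...` hands you one parsed row (a list of strings) per iteration.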
Vulnerary answered 25/4, 2016 at 15:55
Cf. here to cope with a text encoding other than ASCII. - Sharlasharleen
Yeah, what actually worked for me too was the link provided by @Sharlasharleen. - Urgent

You can decompress incrementally. See the lzma module documentation, "Compression using the LZMA algorithm". Create an LZMADecompressor object, then call its decompress() method on successive chunks of the compressed data; each call returns the uncompressed data those bytes produce, so you never need to hold the whole file in memory.
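A minimal sketch of that incremental approach, assuming a single-stream .xz file (LZMAFile, by contrast, also handles files made of several concatenated streams). The helper names iter_decompressed_chunks and iter_lines and the 1 MiB chunk size are illustrative choices, not part of the lzma API; iter_lines shows one way to reassemble lines that straddle chunk boundaries.

```python
import lzma
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB of compressed input per read (arbitrary choice)


def iter_decompressed_chunks(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Feed successive compressed chunks to one LZMADecompressor (hypothetical helper).

    Assumes a single .xz stream; stops at its end-of-stream marker.
    """
    decomp = lzma.LZMADecompressor()  # default format auto-detects .xz / legacy .lzma
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise EOFError("compressed data ended before the end-of-stream marker")
        data = decomp.decompress(chunk)
        if data:  # a call may legitimately produce no output yet
            yield data


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-split decompressed chunks into lines, without the trailing b"\\n"."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split(b"\n")
        yield from lines
    if pending:  # final line with no trailing newline
        yield pending
```

For example: `with open('one-csv-file.xz', 'rb') as f:` followed by `for line in iter_lines(iter_decompressed_chunks(f)): ...`. Note that decompress() called without max_length can return far more than chunk_size bytes when the data compresses very well; its max_length argument can cap the output per call if that matters.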

Incurable answered 22/2, 2015 at 7:26

© 2022 - 2024 — McMap. All rights reserved.