How to decompress lzma2 (.xz) and zstd (.zst) files into a folder using Python 3?

I have been working for a long time with .bz2 files. To unpack/decompress .bz2 files into a specific folder I have been using the following function:

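import bz2
import os

destination_folder = 'unpacked/'
def decompress_bz2_to_folder(input_file):
    # Reads the whole file into memory, then writes it without the .bz2 suffix
    data = bz2.BZ2File(input_file).read()
    name = os.path.splitext(os.path.basename(input_file))[0]
    open(os.path.join(destination_folder, name), 'wb').write(data)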

Recently I obtained a list of files with the .xz (not .tar.xz) and .zst extensions. My poor research skills told me that the former is lzma2 compression and the latter is Zstandard.

However, I couldn't find an easy way to unpack the contents of these files into a folder (like I do with the .bz2 files).

How can I:

  1. Unpack the contents of an .xz (lzma2) file into a folder using Python 3?
  2. Unpack the contents of a .zst (Zstandard) file into a folder using Python 3?

Important note: I'm unpacking very large files, so the solution should avoid loading a whole file into memory (I want to avoid a MemoryError).

Gauze answered 15/3, 2019 at 13:59 Comment(3)
The zstd CLI can decompress both .xz and .zst files if it was built with the appropriate options. You can check this with zstd -vV. Example output: *** zstd command line interface 64-bits v1.3.2, by Yann Collet ***, *** supports: zstd, zstd legacy v0.4+, gzip, lz4, lzma, xz - Loudish
@Loudish That's good to know. How can it be done in Python 3 though? :) - Gauze
By invoking the CLI as an external command-line utility? If you need tighter integration instead, you may be interested in a Python wrapper. - Loudish
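
A minimal sketch of the external-command approach suggested in the comment above, assuming the zstd binary is on PATH (and, for .xz input, that it was built with xz support). The function names and the 'unpacked' default are hypothetical; the -d, -f and -o flags are standard zstd CLI options.

```python
import pathlib
import subprocess

def build_zstd_command(input_file, destination_dir):
    """Return the argv list for decompressing input_file into destination_dir."""
    input_file = pathlib.Path(input_file)
    # Drop the final suffix: 'dump.zst' -> 'dump'
    output_path = pathlib.Path(destination_dir) / input_file.stem
    # -d: decompress, -f: overwrite an existing output, -o: output file
    return ['zstd', '-d', '-f', str(input_file), '-o', str(output_path)]

def decompress_with_zstd_cli(input_file, destination_dir='unpacked'):
    """Run the zstd CLI; raises CalledProcessError if zstd exits non-zero."""
    pathlib.Path(destination_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_zstd_command(input_file, destination_dir), check=True)
```

Because the CLI streams from disk to disk, the Python process never holds the decompressed data in memory.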

LZMA data can be decompressed with the lzma module. Open the file with lzma.open(), then use shutil.copyfileobj() to copy the decompressed data to an output file in chunks, which avoids memory issues:

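import lzma
import pathlib
import shutil

destination_dir = 'unpacked/'  # assumed output folder, as in the question

def decompress_lzma_to_folder(input_file):
    input_file = pathlib.Path(input_file)
    with lzma.open(input_file) as compressed:
        # Path.stem drops the final suffix: 'data.xz' -> 'data'
        output_path = pathlib.Path(destination_dir) / input_file.stem
        with open(output_path, 'wb') as destination:
            # Copies in fixed-size chunks, so memory use stays bounded
            shutil.copyfileobj(compressed, destination)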
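
As a rough variation on the same idea, the sketch below takes the destination folder as a parameter, creates it if needed, and makes the copy buffer size explicit. The name decompress_xz_to_folder and the 1 MiB chunk_size default are assumptions for illustration, not part of the original answer.

```python
import lzma
import pathlib
import shutil
import tempfile

def decompress_xz_to_folder(input_file, destination_dir, chunk_size=1024 * 1024):
    """Stream-decompress input_file into destination_dir; return the output path."""
    input_file = pathlib.Path(input_file)
    destination_dir = pathlib.Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops the final suffix: 'sample.txt.xz' -> 'sample.txt'
    output_path = destination_dir / input_file.stem
    with lzma.open(input_file, 'rb') as compressed, open(output_path, 'wb') as destination:
        # At most chunk_size decompressed bytes are requested per read
        shutil.copyfileobj(compressed, destination, chunk_size)
    return output_path

if __name__ == '__main__':
    # Round trip on a throwaway file to show the call
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / 'sample.txt.xz'
        with lzma.open(source, 'wb') as f:
            f.write(b'hello xz\n' * 1000)
        result = decompress_xz_to_folder(source, pathlib.Path(tmp) / 'unpacked')
        print(result.name, result.stat().st_size)
```

A smaller chunk_size lowers peak memory at the cost of more read/write calls.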
        

When this answer was written, the Python standard library had no support for Zstandard compression (Python 3.14 has since added a compression.zstd module). You can use either zstandard (by IndyGreg from Mozilla and the Mercurial project) or zstd. The latter is perhaps too basic for your needs, while zstandard offers a streaming API specifically suited for reading files.

I'm using the zstandard library here to benefit from the copying API it implements, which lets you decompress and copy at the same time, similar to how shutil.copyfileobj() works:

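import zstandard  # third-party: pip install zstandard
import pathlib

destination_dir = 'unpacked/'  # assumed output folder, as in the question

def decompress_zstandard_to_folder(input_file):
    input_file = pathlib.Path(input_file)
    with open(input_file, 'rb') as compressed:
        decomp = zstandard.ZstdDecompressor()
        # Path.stem drops the final suffix: 'data.zst' -> 'data'
        output_path = pathlib.Path(destination_dir) / input_file.stem
        with open(output_path, 'wb') as destination:
            # Decompresses and writes chunk by chunk
            decomp.copy_stream(compressed, destination)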
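
If the folder holds a mix of .bz2, .xz and .zst files, one possible way to combine the pieces is a small dispatcher keyed on the file extension. This is only a sketch: decompress_to_folder, COPIERS and the helper names are hypothetical, and zstandard is imported lazily so the .bz2 and .xz paths work without the third-party package installed.

```python
import bz2
import lzma
import pathlib
import shutil

def _copy_bz2(raw, destination):
    with bz2.BZ2File(raw) as stream:
        shutil.copyfileobj(stream, destination)

def _copy_xz(raw, destination):
    with lzma.LZMAFile(raw) as stream:
        shutil.copyfileobj(stream, destination)

def _copy_zst(raw, destination):
    import zstandard  # third-party; only needed for .zst inputs
    zstandard.ZstdDecompressor().copy_stream(raw, destination)

# Maps a lowercase file suffix to the function that streams it out
COPIERS = {'.bz2': _copy_bz2, '.xz': _copy_xz, '.zst': _copy_zst}

def decompress_to_folder(input_file, destination_dir):
    """Pick a decompressor by suffix and stream input_file into destination_dir."""
    input_file = pathlib.Path(input_file)
    copier = COPIERS.get(input_file.suffix.lower())
    if copier is None:
        # Fail before creating any output file
        raise ValueError(f'unsupported extension: {input_file.suffix!r}')
    destination_dir = pathlib.Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    output_path = destination_dir / input_file.stem
    with open(input_file, 'rb') as raw, open(output_path, 'wb') as destination:
        copier(raw, destination)
    return output_path
```

For example, decompress_to_folder('logs/events.xz', 'unpacked') would write unpacked/events. All three paths copy in chunks, so none of them loads the full file into memory.
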
Arturoartus answered 20/3, 2019 at 12:38 Comment(0)
