How to create full compressed tar file using Python?
P

11

184

How can I create a .tar.gz file with compression in Python?

Pending answered 9/1, 2010 at 4:59 Comment(1)
tar doesn't compress data, it just packs the files together. It's gzip that does the actual compression.Clothilde
W
310

To build a .tar.gz (aka .tgz) for an entire directory tree:

import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.
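A quick way to check what ended up in the archive is to read the member names back. The sketch below repeats the helper from above and adds a hypothetical `list_members` function; the `project` folder and `readme.txt` file are made-up names used only for the demonstration.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    """Same helper as above: archive source_dir under its base name."""
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


def list_members(archive_path):
    """Return the member names stored in a gzipped tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: <tmp>/project/readme.txt
        source = os.path.join(tmp, "project")
        os.mkdir(source)
        with open(os.path.join(source, "readme.txt"), "w") as fh:
            fh.write("hello\n")

        archive = os.path.join(tmp, "project.tar.gz")
        make_tarfile(archive, source)
        print(list_members(archive))
```

One thing to watch: if source_dir ends with a path separator, os.path.basename returns an empty string, so it is worth passing the path through os.path.normpath first.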

Whiteside answered 13/6, 2013 at 6:58 Comment(6)
Just as a note to readers, if you leave out arcname=os.path.basename(source_dir) then it'll give you the entire path structure of source_dir in the tar file (in most situations, that's probably inconvenient).Midrib
A second note; using arcname=os.path.basename(source_dir) still means that the archive contains a folder which contains the contents of source_dir. If you want the root of the archive to contain the contents themselves, and not contents within a folder, use arcname=os.path.sep instead.Nipissing
@Sheljohn unfortunately, this is not fully correct, because if one uses os.path.sep, then the archive will contain service "." or "/" folder which is not a problem usually, but sometimes it can be an issue if you later process this archive programmatically. It seems the only real clean way is to do os.walk and add files individuallyTruly
To get rid of all the directory structure, just use arcname='.'. No need to use os.walk.Reboant
If I generate this tarfile on Linux, will this open successfully on other platforms say, Windows & Mac?Jolenejolenta
@Jolenejolenta creating it on Linux is not important, what will matter is what format you select (or don't select, as the case may be); the default before Python 3.8 was GNU_FORMAT which may not be readable by all tools, though the default as of Python 3.8 is PAX_FORMAT which is a standard format and also a conservative/compatible extension of the much older standard USTAR_FORMAT which is widely supportedOverthrow
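To make the two points from the comments concrete, here is a minimal sketch that uses arcname="." to drop the named top-level folder and passes the tar format explicitly instead of relying on the interpreter default. The function name `make_flat_tarfile` and its `fmt` parameter are my own, not part of tarfile; the archive root will still hold a "." entry, as noted in the comments.

```python
import tarfile


def make_flat_tarfile(output_filename, source_dir, fmt=tarfile.PAX_FORMAT):
    """Archive the contents of source_dir without a named top-level folder.

    arcname="." follows the suggestion in the comments; fmt is passed
    explicitly so the result does not depend on the Python version's
    default format. tarfile.USTAR_FORMAT and tarfile.GNU_FORMAT are the
    other accepted values.
    """
    with tarfile.open(output_filename, "w:gz", format=fmt) as tar:
        tar.add(source_dir, arcname=".")
```

Write the archive somewhere outside source_dir so the output file is not mixed in with the files being archived.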
P
114
import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()

If you want to create a tar.bz2 compressed file, just replace file extension name with ".tar.bz2" and "w:gz" with "w:bz2".
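The same loop can be written with a with block, so the archive is closed even if adding a file fails. This is only a sketch: `archive_files` and its `compression` parameter are hypothetical names, and the mode strings "w:gz", "w:bz2" and "w:xz" are the ones tarfile accepts.

```python
import tarfile


def archive_files(output_filename, file_names, compression="gz"):
    """Write file_names into a compressed tar archive.

    compression is one of "gz", "bz2" or "xz"; the caller is expected to
    pick a matching extension such as .tar.gz, .tar.bz2 or .tar.xz.
    """
    if compression not in ("gz", "bz2", "xz"):
        raise ValueError(f"unsupported compression: {compression!r}")
    # The with block closes the archive even if tar.add raises.
    with tarfile.open(output_filename, f"w:{compression}") as tar:
        for name in file_names:
            tar.add(name)
```

For example, archive_files("sample.tar.bz2", ["file1", "file2", "file3"], compression="bz2") would be the bzip2 counterpart of the snippet above.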

Perkin answered 9/1, 2010 at 5:17 Comment(2)
You should really use with tarfile.open( .. in Python, instead of calling open and close manually. This is also the case when opening regular files.Nipissing
@Perkin I just want to compress to sample.gz. import tarfile tar = tarfile.open("sample.gz", "r:gz") for name in ["file1", "file2", "file3"]: tar.add(name) tar.close() It's Ok?Unitarianism
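Regarding the snippet in the last comment: "r:gz" opens an archive for reading, so it cannot be used to add files, and a plain .gz file holds a single compressed stream rather than several files. If the goal is to compress one file to .gz without tar, the gzip module is probably what is wanted. A minimal sketch, with `gzip_file` being a made-up helper name:

```python
import gzip
import shutil


def gzip_file(source_path, output_path):
    """Compress a single file to .gz without wrapping it in a tar archive."""
    with open(source_path, "rb") as src, gzip.open(output_path, "wb") as dst:
        # Copy in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
```

To put several files into one compressed file, keep using tarfile with "w:gz" and a .tar.gz name.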
A
35

You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."

You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.

BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.
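Which mode compresses best depends on the data, so it can be worth measuring on a sample. The sketch below builds the same one-member archive in memory with each mode and reports the sizes; `compressed_sizes` and the member name `data.txt` are assumptions made for the example.

```python
import io
import tarfile


def compressed_sizes(payload, member_name="data.txt"):
    """Return {compression: archive size in bytes} for gz, bz2 and xz."""
    sizes = {}
    for compression in ("gz", "bz2", "xz"):
        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode=f"w:{compression}") as tar:
            # Build the member from bytes instead of reading a file on disk.
            info = tarfile.TarInfo(name=member_name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        sizes[compression] = len(buffer.getvalue())
    return sizes


if __name__ == "__main__":
    sample = b"some repetitive sample text\n" * 10_000
    for compression, size in compressed_sizes(sample).items():
        print(f"{compression}: {size} bytes")
```

Compression ratio is only half of the picture; bzip2 and xz typically take longer than gzip, so timing the call on real data is a reasonable next step.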

Alastair answered 9/1, 2010 at 5:19 Comment(1)
Just a quick note that the filename for bzip2-compressed tarballs should end with ".tar.bz2".Clothilde
B
25

Previous answers advise using the tarfile Python module to create a .tar.gz file. That is a good, Pythonic solution, but it has a serious drawback: archiving speed. A related question reports that tarfile is roughly two times slower than the tar utility on Linux. In my experience, that estimate is about right.

So for faster archiving, you can call the tar command through the subprocess module:

import subprocess
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
Brunelleschi answered 19/7, 2019 at 11:55 Comment(1)
To get significant speedups for very large tarballs on multicore machines, you can invoke an external parallel compressor like pigz or lbzip2: subprocess.check_call(("tar", "-I", "lbzip2 --fast", "-cf", output_filename, file_to_archive))Whiteside
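Building on the comment above, here is a rough sketch that uses a parallel compressor only when one is installed and falls back to tar's built-in gzip otherwise. It assumes GNU tar (the -I option) and uses pigz because its output is gzip-compatible; the function names are hypothetical.

```python
import shutil
import subprocess


def build_tar_command(output_filename, source, compressor=None):
    """Return the argv list for tar; compressor is e.g. "pigz"."""
    if compressor is None:
        # Plain gzip through tar itself.
        return ["tar", "-czf", output_filename, source]
    # GNU tar: -I names the external compression program.
    return ["tar", "-I", compressor, "-cf", output_filename, source]


def pick_compressor(candidates=("pigz",)):
    """Return the first candidate found on PATH, or None to fall back to -z."""
    for program in candidates:
        if shutil.which(program):
            return program
    return None


def archive_with_tar(output_filename, source):
    """Run tar, preferring a parallel compressor when one is installed."""
    command = build_tar_command(output_filename, source, pick_compressor())
    subprocess.check_call(command)
```

If lbzip2 is used instead, the result is a bzip2 stream, so the output name should end in .tar.bz2 rather than .tar.gz.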
G
7

shutil.make_archive is very convenient for both files and directories (contents recursively added to the archive):

import shutil

compressed_file = shutil.make_archive(
        base_name='archive',   # archive file name w/o extension
        format='gztar',        # available formats: zip, gztar, bztar, xztar, tar
        root_dir='path/to/dir' # directory to compress
)
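With only root_dir set, the archive holds the directory's contents directly. If the archive should instead contain the folder itself as its top level, make_archive also takes a base_dir argument. A minimal sketch, where `archive_with_top_folder` is a made-up wrapper name:

```python
import os
import shutil


def archive_with_top_folder(source_dir, base_name):
    """Archive source_dir so its own folder name is the archive's top level."""
    source_dir = os.path.abspath(source_dir)
    return shutil.make_archive(
        base_name=base_name,                    # output path without extension
        format="gztar",
        root_dir=os.path.dirname(source_dir),   # member paths are relative to this
        base_dir=os.path.basename(source_dir),  # only this entry is archived
    )
```

The return value is the path of the archive that was written, with the .tar.gz extension appended to base_name.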
Gigahertz answered 26/7, 2022 at 9:41 Comment(0)
M
3

In addition to @Aleksandr Tukallo's answer, you can also capture the command's output and the error message (if one occurs). Compressing a folder using tar is explained well in a related answer.

import traceback
import subprocess

try:
    # -c create, -z gzip, -f archive file name; -j (bzip2) cannot be combined with -z
    cmd = ['tar', '-czf', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    print(output)
except Exception:
    print(f"E: {traceback.format_exc()}")
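If the error text from tar itself is needed, and not only the Python traceback, subprocess.run with capture_output=True (Python 3.7+) returns stdout and stderr separately. The helpers below are a sketch with hypothetical names, not a drop-in replacement for the snippet above.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd and return (returncode, stdout, stderr) as stripped text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout.strip(), result.stderr.strip()


def create_tar_gz(output_filename, source):
    """Call tar and report its own error message when it fails."""
    code, out, err = run_command(["tar", "-czf", output_filename, source])
    if code != 0:
        print(f"tar failed with exit code {code}: {err}", file=sys.stderr)
    return code == 0
```

run_command does not raise on a non-zero exit code, so the caller decides how to handle the failure.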
Marbut answered 15/3, 2020 at 22:42 Comment(0)
O
2

This compresses files from the current working directory into a tar.gz archive. os.path.basename(file_directory) is used so that only the file name is passed to tar.add:

import tarfile
import os.path

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        tar.add(os.path.basename(file))

The resulting save.tar.gz is written to the current directory.

Onfroi answered 8/9, 2019 at 17:42 Comment(1)
import tarfile packageOnfroi
T
0

Minor correction to @THAVASI.T's answer which omits showing the import of the 'tarfile' library, and does not define the 'tar' object which is used in the third line.

import tarfile
import os.path

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        tar.add(os.path.basename(file))
Terrify answered 4/5, 2021 at 1:43 Comment(1)
You should consider expanding this answer to include detail about what was wrong with the other answer and explain why this snippet works.Clergy
P
0

I use this to generate a tar.gz file that does not contain the top-level folder.

import tarfile
import os.path

source_location = r'C:\Users\username\Desktop\New folder'
output_name = r'C:\Users\username\Desktop\new.tar.gz'

# ---------------------------------------------------
#  --- output new.tar.gz with 'New folder' inside ---
#  -> new.tar.gz/New folder/aaaa/a.txt 
#  -> new.tar.gz/New folder/bbbb/b.txt
# ---------------------------------------------------
# def make_tarfile(output_filename, source_dir):
#     with tarfile.open(output_filename, "w:gz") as tar:
#         tar.add(source_dir, arcname=os.path.basename(source_dir))


# ---------------------------------------------------
#  --- output new.tar.gz without 'New folder' inside ---
#  -> new.tar.gz/aaaa/a.txt 
#  -> new.tar.gz/bbbb/b.txt
# ---------------------------------------------------
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for root, _, files in os.walk(source_dir):
            for file in files:
                file_path = os.path.join(root, file)
                arcname = os.path.relpath(file_path, source_dir)
                tar.add(file_path, arcname=arcname)

try:
    make_tarfile(output_name, source_location)

except Exception as e:
    print(f"Error: {e}")
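One thing to be aware of with the os.walk version is that it only adds files, so empty directories are left out of the archive. A possible alternative, sketched here under a made-up function name, is to add each top-level entry under its own name and let tar.add recurse:

```python
import os
import tarfile


def make_tarfile_from_contents(output_filename, source_dir):
    """Add each top-level entry of source_dir under its own name.

    tar.add recurses into directories, so nested files and empty
    directories are included without an explicit os.walk.
    """
    with tarfile.open(output_filename, "w:gz") as tar:
        for entry in sorted(os.listdir(source_dir)):
            tar.add(os.path.join(source_dir, entry), arcname=entry)
```

As with the version above, the archive has no wrapping folder and no "." entry at its root. It is simplest to write the output file outside source_dir.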
Potpie answered 12/3 at 2:8 Comment(0)
D
-1

Just restating @George V. Reilly 's excellent answer, but in a clearer form...

import tarfile


fd_path="/some/folder/path/"
fl_name="some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"

with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path + fl_name, fl_name)

As @Brōtsyorfuzthrāx pointed out (in a different way), if you leave out the second argument of the add method, the tar file will contain the entire path structure of fd_path + fl_name.

Of course you can use...

import tarfile
import os

fd_path_n_fl_name="/some/folder/path/some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"

with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path_n_fl_name, os.path.basename(fd_path_n_fl_name))

... if you don't want to use or don't have the folder path and file name separated.

Thanks!🤗

Delbert answered 1/1, 2023 at 23:57 Comment(0)
R
-5

Best performance, and without the . and .. entries in the compressed file! See the vulnerability warning below:

NOTICE (thanks MaxTruxa):

this answer is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only).

I'm using it locally - so it's good for my needs.

subprocess.call(f'tar -cvzf {output_filename} *', cwd=source_dir, shell=True)

The cwd argument changes the working directory before compressing, which solves the issue with the dots.

shell=True allows wildcard usage (*).

This also works recursively for a directory.
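For anyone who wants the same result without shell=True, one option is to expand the directory listing in Python and pass it to tar as an argument list. This is only a sketch with hypothetical function names. Unlike the shell's *, os.listdir also returns names that start with a dot, and tar will refuse to create an archive if the directory is empty.

```python
import os
import subprocess


def build_tar_argv(output_filename, source_dir):
    """Return an argv list that archives every entry in source_dir.

    No shell is involved, so names are passed to tar verbatim and
    nothing in them is interpreted as shell syntax.
    """
    entries = sorted(os.listdir(source_dir))
    # "--" marks the end of options so names starting with "-" stay file names.
    return ["tar", "-czf", os.path.abspath(output_filename), "--"] + entries


def make_tarfile(output_filename, source_dir):
    """Run tar from inside source_dir so members carry no leading path."""
    argv = build_tar_argv(output_filename, source_dir)
    subprocess.check_call(argv, cwd=source_dir)
```

The output path is made absolute before the call because tar runs with source_dir as its working directory.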

Rupert answered 25/8, 2021 at 14:5 Comment(6)
The "perfect answer" is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only).Mima
Thanks @MaxTruxa for the important information..Rupert
I'm keeping getting downvotes for this - but I can't delete this answer since it was very hard to get it working perfectly - and for local usage (not a deployed script) it's 100% safe - I really believe it will help me in the future!Rupert
I would like you to reduce the font size. Since it's not a Perfect answer.Ransome
@Ransome , ok I got it. Better now?Rupert
@Rupert Thank you for your response. It hurts me too when I get negative votes. let's do our best.Ransome

© 2022 - 2024 — McMap. All rights reserved.