How can I create a .tar.gz file with compression in Python?
To build a .tar.gz (aka .tgz) for an entire directory tree:
import tarfile
import os.path
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.
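The effect of arcname is easiest to see by listing the member names of the resulting archive. The following is only an illustrative sketch: the temporary directory, the file a.txt, and the helper list_members are made up for the example, and make_tarfile is repeated with an optional arcname parameter (not part of the answer above) so the block runs on its own.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir, arcname=None):
    """Archive source_dir; arcname defaults to the folder's own name."""
    if arcname is None:
        # normpath drops a trailing separator, which would otherwise
        # make os.path.basename return an empty string.
        arcname = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=arcname)


def list_members(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample tree: <tmp>/project/a.txt
    source_dir = os.path.join(tmp, "project")
    os.makedirs(source_dir)
    with open(os.path.join(source_dir, "a.txt"), "w") as f:
        f.write("hello\n")

    nested = os.path.join(tmp, "nested.tar.gz")
    flat = os.path.join(tmp, "flat.tar.gz")
    make_tarfile(nested, source_dir)             # contents under "project/"
    make_tarfile(flat, source_dir, arcname=".")  # contents relative to "."

    nested_names = list_members(nested)
    flat_names = list_members(flat)
    print(nested_names)
    print(flat_names)
```

Printing the names before shipping an archive is a cheap way to check which layout you actually produced.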
If you leave out arcname=os.path.basename(source_dir), then it'll give you the entire path structure of source_dir in the tar file (in most situations, that's probably inconvenient). – Midrib
Using arcname=os.path.basename(source_dir) still means that the archive contains a folder which contains the contents of source_dir. If you want the root of the archive to contain the contents themselves, and not contents within a folder, use arcname=os.path.sep instead. – Nipissing
If you use os.path.sep, then the archive will contain a service "." or "/" folder, which is usually not a problem, but it can be an issue if you later process the archive programmatically. It seems the only really clean way is to use os.walk and add files individually. – Truly
Use arcname='.'. No need to use os.walk. – Reboant
Note the format you select (or don't select, as the case may be); the default before Python 3.8 was GNU_FORMAT, which may not be readable by all tools, though the default as of Python 3.8 is PAX_FORMAT, which is a standard format and also a conservative/compatible extension of the much older standard USTAR_FORMAT, which is widely supported. – Overthrow
import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()
If you want to create a tar.bz2 compressed file, just replace the file extension with ".tar.bz2" and the mode "w:gz" with "w:bz2".
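As a rough sketch, here is the same loop for bzip2, written with a with block so the archive is closed even if add raises. The three input files are placeholders created in a temporary directory purely so the example runs; only the mode string and the extension differ from the gzip version.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder input files, created only for this example.
    names = ["file1", "file2", "file3"]
    for name in names:
        with open(os.path.join(tmp, name), "w") as f:
            f.write(name + "\n")

    archive_path = os.path.join(tmp, "sample.tar.bz2")
    # "w:bz2" selects bzip2; "w:gz" would select gzip.
    with tarfile.open(archive_path, "w:bz2") as tar:
        for name in names:
            # arcname keeps the temporary directory out of the archive.
            tar.add(os.path.join(tmp, name), arcname=name)

    # "r:*" lets tarfile detect the compression when reading.
    with tarfile.open(archive_path, "r:*") as tar:
        stored = tar.getnames()
    print(stored)
```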
You should use with tarfile.open(...) in Python, instead of calling open and close manually. This is also the case when opening regular files. – Nipissing
You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."
You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.
BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.
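How much the mode matters depends heavily on the data, so it may be worth measuring on your own files. The sketch below assumes a made-up, highly repetitive text file and a hypothetical helper archive_size; it only prints the resulting sizes and makes no claim about which mode wins for real data.

```python
import os
import tarfile
import tempfile


def archive_size(source_path, mode, suffix, workdir):
    """Write source_path to a tar archive using mode; return its size in bytes."""
    out = os.path.join(workdir, "sample" + suffix)
    with tarfile.open(out, mode) as tar:
        tar.add(source_path, arcname=os.path.basename(source_path))
    return os.path.getsize(out)


with tempfile.TemporaryDirectory() as tmp:
    # Repetitive sample text; real data will compress differently.
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        for i in range(20000):
            f.write(f"line {i}: the quick brown fox jumps over the lazy dog\n")

    sizes = {
        "w": archive_size(sample, "w", ".tar", tmp),           # no compression
        "w:gz": archive_size(sample, "w:gz", ".tar.gz", tmp),   # gzip
        "w:bz2": archive_size(sample, "w:bz2", ".tar.bz2", tmp),  # bzip2
    }
    for mode, size in sizes.items():
        print(f"{mode:6} {size} bytes")
```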
Previous answers advise using the tarfile Python module for creating a .tar.gz file in Python. That's obviously a good and Pythonic solution, but it has a serious drawback: archiving speed. This question mentions that tarfile is approximately two times slower than the tar utility in Linux. In my experience, this estimate is about right.
So for faster archiving you can call the tar command through the subprocess module:
import subprocess
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
shutil.make_archive is very convenient for both files and directories (contents recursively added to the archive):
import shutil
compressed_file = shutil.make_archive(
    base_name='archive',    # archive file name w/o extension
    format='gztar',         # available formats: zip, gztar, bztar, xztar, tar
    root_dir='path/to/dir'  # directory to compress
)
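With root_dir alone, the contents of the directory land at the root of the archive. If you want to keep the enclosing folder, one option is to combine root_dir with base_dir. The sketch below assumes a throwaway directory named project created in a temporary location; those names are illustrative only.

```python
import os
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample tree: <tmp>/project/a.txt
    project = os.path.join(tmp, "project")
    os.makedirs(project)
    with open(os.path.join(project, "a.txt"), "w") as f:
        f.write("hello\n")

    # root_dir is the directory make_archive works from;
    # base_dir (relative to root_dir) is what actually gets archived.
    archive_path = shutil.make_archive(
        base_name=os.path.join(tmp, "archive"),  # ".tar.gz" is appended
        format="gztar",
        root_dir=tmp,
        base_dir="project",
    )

    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    print(os.path.basename(archive_path))
    print(names)
```

make_archive returns the path of the file it wrote, which is handy because the extension is chosen from the format rather than from base_name.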
In addition to @Aleksandr Tukallo's answer, you can also capture the output and the error message (if one occurs). Compressing a folder using tar is explained pretty well in the following answer.
import traceback
import subprocess
try:
    # -c create, -z gzip, -f archive name ('j' selects bzip2 and conflicts with 'z')
    cmd = ['tar', '-czf', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    print(output)
except Exception:
    print(f"E: {traceback.format_exc()}")
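If you want tar's own error text rather than a Python traceback, a possible variant is subprocess.run with capture_output=True. This is a sketch under a few assumptions: a tar binary is on PATH, the helper name tar_with_cli and the sample directory data are invented for the example, and -C is used so that only the folder name is stored in the archive.

```python
import os
import shutil
import subprocess
import tarfile
import tempfile


def tar_with_cli(output_filename, source_dir):
    """Archive source_dir with the system tar; return the CompletedProcess."""
    source_dir = os.path.abspath(source_dir)
    parent, name = os.path.split(source_dir)
    # -C changes into the parent first, so only "name/..." is stored.
    cmd = ["tar", "-czf", output_filename, "-C", parent, name]
    # capture_output collects stdout and stderr; text=True decodes them to str.
    return subprocess.run(cmd, capture_output=True, text=True)


result = None
names = []
if shutil.which("tar"):  # skip quietly when no tar binary is installed
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample tree: <tmp>/data/a.txt
        src = os.path.join(tmp, "data")
        os.makedirs(src)
        with open(os.path.join(src, "a.txt"), "w") as f:
            f.write("hello\n")

        archive = os.path.join(tmp, "data.tar.gz")
        result = tar_with_cli(archive, src)
        if result.returncode != 0:
            print(f"tar failed ({result.returncode}): {result.stderr.strip()}")
        else:
            with tarfile.open(archive, "r:gz") as tar:
                names = tar.getnames()
            print(names)
```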
To compress files from the current directory into a tar.gz file, use os.path.basename(file_directory):
import os
import tarfile
with tarfile.open("save.tar.gz","w:gz") as tar:
    for file in ["a.txt","b.log","c.png"]:
        tar.add(os.path.basename(file))
This compresses the listed files in the directory into a tar.gz file.
Minor correction to @THAVASI.T's answer which omits showing the import of the 'tarfile' library, and does not define the 'tar' object which is used in the third line.
import os
import tarfile
with tarfile.open("save.tar.gz","w:gz") as tar:
    for file in ["a.txt","b.log","c.png"]:
        tar.add(os.path.basename(file))
I am using this to generate a tar.gz file that does not contain the main folder.
import tarfile
import os.path
source_location = r'C:\Users\username\Desktop\New folder'
output_name = r'C:\Users\username\Desktop\new.tar.gz'
# ---------------------------------------------------
# --- output new.tar.gz with 'New folder' inside ---
# -> new.tar.gz/New folder/aaaa/a.txt
# -> new.tar.gz/New folder/bbbb/b.txt
# ---------------------------------------------------
# def make_tarfile(output_filename, source_dir):
#     with tarfile.open(output_filename, "w:gz") as tar:
#         tar.add(source_dir, arcname=os.path.basename(source_dir))
# ---------------------------------------------------
# --- output new.tar.gz without 'New folder' inside ---
# -> new.tar.gz/aaaa/a.txt
# -> new.tar.gz/bbbb/b.txt
# ---------------------------------------------------
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for root, _, files in os.walk(source_dir):
            for file in files:
                file_path = os.path.join(root, file)
                arcname = os.path.relpath(file_path, source_dir)
                tar.add(file_path, arcname=arcname)
try:
    make_tarfile(output_name, source_location)
except Exception as e:
    print(f"Error: {e}")
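One thing to be aware of: because the os.walk loop only adds files, empty directories are left out of the archive. A possible alternative, sketched here on the assumption that you want empty folders kept, is to add each top-level entry of source_dir and let tar.add recurse. The function name make_tarfile_flat and the sample tree are invented for the example.

```python
import os
import tarfile
import tempfile


def make_tarfile_flat(output_filename, source_dir):
    """Archive the contents of source_dir without the enclosing folder."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for entry in sorted(os.scandir(source_dir), key=lambda e: e.name):
            # tar.add recurses into directories, including empty ones.
            tar.add(entry.path, arcname=entry.name)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample tree with one file and one empty folder.
    source = os.path.join(tmp, "New folder")
    os.makedirs(os.path.join(source, "aaaa"))
    os.makedirs(os.path.join(source, "empty"))
    with open(os.path.join(source, "aaaa", "a.txt"), "w") as f:
        f.write("hello\n")

    # The archive is written outside source so it does not add itself.
    out = os.path.join(tmp, "new.tar.gz")
    make_tarfile_flat(out, source)

    with tarfile.open(out, "r:gz") as tar:
        names = tar.getnames()
    print(names)
```

This also avoids the "." entry that arcname='.' would put at the root of the archive.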
Just restating @George V. Reilly's excellent answer, but in a clearer form...
import tarfile
fd_path="/some/folder/path/"
fl_name="some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"
with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path + fl_name, fl_name)
As @Brōtsyorfuzthrāx pointed out (but in another way), if you leave out the second argument of the add method, then it'll give you the entire path structure of fd_path + fl_name in the tar file.
Of course you can use...
import tarfile
import os
fd_path_n_fl_name="/some/folder/path/some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"
with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path_n_fl_name, os.path.basename(fd_path_n_fl_name))
... if you don't want to use or don't have the folder path and file name separated.
Thanks!🤗
Best performance, and without the . and .. entries in the compressed file! See the vulnerability warning below:
NOTICE (thanks MaxTruxa): this answer is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only).
I'm using it locally, so it's good for my needs.
subprocess.call(f'tar -cvzf {output_filename} *', cwd=source_dir, shell=True)
The cwd argument changes directory before compressing, which solves the issue with the dots.
The shell=True argument allows wildcard usage (*).
This also works recursively for a directory.
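To make the injection concern concrete, here is a small sketch of two ways to build the command more defensively. Nothing is executed; the helpers shell_command and argv_command and the hostile file name are invented for illustration. The first keeps shell=True but quotes the file name with shlex.quote; the second drops the shell entirely and lists the directory entries itself, since "*" is only expanded by a shell.

```python
import os
import shlex


def shell_command(output_filename):
    """Build the shell=True command string with the file name quoted."""
    # shlex.quote makes the name a single shell word (POSIX shells only).
    return f"tar -czf {shlex.quote(output_filename)} *"


def argv_command(output_filename, source_dir):
    """Build an argument list for use without a shell."""
    # No shell means no "*" expansion, so list the entries ourselves.
    # Unlike "*", os.listdir also returns hidden (dot) files.
    entries = sorted(os.listdir(source_dir))
    # "--" ends option parsing, so names starting with "-" stay file names.
    return ["tar", "-czf", output_filename, "--", *entries]


hostile_name = "out.tar.gz; echo injected"
command = shell_command(hostile_name)
print(command)
print(argv_command("out.tar.gz", "."))

# Either form would then be run from inside the source directory, e.g.
#   subprocess.run(command, cwd=source_dir, shell=True, check=True)
#   subprocess.run(argv_command(name, source_dir), cwd=source_dir, check=True)
```

The argument-list form is usually the simpler choice, because there is no shell to interpret anything in the file names.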
Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only). –
Mima