How to download all files and folder hierarchy from Jupyter Notebook?

I want to download all of the files and the folder hierarchy from a Jupyter Notebook server, as shown in the picture. Is there any way to do that with a simple click, rather than opening every single file in every folder and clicking download hundreds of times?

Note: This Jupyter Notebook was set up by the teacher of an online course, so it is opened from the course webpage rather than from my local Anaconda app. I want the download so I can refresh my memory in the future whenever needed.

Mescal answered 5/1, 2018 at 23:32 Comment(0)
import os
import tarfile

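def recursive_files(dir_name='.', ignore=None):
    """Yield the path of every file under dir_name, skipping ignored names."""
    ignore = ignore or set()
    for root, subdirs, files in os.walk(dir_name):
        # Prune ignored directories in place so os.walk never descends into them.
        subdirs[:] = [d for d in subdirs if d not in ignore]

        for file_name in files:
            if file_name in ignore:
                continue

            yield os.path.join(root, file_name)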

def make_tar_file(dir_name='.', tar_file_name='tarfile.tar', ignore=None):
    tar = tarfile.open(tar_file_name, 'w')

    for file_name in recursive_files(dir_name, ignore):
        tar.add(file_name)

    tar.close()


dir_name = '.'
tar_file_name = 'archive.tar'
ignore = {'.ipynb_checkpoints', '__pycache__', tar_file_name}
make_tar_file(dir_name, tar_file_name, ignore)

To use that, create a new .ipynb notebook in the root folder (the one that you want to download). Then copy and paste the code above into the first cell and run it.

When it is done, you will see a tar file (archive.tar) created in the same folder. It contains all the files and subfolders, and you can download it from the Jupyter file listing like any other single file.
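Before downloading, it can help to check what actually ended up in the archive. This is only a minimal sketch, assuming the archive was written as archive.tar in the notebook's working directory; list_archive is a hypothetical helper name, not part of the code above.

```python
import os
import tarfile

def list_archive(tar_file_name='archive.tar'):
    """Return the names of all members stored in the tar file."""
    with tarfile.open(tar_file_name, 'r') as tar:
        return tar.getnames()

# Print the contents only if the archive exists in the current directory.
if os.path.exists('archive.tar'):
    for name in list_archive('archive.tar'):
        print(name)
```

If a folder you expected is missing from the listing, check whether its name is in the ignore set.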

Mescal answered 7/1, 2018 at 21:26 Comment(1)
+1 If you want to make sure that symlinks are resolved, use yield os.path.realpath(os.path.join(dir_name, file_name)) instead of simply yield os.path.join(dir_name, file_name). - Xeric

The answer posted above mostly works, but it copies links instead of the files the links point to. If you add dereference=True as an argument to tarfile.open, you will get the files themselves.

    tar = tarfile.open(tar_file_name, 'w', dereference=True)
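
To see the difference, here is a small self-contained sketch that archives a throwaway directory containing one file and one symlink to it, then reports how each member was stored. The function name link_member_types and the demo file names are made up for illustration, and os.symlink may need extra privileges on Windows.

```python
import os
import tarfile
import tempfile

def link_member_types(dereference):
    """Archive one file plus a symlink to it; return {member name: 'symlink' or 'file'}."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'src')
        os.mkdir(src)
        target = os.path.join(src, 'data.txt')
        with open(target, 'w') as f:
            f.write('hello')
        os.symlink(target, os.path.join(src, 'link.txt'))

        archive = os.path.join(tmp, 'demo.tar')
        # dereference controls whether symlinks are followed when adding members.
        with tarfile.open(archive, 'w', dereference=dereference) as tar:
            tar.add(target, arcname='data.txt')
            tar.add(os.path.join(src, 'link.txt'), arcname='link.txt')

        # Read the archive back and classify each member.
        with tarfile.open(archive) as tar:
            return {m.name: ('symlink' if m.issym() else 'file')
                    for m in tar.getmembers()}

print(link_member_types(dereference=False))
print(link_member_types(dereference=True))
```

Without dereference, link.txt is stored as a symlink entry; with dereference=True, it is stored as a regular file holding the target's contents.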
Mosque answered 25/8, 2018 at 15:49 Comment(0)

The above code also works well, but it can cause problems if a file path is longer than 100 characters. (Python 3.8 and later default to the POSIX.1-2001 pax format, tarfile.PAX_FORMAT, which records long pathnames and large file sizes in extended headers that some extraction tools may not handle. https://docs.python.org/3/library/tarfile.html#tarfile.DEFAULT_FORMAT)

tar = tarfile.open(tar_file_name, 'w', dereference=True, format=tarfile.GNU_FORMAT)

(This includes the dereference=True change from Rhombus's answer.)

By changing to the above code, you can use the GNU tar format, which is slightly less compatible but supports both large files and long pathnames.
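
As a quick check, the following sketch writes a file whose relative path is well over 100 characters into a GNU-format archive and reads the name back. The helper name roundtrip_long_path and the generated directory names are only illustrative assumptions.

```python
import os
import tarfile
import tempfile

def roundtrip_long_path(fmt=tarfile.GNU_FORMAT):
    """Archive a file with a long relative path; return (stored names, original path)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Three 50-character components give a path longer than 100 characters.
        rel = os.path.join('a' * 50, 'b' * 50, 'c' * 50 + '.txt')
        full = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(full))
        with open(full, 'w') as f:
            f.write('x')

        archive = os.path.join(tmp, 'long.tar')
        with tarfile.open(archive, 'w', format=fmt) as tar:
            tar.add(full, arcname=rel)

        # tar member names always use forward slashes.
        with tarfile.open(archive) as tar:
            return tar.getnames(), rel.replace(os.sep, '/')

names, expected = roundtrip_long_path()
print(len(expected), names == [expected])
```

If the stored name matches the original, the long path survived the round trip through Python's tarfile module. Whether another extraction tool reads it correctly depends on that tool.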

Chancellorship answered 7/7, 2023 at 8:59 Comment(0)
