Download all files in a path on Jupyter notebook server
G

11

101

As a user in a class that runs Jupyter notebooks for assignments, I have access to the assignments via the web interface. I assume the assignments are stored somewhere in my personal space on the server, so I should be able to download them. How can I download all the files in my personal user space (e.g., with wget)?

Here's the path structure:

https://urltoserver/user/username

There are several directories: assignments, data, etc.

https://urltoserver/user/username/assignments

https://urltoserver/user/username/data

...

I want to download all the folders (recursively). Just enough that I can launch whatever I see online locally. If there are some forbidden folders, then ok, skip those and download the rest.

Please specify the command exactly as I couldn't figure it out myself (I tried wget)

Garble answered 27/3, 2017 at 9:35 Comment(2)
I think it would be extremely useful if we can select multiple files and click "download" to get them all. However I think this is not supported by Jupyter notebook yet.Busyness
@Busyness If you install the extension 'jupyter-archive', you can do that directly for an individual directory now in JupyterLab. See jupyter-archive. There's an animation that shows the option that gets added to the drop-down menu for JupyterLab.Tyrannize
C
242

Try running this as a separate cell in one of your notebooks:

!tar chvfz notebook.tar.gz *

If you want to cover more folders up the tree, write ../ before the * for every step up the directory. The file notebook.tar.gz will be saved in the same folder as your notebook.
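
If the `!` shell escape or the `tar` program is not available on the server, roughly the same archive can be built from a plain Python cell. The following is a minimal sketch using the standard-library `tarfile` module; the helper name `archive_folder` and its defaults are illustrative assumptions, not part of the answer above. `dereference=True` plays the role of tar's `h` flag.

```python
import os
import tarfile


def archive_folder(src_dir=".", out_name="notebook.tar.gz"):
    """Pack everything under src_dir into a gzip-compressed tarball.

    Hypothetical helper: the name and defaults are illustrative only.
    Unlike the shell glob `*`, os.listdir also returns hidden dotfiles.
    """
    out_path = os.path.abspath(out_name)
    # dereference=True stores the target of each symlink (like tar's h flag).
    with tarfile.open(out_path, "w:gz", dereference=True) as tar:
        for entry in sorted(os.listdir(src_dir)):
            full = os.path.join(src_dir, entry)
            # Skip the archive itself if it sits at the top level of src_dir.
            if os.path.abspath(full) == out_path:
                continue
            # tar.add recurses into directories by default.
            tar.add(full, arcname=entry)
    return out_path
```

Calling `archive_folder()` in a notebook cell should leave `notebook.tar.gz` next to the notebook, ready to be downloaded from the file browser.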

Casteel answered 17/11, 2017 at 17:0 Comment(10)
This works well. Thanks. But... my resulting tarball is sitting on my ec2 instance on AWS. What is the easiest way to get it from there to my local machine?Barrens
From within the Jupyter notebook go to File -> Open. This will open up a new browser tab. From there click the checkbox next to your fresh tar.gz. and a 'download' button will appear at the top. Click it, specify local path and save.Hortensiahorter
FYI, !tar chvfz notebook.tar.gz * will pull in files that are symbolic links as well, so you won't have broken images.Fenugreek
The problem is that it cannot fetch files in alias.Squab
If it's not following symbolic links, use the h option of the tar command.Underwrite
Probably a stupid question, but how can I open this on my local machine again in my local Jupyter notebook?Explode
@Explode I'm guessing you saved your notebook.tar.gz somewhere on your machine and extracted the ipynb in /some/directory. Then just follow the instructions here: jupyter-notebook-beginner-guide.readthedocs.io/en/latest/…Casteel
How do I handle it if the file exceeds 250 MB? Is there an automatic splitter for it?Explode
@Explode in that case I suggest you formulate your question in full and post it on SOCasteel
To decompress once downloaded to local: tar -zxvf notebook.tar.gzDecompensation
S
51

I am taking Prof. Andrew Ng's Deeplearning.ai program via Coursera. The curriculum uses Jupyter Notebooks online. Along with the notebooks are folders with large files. Here's what I used to successfully download all assignments with the associated files and folders to my local Windows 10 PC.

Start with the following line of code as suggested in the post by Serzan Akhmetov above:

!tar cvfz allfiles.tar.gz *

This produces a tarball which, if small enough, can be downloaded from the Jupyter notebook itself and unzipped using 7-Zip. However, this course has individual files of size 100's of MB and folders with 100's of sample images. The resulting tarball is too large to download via browser.

So add one more line of code to split files into manageable chunk sizes as follows:

!split -b 50m allfiles.tar.gz allfiles.tar.gz.part.

This will split the archive into multiple parts, each 50 MB in size (or your preferred size setting). Each part will have an extension like allfiles.tar.gz.part.xx. Download each part as before.
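
Where the `split` command is not installed, the same chunking can be sketched in Python. This is only an illustration under assumptions: the function name `split_file` is made up here, and the parts get numeric suffixes (`.part.000`, `.part.001`, ...) rather than the `aa`, `ab` suffixes that `split` produces, so they are meant to be rejoined by concatenating them in name order.

```python
from pathlib import Path


def split_file(path, chunk_size=50 * 1024 * 1024):
    """Write <name>.part.000, <name>.part.001, ... of at most chunk_size bytes.

    Hypothetical helper; note that one chunk is held in memory at a time.
    """
    src = Path(path)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of the archive reached
            part = src.with_name(f"{src.name}.part.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```

For example, `split_file("allfiles.tar.gz")` would write the parts next to the archive and return their paths.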

The final task is to untar the multi-part archive. This is very simple with 7-Zip. Just select the first file in the series for extraction with 7-Zip. This is the file named allfiles.tar.gz.part.aa for the example used. It will pull all the necessary parts together as long as they are in the same folder.
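
If 7-Zip is not installed locally, the parts can also be concatenated with a few lines of Python before extracting the result as an ordinary tar.gz. This is a hedged sketch: `join_parts` is a hypothetical helper, and the default prefix simply mirrors the `allfiles.tar.gz.part.` naming used above.

```python
import shutil
from pathlib import Path


def join_parts(folder, prefix="allfiles.tar.gz.part.", out_name="allfiles.tar.gz"):
    """Concatenate all files in folder whose names start with prefix.

    Hypothetical helper; parts are joined in sorted name order.
    """
    folder = Path(folder)
    # Sorting by name restores the order split produced (aa, ab, ac, ...).
    parts = sorted(p for p in folder.iterdir() if p.name.startswith(prefix))
    if not parts:
        raise FileNotFoundError(f"no files starting with {prefix!r} in {folder}")
    out_path = folder / out_name
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream each part so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return out_path
```

After `join_parts("Downloads")`, the rebuilt `allfiles.tar.gz` can be opened with any tar-aware tool.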

Hope this helps add to Serzan's excellent answer above.

Swear answered 27/5, 2019 at 1:58 Comment(3)
For those who don't want to use 7-Zip, you can stay in Unix/Linux and use !cat allfiles.tar.gz.part.* > your_file_name.tar.gz. This concatenates all the parts in the directory, in name order, into one file. (A broader pattern such as allfiles* would also pick up the original allfiles.tar.gz if it is still in the folder.)Ptisan
But this seems to only create the tar file on the remote server. How do I download it to the local PC?Maice
right-clicking on the tar file will give a set of options including download @ChristyLeeCopycat
L
21

You can create a new terminal from the "New" menu and call the command described on https://mcmap.net/q/210530/-download-all-files-in-a-path-on-jupyter-notebook-server:

tar cvfz notebook.tar.gz *

The file notebook.tar.gz will be saved in the same folder as your notebook.

Lazo answered 22/2, 2018 at 1:43 Comment(1)
Changing cvfz to chvfz will also include the files that symbolic links point to.Pozzuoli
G
3

You just need to run:

zip -r filename.zip folder_name
Gallous answered 16/5, 2022 at 12:54 Comment(2)
Why do you prefer this over the tar based commands suggested previously, which have been validated repeatedly by the community? Can you edit your answer to provide an explanation of when this might be more appropriate?Benzene
I always use this method, so guess therefore, it is a proven method!Giglio
A
2

The easiest way is to archive all content using tar, but there is also an API for downloading files.

GET /files/_FILE_PATH_

To list the contents of a folder (here, one named work), you can use:

GET /api/contents/work

Example:

curl "https://server/api/contents?token=your_token"
curl "https://server/files/path/to/file.txt?token=your_token" --output some.file

Source: Jupyter Docs
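
The two endpoints above can be combined into a recursive download. The following is a rough sketch, not a tested client: the helper names are invented for illustration, it assumes the server accepts the token in an `Authorization: token ...` header, and it assumes directory listings follow the contents-API model (`type`, `path`, and a `content` list for directories). On a JupyterHub deployment the base URL would be something like the `https://urltoserver/user/username` from the question.

```python
import json
import os
import shutil
import urllib.parse
import urllib.request


def _open(url, token):
    """Open url with the token sent as an Authorization header."""
    req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    return urllib.request.urlopen(req)


def fetch_json(base_url, api_path, token):
    """GET <base_url>/api/contents/<api_path> and decode the JSON reply."""
    url = f"{base_url}/api/contents/{urllib.parse.quote(api_path)}"
    with _open(url, token) as resp:
        return json.load(resp)


def list_files(api_path, get_model):
    """Recursively collect file paths below api_path.

    get_model(path) must return the contents model (a dict) for that path.
    """
    model = get_model(api_path)
    if model["type"] != "directory":
        return [model["path"]]
    found = []
    for child in model["content"]:
        if child["type"] == "directory":
            found.extend(list_files(child["path"], get_model))
        else:
            found.append(child["path"])
    return found


def download_all(base_url, token, dest="download", start=""):
    """Mirror every file below `start` into the local folder `dest`."""
    for path in list_files(start, lambda p: fetch_json(base_url, p, token)):
        url = f"{base_url}/files/{urllib.parse.quote(path)}"
        target = os.path.join(dest, *path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with _open(url, token) as resp, open(target, "wb") as out:
            shutil.copyfileobj(resp, out)
```

A call would look like `download_all("https://server", "your_token")`. A folder the server refuses to list raises an HTTP error here, so skipping forbidden folders would need an extra `try`/`except` around the request.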

Aetolia answered 10/3, 2019 at 0:46 Comment(0)
S
1

Try first to get the directory by:

import os
os.getcwd()

And then use the snippet from How to create a zip archive of a directory. You can download the complete directory by zipping it. Good luck!
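
As a rough illustration of that idea, the directory can be zipped with the standard-library `zipfile` module. This is a minimal sketch, not the snippet from the linked question: `zip_directory` is a hypothetical name, and the archive is skipped explicitly so it can be written inside the folder being zipped.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Recursively zip src_dir into zip_path, skipping the archive itself.

    Hypothetical helper; empty directories are not stored.
    """
    src_dir = os.path.abspath(src_dir)
    zip_path = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if full == zip_path:
                    continue  # do not add the archive to itself
                # Store paths relative to src_dir so the hierarchy is kept.
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path
```

In a notebook, `zip_directory(os.getcwd(), "notebook_files.zip")` would place the zip file next to the notebook, where it can be downloaded from the file browser.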

Sordello answered 16/2, 2018 at 12:58 Comment(0)
H
1
from google.colab import files

files.download("/content/data.txt")

These lines work if you are working in Google Colab. The google.colab module is specific to Colab, so it is generally not available on an ordinary Jupyter notebook server.

The first line imports the files module. The second one downloads your file (data.txt in this example), located inside the /content folder.

Hailey answered 15/10, 2020 at 5:17 Comment(2)
Although this code might solve the problem, a good answer should also explain what the code does and how it helps.Tremulous
This made me laugh.Unready
B
0

I don't think this is possible with wget, even with the wget -r option. You may have to download them individually, using the Download option in the dashboard view (which is only available on single, non-directory, non-running notebook items), if that is available to you.

However, you may not be able to download them at all: if your teacher is using grading software like nbgrader, giving students access to the notebook files themselves is undesirable, since the notebooks can contain information about the answers.

Bryannabryansk answered 27/3, 2017 at 10:9 Comment(0)
S
0

I've made a slight update based on @Sun Bee's solution. It lets you create multiple backup files, each with a timestamp suffix.

!tar cvfz allfiles-`date +"%Y%m%d-%H%M"`.tar.gz *
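
The same file name can be built in Python, for instance when the archive is created from a Python cell rather than through the shell. This is a small sketch; `timestamped_name` is a hypothetical helper that mirrors the `date +"%Y%m%d-%H%M"` format above.

```python
from datetime import datetime


def timestamped_name(prefix="allfiles", now=None):
    """Return e.g. allfiles-YYYYMMDD-HHMM.tar.gz for the given (or current) time."""
    now = now or datetime.now()
    # The format spec after the colon is passed to strftime.
    return f"{prefix}-{now:%Y%m%d-%H%M}.tar.gz"
```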
Smegma answered 1/8, 2021 at 3:39 Comment(0)
M
0

The above solution didn't work for me (Windows). When I extracted the tar.gz file, it produced a single file and not the folder hierarchy you see in jupyter. The following links should help:

  1. Coursera support - link
  2. Youtube vid - link
Malevolent answered 14/2, 2023 at 18:23 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Wheelbarrow
B
0

Since this question was asked, Coursera has added a Lab Files tab to some of the notebooks, which makes it trivial to download everything by clicking Download all files. Check this first to save yourself some hassle.

screenshot of the Lab Files tab

Instructions from Coursera:

instructions to download the workspace

Bigotry answered 9/8, 2023 at 20:43 Comment(0)
