How to download an entire directory and subdirectories using wget?
I am trying to download the files for a project using wget, as the SVN server for that project isn't running anymore and I can only access the files through a browser. The base URL for all the files is the same, for example:

http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/*

How can I use wget (or any other similar tool) to download all the files in this repository, where the "tzivi" folder is the root folder and there are several files and sub-folders (up to 2 or 3 levels) under it?

Notary answered 24/6, 2013 at 18:56 Comment(4)
You can't do that if the server has no web page listing links to all the files you need. – Kelda
Do you know the names of the files? – Sleuthhound
No, I don't know the names of all the files. I tried wget with the recursive option, but it didn't work either. Is that because the server doesn't have an index.html file listing all the inner links? – Notary
Did you try the mirroring option of wget? – Cloutier

You may use this in a shell:

wget -r --no-parent http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The parameters are:

-r     recursive download

--no-parent     don't download anything from the parent directory

If you don't want to download the entire content, you may use:

-l1 download only the given directory (tzivi in your case)

-l2 download the directory and all first-level subfolders ('tzivi/something' but not 'tzivi/something/foo')

And so on. If you give no -l option, wget uses -l5 by default.

If you pass -l0, you may end up downloading the whole Internet, because wget will follow every link it finds.
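As a sanity check, here is a minimal self-contained sketch of the same -r --no-parent recipe, run against a throwaway local server so nothing touches the network. It assumes wget and python3 are installed; the directory names, file contents, and port number are made up for illustration.

```shell
# Build a tiny tree to act as the "server side".
mkdir -p site/tzivi/sub
echo hello > site/tzivi/sub/a.txt

# Serve it with Python's built-in web server (it generates directory listings,
# which is exactly what wget -r needs to follow).
( cd site && exec python3 -m http.server 8731 ) >/dev/null 2>&1 &
SRV=$!
sleep 1

# Recursively mirror tzivi/ only: -np (--no-parent) stops wget climbing
# above tzivi/, -nH drops the localhost:8731/ host directory,
# -P chooses the output directory.
wget -q -r -np -nH -P out http://localhost:8731/tzivi/

kill "$SRV"
cat out/tzivi/sub/a.txt    # hello
```

The downloaded tree lands under out/tzivi/, mirroring the server layout; the saved index.html listing files can be suppressed with --reject "index.html*" as shown in other answers.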

Palmitin answered 30/10, 2013 at 22:37 Comment(10)
Great, so to simplify for the next reader: wget -r -l1 --no-parent http://www.stanford.edu/~boyd/cvxbook/cvxbook_additional_exercises/ was the answer for me. Thanks for your answer. – Hysterics
I tried the above command to get all the files from http://websitename.com/wp-content/uploads/2009/05, but all I got was an index.html file which had nothing in it. I can't figure out what I missed. – Clemenciaclemency
@up: Note that wget follows links, so you need a directory listing ;) – Palmitin
@greensn0w is it possible to use the same command for https links which require a username & password? Does it work if I just add username:[email protected]/etc... at the beginning? – Bicipital
I know this is quite old, but what I also found useful was the -e robots=off switch. ;) – Tucana
@sn0w is -l 1 only supposed to download the target folder without subdirectories? – Fish
The last slash at the end of the URL is really important; without it, wget will download everything recursively. – Impressionism
Why don't you remove the "I forgot something important" and just fix the answer? – Shaynashayne
We can use the -nH option with wget to prevent the hostname directory being created inside the download directory. – Stalemate
In addition to -nH to prevent the hostname directory being created, use --cut-dirs=x to cut the first x directories past the hostname from the download path. – Lukin

You can use this in a shell:

wget -r -nH --cut-dirs=7 --reject="index.html*" \
      http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The parameters are:

-r recursively download

-nH (--no-host-directories) cuts out hostname 

--cut-dirs=X (cuts out X directories)
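To see why --cut-dirs=7 is the right count here, a quick sketch (plain shell string handling, no network involved): the URL path under the hostname has seven directory components, and --cut-dirs=7 strips all of them, so files land directly in the output directory. The file name file.c is hypothetical.

```shell
# Path portion of the question's URL after the hostname, with a made-up file:
url_path="projects/tzivi/repository/revisions/2/raw/tzivi/file.c"

# -nH removes the hostname directory; --cut-dirs=7 then drops the first
# seven directory components. Emulate that counting with cut:
saved_as=$(printf '%s\n' "$url_path" | cut -d/ -f8-)
echo "$saved_as"    # file.c
```

With --cut-dirs=6 instead, the same file would be saved as tzivi/file.c, keeping the root folder.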
Alanalana answered 14/1, 2016 at 11:1 Comment(0)

This gave me the best answer:

$ wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off -U mozilla http://base.site/dir/

Worked like a charm.

Dissipate answered 4/2, 2020 at 22:56 Comment(1)
Where do I use this code? – Troubadour

Use the command:

wget -m www.ilanni.com/nexus/content/
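For reference, -m/--mirror is shorthand for a bundle of other options, and wget's own help text spells the bundle out. This sketch only queries your locally installed wget; no download happens, and the exact wording may vary slightly between wget versions.

```shell
# Print what -m/--mirror expands to, straight from wget's help text:
# it is a shortcut for -N -r -l inf --no-remove-listing.
wget --help | grep -- --mirror
```

Knowing the expansion is useful when you want most of --mirror but need to override one piece, e.g. capping the depth with an explicit -l.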
Atalanti answered 19/7, 2016 at 8:46 Comment(0)
wget -r --no-parent URL --user=username --password=password

The last two options are needed only if the download requires a username and password; otherwise there is no need to use them.
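If you'd rather not put the password on the command line at all (where it ends up in your shell history and in the process list), wget also reads credentials from ~/.netrc. A hedged sketch with made-up host and credentials, written to a sample file so nothing real is touched:

```shell
# Write a sample netrc file with hypothetical values; the real one
# belongs at ~/.netrc and should be private (chmod 600).
cat > netrc.example <<'EOF'
machine example.com
login myuser
password mypassword
EOF
chmod 600 netrc.example
cat netrc.example
```

With the real file in place at ~/.netrc, wget picks it up automatically, so `wget -r --no-parent URL` works without --user or --password.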

You can also see more options at https://www.howtogeek.com/281663/how-to-use-wget-the-ultimate-command-line-downloading-tool/

Bolster answered 5/9, 2018 at 11:28 Comment(0)

You can also use this command:

wget --mirror -pc --convert-links -P ./your-local-dir/ http://www.your-website.com

This gives you an exact mirror of the website you want to download.

Swink answered 7/8, 2016 at 14:17 Comment(0)

Try this working command (as of 30-08-2021):

wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off --adjust-extension -U mozilla "your web directory, in quotes"
Cetinje answered 30/8, 2021 at 14:59 Comment(0)

This works:

wget -m -np -c --no-check-certificate -R "index.html*" "https://the-eye.eu/public/AudioBooks/Edgar%20Allan%20Poe%20-%2"
Roofdeck answered 4/5, 2018 at 4:59 Comment(0)

This will help:

wget -m -np -c --level 0 --no-check-certificate -R "index.html*" http://www.your-websitepage.com/dir
Xymenes answered 15/9, 2020 at 1:11 Comment(1)
A little description of your suggested answer would be more helpful. Please read stackoverflow.com/help/how-to-answer – Begonia
