How can I make HTTrack only download files on the current domain?

3

13

No matter how hard I try, I can't seem to get HTTrack to leave links going to other domains intact. I've tried using the --stay-on-same-domain argument, and that doesn't seem to do it. I've also tried adding a filter, but that doesn't do it either.

There simply must be some option I'm missing here.

Encumbrance answered 2/5, 2014 at 5:49 Comment(0)
15

Setting the option "Maximum external depth" to 0 did not work, even though one would expect it to.

What works:

Go to Options > Scan Rules and enter the following in the text field, as an extra line: -* +*yourdomain.com/*
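
For the command-line version, the same scan rules can be passed as quoted arguments after the URL. The following is only a minimal sketch, assuming a POSIX shell and that httrack is on the PATH. The URL and output directory are placeholder values, and the host is derived from the URL so that the rule does not have to be typed by hand. It prints the command instead of running it, so the rules can be checked first.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own site and output directory.
url="https://yourdomain.com/"
out_dir="./mirror"

# Derive the host from the URL.
host="${url#*://}"   # strip the scheme, e.g. "https://"
host="${host%%/*}"   # strip everything from the first "/" onwards

# "-*" excludes every URL; "+*<host>/*" then re-allows URLs on the wanted domain.
# The rules are quoted so the shell does not expand the "*" characters.
set -- httrack "$url" -O "$out_dir" "-*" "+*${host}/*"

# Show the command for inspection. To actually start the mirror, run: "$@"
cmd_line="$*"
printf '%s\n' "$cmd_line"
```

The exclude rule comes first and the include rule last, matching the order given above for the GUI text field.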

Here are more settings to learn about: HTTrack: How to download folders only from a certain subfolder level?

Tonneau answered 28/9, 2017 at 9:53 Comment(2)
How frustrating to have to manually specify the domain in the scan rules each time. 🤦‍♂️ It should really detect that. – Twinkling
When I did this it reduced the number of pages downloaded from other domains - but strangely not to zero. Some pages were still downloaded from other domains. – Subaquatic
1

Set the maximum external depth to 0. In the GUI, this can be found here:

enter image description here

If you are using the command line version, the option is

-%e0 (long form: --ext-depth=0)
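
As a rough sketch of how that option fits into a full command (assuming a POSIX shell and httrack on the PATH; the URL and output directory are placeholders, not values from the question), the command is built and printed rather than executed:

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own site and output directory.
url="https://yourdomain.com/"
out_dir="./mirror"
ext_depth=0   # external links depth; 0 means external links are not followed

# The option is written with a leading dash on the command line: -%e0
set -- httrack "$url" -O "$out_dir" "-%e${ext_depth}"

# Show the command for inspection. To actually start the mirror, run: "$@"
cmd_line="$*"
printf '%s\n' "$cmd_line"
```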

[Note: not an expert on HTTRACK, so please correct if necessary]

Hypoderma answered 21/5, 2015 at 13:49 Comment(1)
This doesn't always work. I have my settings the same as your screenshot and yet I also get many many pages from Wikipedia. 😒 – Twinkling
-2

In "Set Option" > "Limits", try

Maximum mirroring depth = 1 (use 2 if 1 doesn't work)

And

Maximum external depth = 0

Worked for me!!

Vassal answered 24/6, 2018 at 15:6 Comment(0)
