Use HTTrack to download just one site, not external sites

I tried using HTTrack to download my phpBB forum, but no matter what settings I use, I cannot stop it from also downloading the entire Wikipedia site, along with many other websites that are linked anywhere in the forum...

What I managed to do is make it download the index page only, but that's not good either.

I thought that setting

+forum.mysite.com/*

in Options -> Scan Rules would do the trick, but it went on to download the entire Wikipedia again :(
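To see why a single "+" rule may not be enough, here is a simplified Python model of include/exclude scan rules. It is a sketch under stated assumptions, not HTTrack's actual matcher: it assumes patterns are matched against the URL with its scheme removed, that the last matching rule wins, and that a URL matching no rule falls back to whatever the crawler's other settings decide (the hypothetical "default" parameter).

```python
from fnmatch import fnmatchcase


def strip_scheme(url: str) -> str:
    """Drop 'http://' or 'https://' so rules like 'forum.mysite.com/*' can match."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def decide(url: str, rules: list[str], default: bool) -> bool:
    """Return True if the URL would be fetched under this simplified model.

    Assumptions (not HTTrack's real implementation):
    - each rule is '+pattern' (include) or '-pattern' (exclude), using shell wildcards;
    - rules are checked in order and the LAST matching rule wins;
    - a URL that matches no rule falls back to `default`.
    """
    target = strip_scheme(url)
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = (sign == "+")
    return verdict


if __name__ == "__main__":
    wiki = "https://en.wikipedia.org/wiki/PhpBB"
    forum = "https://forum.mysite.com/viewtopic.php?t=1"
    # Only an include rule: the Wikipedia URL matches nothing, so the default decides.
    print(decide(wiki, ["+forum.mysite.com/*"], default=True))
    # Exclude everything first, then re-include the forum.
    print(decide(wiki, ["-*", "+forum.mysite.com/*"], default=True))
    print(decide(forum, ["-*", "+forum.mysite.com/*"], default=True))
```

Under this model, "+forum.mysite.com/*" on its own never rejects a Wikipedia link, because that link simply matches no rule; only an exclude rule (here a catch-all "-*" placed before the include) turns the list into a whitelist.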

Tachistoscope asked 13/12/2016 at 18:12
Possible duplicate of How can I make HTTrack only download files on the current domain? - Funicle
For once, I left the options untouched when adding a new domain, and it worked. It seems that once I start fiddling with them, something breaks and it downloads the whole internet. - Manila

I found a questionable solution in the thread "Re: prevent download of external content".

The problem is that now external links point to a page that looks pretty ugly, which is fixable.

However, embedded content, such as YouTube videos, is now also replaced by this ugly page :(

At least it is not downloading the entire internet anymore...

Tachistoscope answered 14/12/2016 at 12:51

I would try:

-a
    *stay on the same address (--stay-on-same-address)
-d
    stay on the same principal domain (--stay-on-same-domain)
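For the command-line version, a small Python helper can assemble an httrack invocation that uses one of these two options. This is a sketch: the URL and the output folder are placeholders, and it only builds and prints the argument list rather than starting a crawl.

```python
import shlex


def build_httrack_cmd(url: str, output_dir: str, same_domain: bool = False) -> list[str]:
    """Build an httrack argument list that keeps the crawl on one site.

    --stay-on-same-address (-a): stay on the exact host, e.g. forum.mysite.com
    --stay-on-same-domain  (-d): also allow other hosts under the same
                                 principal domain, e.g. www.mysite.com
    """
    scope = "--stay-on-same-domain" if same_domain else "--stay-on-same-address"
    # -O sets the folder the mirror is written to.
    return ["httrack", url, "-O", output_dir, scope]


if __name__ == "__main__":
    # Placeholder URL and folder; pass the list to subprocess.run() to actually crawl.
    cmd = build_httrack_cmd("https://forum.mysite.com/", "./forum-mirror")
    print(shlex.join(cmd))
```

Returning a list instead of a single string avoids shell-quoting problems if the list is later handed to subprocess.run.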
Scopophilia answered 31/7/2017 at 16:56
What are the GUI equivalents? - Manila
Check the "Experts only" tab of the dialog that opens from the "Set options..." button. - Scopophilia

Try setting

Maximum mirroring depth = 1 (set it to 2 if 1 doesn't work)

and

Maximum external depth = 0

This worked for me!
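If you use the command line instead of the GUI, the same two limits can be passed as options. The sketch below assumes that "Maximum mirroring depth" corresponds to httrack's --depth=N (short form -rN) and "Maximum external depth" to --ext-depth=N (short form -%eN); the URL and output folder are placeholders.

```python
def depth_limited_cmd(url: str, output_dir: str,
                      mirror_depth: int = 1, external_depth: int = 0) -> list[str]:
    """Build an httrack command with mirror and external depth limits.

    Assumed mapping to the GUI fields:
      Maximum mirroring depth -> --depth=N      (-rN)
      Maximum external depth  -> --ext-depth=N  (-%eN)
    """
    if mirror_depth < 1:
        raise ValueError("mirror_depth must be at least 1")
    if external_depth < 0:
        raise ValueError("external_depth cannot be negative")
    return ["httrack", url, "-O", output_dir,
            f"--depth={mirror_depth}", f"--ext-depth={external_depth}"]


if __name__ == "__main__":
    # Mirror depth 2 (the fallback suggested above), no external links followed.
    print(depth_limited_cmd("https://forum.mysite.com/", "./forum-mirror", mirror_depth=2))
```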

Collyrium answered 24/6/2018 at 15:09

For the GUI version: add exclusions to the filters (scan rules) for every downloaded site you don't need; their names can be copied from the download folder. For example:

+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
+meNeedSite.com/* +forum.mysite.com/*
-meNotNeedSite.com/* -fiu-vro.wikipedia.org/* -fj.wikipedia.org/* -fo.wikipedia.org/* -fonts.googleapis.com/* -fonts.gstatic.com/* -foundation.mozilla.org/* -fr.wikipedia.org/* -frr.wikipedia.org/* -ftp.mozilla.org/* -fur.wikipedia.org/* -fy.wikipedia.org/* -ga.wikipedia.org/* -gd.wikipedia.org/* -gl.wikipedia.org/* -glk.wikipedia.org/* -gmpg.org/* -gn.wikipedia.org/* -ha.wikipedia.org/* -hacks.mozilla.org/* -he.wikipedia.org/* -hi.wikipedia.org/* -hr.wikipedia.org/* -hsb.wikipedia.org/* -hu.wikipedia.org/* -human.spbstu.ru/* -hy.wikipedia.org/* -hyw.wikipedia.org/* -ia.wikipedia.org/* -id.google.com/* -id.wikipedia.org/* -ie.wikipedia.org/* -ilo.wikipedia.org/* -images.ctfassets.net/* -is.wikipedia.org/* -it.wikipedia.org/* -ja.wikipedia.org/* -jv.wikipedia.org/* -ka.wikipedia.org/* -kab.wikipedia.org/* -kk.wikipedia.org/* -kn.wikipedia.org/* -ko.wikipedia.org/* -krc.wikipedia.org/* -ks.wikipedia.org/* -ku.wikipedia.org/* -ky.wikipedia.org/* -la.wikipedia.org/* -labs.mozilla.org/* -lad.wikipedia.org/* -lb.wikipedia.org/* -learning.mozilla.org/* -lez.wikipedia.org/* -lij.wikipedia.org/* -lmo.wikipedia.org/* -ln.wikipedia.org/* -lo.wikipedia.org/* -lt.wikipedia.org/* -lv.wikipedia.org/*
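Typing a list like this by hand is tedious, so here is a minimal Python sketch that builds it from the download folder, as the answer suggests. It assumes the project folder contains one subfolder per downloaded host (e.g. fr.wikipedia.org) and treats any folder name without a dot (such as a cache folder) as not being a host; the function name and the keep_hosts parameter are hypothetical.

```python
from pathlib import Path


def exclusion_rules(project_dir: str, keep_hosts: set[str]) -> str:
    """Emit '-host/*' scan rules for every host folder not in keep_hosts.

    Assumption: each downloaded host is stored in a subfolder named after it.
    Folders without a dot in their name are skipped as non-host folders.
    """
    root = Path(project_dir)
    hosts = sorted(p.name for p in root.iterdir() if p.is_dir() and "." in p.name)
    return " ".join(f"-{host}/*" for host in hosts if host not in keep_hosts)


if __name__ == "__main__":
    # Hypothetical project folder; paste the output into the scan rules box.
    print(exclusion_rules("./forum-mirror", {"forum.mysite.com"}))
```

The sorted output is easy to compare against the folder listing, and the keep_hosts set plays the role of the "+meNeedSite.com/*" entries above.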
Quebec answered 13/1/2022 at 15:05

© 2022 - 2024 — McMap. All rights reserved.