Recover old website from the Wayback Machine [closed]

Is there a way to recover an entire website from the Wayback Machine?

I have an old site that is archived, but I no longer have the website files needed to revive it. Is there a way to recover the old data so I can get my long-lost files back?

Audiovisual asked 16/3, 2012 at 1:01 (3 comments)
What do you mean by 'website files'? Just the HTML? If yes, then surely you could just go to that webpage and download the source through your browser (see the sketch after these comments). – Cystectomy
Yes: HTML, CSS, images, and possibly PHP files. The site has multiple pages with images and custom CSS. – Audiovisual
I came across the same issue and ended up writing a gem. To install: gem install wayback_machine_downloader, then run it with the base URL of the website you want to retrieve as a parameter: wayback_machine_downloader http://example.com. More information: github.com/hartator/wayback_machine_downloader – Thick
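
For the single-page case raised in the first comment, a minimal command-line alternative to saving from the browser; this is only a sketch, and the snapshot URL is the illustrative Google example from the answer below, not the asker's site:

 wget --page-requisites --convert-links http://web.archive.org/web/19970708161549/http://www.google.com/

--page-requisites also pulls the CSS and images the page references, and --convert-links rewrites the links so the local copy renders offline.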

wget is a great tool to mirror an entire site, and if you are on Windows you can use Cygwin to install it. The following command will mirror a site: wget -m domain.name

Update from comments:

The example wget command won't ascend to the parent directory (-np), ignores robots.txt (-e robots=off), restricts the crawl to the archive's domains (--domains=...), and mirrors the given URL (here, an illustrative snapshot of google.com). All together you get:

 wget -np -e robots=off --mirror --domains=staticweb.archive.org,web.archive.org http://web.archive.org/web/19970708161549/http://www.google.com/

If you are dealing with HTTPS and a self-signed certificate, you can use --no-check-certificate to disable the certificate check. The wget help is the best place to see the possible options.
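
As a sketch of that HTTPS case, the same mirror command with certificate checking disabled (same illustrative snapshot URL as above):

 wget -np -e robots=off --no-check-certificate --mirror --domains=staticweb.archive.org,web.archive.org https://web.archive.org/web/19970708161549/http://www.google.com/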

Grappling answered 16/3, 2012 at 1:08 (9 comments)
Thank you for the resource, much appreciated. I have a Mac and an app called SiteSucker which seems to do the same thing. The problem is downloading through a full archive.org URL. – Audiovisual
+1 for help with blocked recursive crawling! This should be the accepted answer. – Vena
-np helps keep the crawl from wandering off the specified date path. – Viniferous
Great, thanks. And for a good guide to installing wget on Mac OS X without Homebrew or similar, check out coolestguidesontheplanet.com/install-and-configure-wget-on-os-x – Remediosremedy
When using HTTPS, add --no-check-certificate. – Groats
Good stuff, I will update the example. – Grappling
@Grappling But is there any way to download the CSS and photos with that command? – Hawkinson
@Hawkinson You'll need to remove -np, and then it's a good idea to limit recursion, for example with -l 3. – Slavey
Replying to @Hawkinson: no, you need a few more options, e.g. wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains domain.tld my.domain.tld/; take a look at linuxjournal.com/content/downloading-entire-web-site-wget (note: this will work for web.archive.org as well, just add the extra options). – Legendary
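
Pulling the last few comments together, a sketch of a requisites-aware crawl against the archive; the snapshot URL is the illustrative one from the answer, and the depth limit of 3 is an assumption, not a tested value:

 wget -e robots=off --recursive -l 3 --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains=web.archive.org http://web.archive.org/web/19970708161549/http://www.google.com/

Without -np the crawl can wander to other snapshots under /web/, which is why limiting recursion depth with -l matters here.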
