Automated link-checker for system testing [closed]
I often have to work with fragile legacy websites that break in unexpected ways when logic or configuration are updated.

I don't have the time or knowledge of the system needed to create a Selenium script. Besides, I don't want to check a specific use case - I want to verify every link and page on the site.

I would like to create an automated system test that will spider through a site and check for broken links and crashes. Ideally, there would be a tool that I could use to achieve this. It should have as many as possible of the following features, in descending order of priority:

  • Triggered via script
  • Does not require human interaction
  • Follows all links including anchor tags and links to CSS and js files
  • Produces a log of all found 404s, 500s etc.
  • Can be deployed locally to check sites on intranets
  • Supports cookie/form-based authentication
  • Free/Open source

There are many partial solutions out there, like FitNesse, Firefox's LinkChecker and the W3C link checker, but none of them do everything I need.

I would like to use this test with projects using a range of technologies and platforms, so the more portable the solution the better.

I realise this is no substitute for proper system testing, but it would be very useful if I had a convenient and automatable way of verifying that no part of the site was obviously broken.

Mehetabel answered 20/10, 2009 at 18:37 Comment(2)
Another great post closed because the community does not want opinionated answers. – Soar
Odd that this question was closed as off-topic. I found the question to be valid and the answers meaningful. – Rocketeer
28

I use Xenu's Link Sleuth for this sort of thing. It quickly checks for dead links and the like on any site. Just point it at any URI and it'll spider all links on that site.

Description from the site:

Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and Java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.

It meets all your requirements apart from being scriptable, as it's a Windows app that requires manual starting.

Resurgent answered 31/10, 2009 at 20:27 Comment(4)
I have used this program and it works really well! – Tophet
It's not open source, but it is free (it includes some advertising links in reports, which I've always happily ignored). – Resurgent
Xenu's Link Sleuth's website says that operating the program from the command line is available for "a $300 donation" to a cause Tilman supports. – Mehetabel
Xenu's Link Sleuth is a great tool. It only breaks down when you need different settings for different URLs; doing that is awkward. – Macymad
33

We use and really like Linkchecker:

http://wummel.github.io/linkchecker/

It's open-source, Python, command-line, internally deployable, and outputs to a variety of formats. The developer has been very helpful when we've contacted him with issues.

We have a Ruby script that queries our database of internal websites, kicks off LinkChecker with appropriate parameters for each site, and parses the XML that LinkChecker gives us to create a custom error report for each site in our CMS.
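A minimal version of that pipeline can be sketched in shell. The site list and intranet URLs below are placeholders for illustration; `-o xml` is LinkChecker's XML report format:

```shell
#!/bin/sh
# Sketch: run LinkChecker against each site and keep one XML report per site.
# The SITES list is a made-up example.
SITES="http://intranet.example/siteA http://intranet.example/siteB"

# Turn a URL into a safe report filename, e.g. http___intranet_example_siteA
url_to_name() {
    echo "$1" | sed 's|[^A-Za-z0-9]|_|g'
}

if command -v linkchecker >/dev/null 2>&1; then
    for site in $SITES; do
        # -o xml writes the report as XML on stdout; a non-zero exit status
        # just means broken links were found, so don't abort the loop on it.
        linkchecker -o xml "$site" > "report_$(url_to_name "$site").xml" || true
    done
fi
```

The per-site XML reports can then be fed to whatever reporting layer you already have, along the lines of the Ruby script described above.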

Checkpoint answered 6/11, 2009 at 22:25 Comment(4)
We have used it commercially on many, many projects. It is a very useful tool, as it detects any 404s, can be set up to detect specific text content on the page (in case you have custom error pages), and has an active community of developers and testers. – Abort
I have used Xenu's Link Sleuth in the past, but according to the site it hasn't been maintained since 2010 (home.snafu.de/tilman/xenulink.html#Download), so it misses things like img srcset links. LinkChecker is being maintained and keeps track of these 'new' link types. – Romalda
Is it possible to use LinkChecker for internal testing environments? I've tried with a URL, but it doesn't check any links other than the homepage. – Bevatron
The repository linked in the answer has been silently abandoned by its owner. There is an actively developed fork at github.com/linkcheck/linkchecker – Mia
2

What part of your list does the W3C link checker not meet? That would be the one I would use.

Alternatively, twill (Python-based) is an interesting little language for this kind of thing. It has a link-checker module, but I don't think it works recursively, so it's not so good for spidering. You could modify it if you're comfortable with that, and I could be wrong; there might be a recursive option. Worth checking out, anyway.

Fecteau answered 31/10, 2009 at 20:18 Comment(2)
From a preliminary look, the W3C Link Checker does the checking remotely on W3C's servers, meaning it fails "Can be deployed locally to check sites on intranets". – Nye
@Adam: Not at all - there is a download link right at the bottom of the page linked to in the question! search.cpan.org/dist/W3C-LinkChecker – Fecteau
2

You might want to try using wget for this. It can spider a site, including the "page requisites" (images, stylesheets, scripts, and the like), and can be configured to log errors. I don't know if it will give you enough information, but it's free and available on Windows (Cygwin) as well as Unix.
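A hedged sketch of such a wget run: the URL is a placeholder, and while `--spider`, `-r`, `-p`, and `--output-file` are standard GNU wget options, the exact wording of the error lines in the log varies by version, so the grep patterns below are an assumption worth checking against your own logs:

```shell
#!/bin/sh
# Spider a site without saving files and collect errors into a log.
URL="http://intranet.example/"
LOG="spider.log"

if command -v wget >/dev/null 2>&1; then
    # --spider: check URLs without downloading; -r: recurse; -l 5: depth limit;
    # -p: also fetch page requisites (images, CSS, JS); -nv: terse log lines.
    wget --spider -r -l 5 -p -nv --output-file="$LOG" "$URL" || true
fi

# Pull the failure lines out of the log.
report_broken() {
    grep -i -e 'broken link' -e 'error 404' -e 'error 500' "$1" || true
}
[ -f "$LOG" ] && report_broken "$LOG"
```

Piping the grep output into a report file gives you the "log of all found 404s, 500s etc." from the question's wish list.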

Benempt answered 2/11, 2009 at 19:1 Comment(0)
1

InSite is a commercial program that seems to do what you want (I haven't used it).

If I was in your shoes, I'd probably write this sort of spider myself...
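To give a feel for how small such a home-grown spider can start out, here is a single-level sketch in shell. `extract_hrefs` is a helper defined here, not a standard tool, the URL is a placeholder, and a real version would need recursion, de-duplication, and authentication:

```shell
#!/bin/sh
# One-level link check: fetch a page, pull out href values, status-check each.
extract_hrefs() {
    # Crude regex extraction: good enough for a sketch, not a real HTML parser.
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

START="http://intranet.example/"
if command -v curl >/dev/null 2>&1; then
    for link in $(curl -fsS "$START" | extract_hrefs); do
        # -w '%{http_code}' prints just the status code; -o /dev/null drops the body.
        code=$(curl -o /dev/null -s -w '%{http_code}' "$link")
        echo "$code $link"
    done
fi
```

Even this toy version shows why a ready-made tool is preferable: relative URLs, redirects, and cookie handling all still need work.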

Shiva answered 31/10, 2009 at 13:58 Comment(2)
Writing it myself might be an option, but I'm surprised that there doesn't seem to be such a tool out there already. I would have thought it was a common need. – Mehetabel
I agree; I was surprised when I looked for one after your question. I thought, "I could use something like this", but no cigar. – Shiva
1

I'm not sure whether it supports form-based authentication, but it will handle cookies if you can get it going on the site, and otherwise I think Checkbot will do everything on your list. I've used it as a step in a build process before, to check that nothing is broken on a site. There's an example of the output on its website.

Chalfant answered 2/11, 2009 at 19:29 Comment(0)
1

I have always liked linklint for checking links on a site. However, I don't think it meets all your criteria, particularly the aspects that may be JavaScript-dependent. I also think it will miss images called from inside CSS.

But for spidering all anchors, it works great.

Gerri answered 7/11, 2009 at 8:55 Comment(0)
0

Try SortSite. It's not free, but seems to do everything you need and more.

Alternatively, PowerMapper from the same company has a similar-but-different approach. The latter will give you less information about detailed optimisation of your pages, but will still identify any broken links, etc.

Disclaimer: I have a financial interest in the company that makes these products.

Lorislorita answered 7/11, 2009 at 10:41 Comment(0)
0

Try http://www.thelinkchecker.com. It is an online application that checks the number of outgoing links, page rank, and anchors. I think this could be the solution you need.

Ungula answered 18/1, 2014 at 22:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.