I am working on a project that lists file-sharing URLs from hosts such as Oron, filespost, depositfiles etc. and reports the sharing of copyrighted material to identified content owners and rights holders in my network.
To improve the service, which currently consists of a table populated from a MySQL database with some filters built into the PHP, I want to be able to identify links that have stopped working.
My idea is that when the data is retrieved from the MySQL database, each entry in the download URL column (the URL of the file or the file host page) is checked to see whether it still leads to the file-sharing page where users can start the download. If the link works and the file can be downloaded, the row is left in place and the link text or cell colour turns green. If the site displays "file not found" or similar, the link text or cell background colour turns red.
At present there is no quick and easy visual representation of active or inactive links.
I have a simple validation on the URL based on whether a 404 error is received, but I quickly realised that this won't work: these sites don't return a 404 or even redirect. Instead they change the dynamically generated page to say that the file is not available, has been removed, etc.
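To make the problem concrete, here is a minimal sketch of the check I think is needed, written in Python for brevity (my real code is PHP). It takes an already-fetched status code and page body and looks at both. The names `classify_response` and `DEAD_PHRASES` are made up for illustration, and the phrases are examples only, not a verified list for any particular host.

```python
# Hypothetical phrases; real hosts use many different wordings.
DEAD_PHRASES = (
    "file not found",
    "file has been removed",
    "file is not available",
)


def classify_response(status_code: int, body: str) -> str:
    """Return 'dead' or 'active' for a page that has already been fetched."""
    # An HTTP error status is still a clear sign the link is dead.
    if status_code >= 400:
        return "dead"
    # Many hosts answer 200 OK and only say so in the page text.
    text = body.lower()
    if any(phrase in text for phrase in DEAD_PHRASES):
        return "dead"
    return "active"
```

With this, a page that returns 200 but contains "File Not Found" is reported as dead, which a status-only check would miss.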
I have also incorporated a link checker script that uses a third-party file-share link checking service, but this would require manual checks and manual updating of the database.
I have also tried looking for specific fields or words on the page, but given the range of sites and the even broader range of wording they use, this too has proven inaccurate and difficult to implement for all links.
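One variation I have been considering is keeping the wording per host rather than in one global list, and reporting "unknown" for hosts with no rule instead of guessing. A rough sketch, again in Python and under stated assumptions: the hostnames are placeholders, the patterns are invented for illustration, and `HOST_RULES` and `classify_by_host` are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Placeholder hosts and invented patterns, for illustration only.
HOST_RULES = {
    "host-a.example": re.compile(r"file (could )?not (be )?found", re.I),
    "host-b.example": re.compile(r"has been (removed|deleted)", re.I),
}


def classify_by_host(url: str, body: str) -> str:
    """Return 'dead', 'active', or 'unknown' using the rule for the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    # Treat www.host and host as the same site.
    if host.startswith("www."):
        host = host[4:]
    rule = HOST_RULES.get(host)
    if rule is None:
        # No rule for this host, so do not guess either way.
        return "unknown"
    return "dead" if rule.search(body) else "active"
```

This still means maintaining one pattern per host, which is the part I am finding hard to scale.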
It would also be helpful if the URLs could then be filtered by their active status. I'm guessing that if the colour change were driven by a class on the link or cell, I could filter the column on that class, e.g. link-dead or link-active. I think I can manage this myself, so help with this last part on filtering by class is not strictly required.
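For completeness, this is roughly how I picture the class mapping and filtering, sketched in Python rather than my PHP. The `link-active` and `link-dead` classes are the ones mentioned above; `link-unknown`, the row layout with `url` and `status` keys, and the function names are assumptions for the example.

```python
# Map a link status to the CSS class used to colour the link or cell.
STATUS_CLASS = {"active": "link-active", "dead": "link-dead"}


def css_class(status: str) -> str:
    """Return the CSS class for a status; unrecognised statuses get link-unknown."""
    return STATUS_CLASS.get(status, "link-unknown")


def filter_rows(rows: list, wanted_class: str) -> list:
    """Keep only the rows whose status maps to the wanted CSS class."""
    return [row for row in rows if css_class(row["status"]) == wanted_class]


# Hypothetical rows, as they might come back from the database.
rows = [
    {"url": "http://host-a.example/f/1", "status": "active"},
    {"url": "http://host-a.example/f/2", "status": "dead"},
    {"url": "http://host-b.example/f/3", "status": "unchecked"},
]
dead_rows = filter_rows(rows, "link-dead")
```

Here `dead_rows` would contain only the second row, and the third row would be rendered with `link-unknown`.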
Any help would be greatly appreciated.