I am creating a new web crawler in C# to crawl some specific websites. Everything works fine, but the problem is that some websites block my crawler's IP address after a number of requests. I tried adding delays between my crawl requests, but that did not help.
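For reference, this is roughly what my request loop looks like with the delay added (a minimal sketch; the URLs and the 5-second pause are just placeholders):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Crawler
{
    private static readonly HttpClient client = new HttpClient();

    static async Task Main()
    {
        // Placeholder URLs -- the real crawler reads its targets from elsewhere.
        var urls = new[] { "https://example.com/page1", "https://example.com/page2" };

        foreach (var url in urls)
        {
            string html = await client.GetStringAsync(url);
            Console.WriteLine($"Fetched {html.Length} chars from {url}");

            // Fixed pause between requests -- this is the part that did not help.
            await Task.Delay(TimeSpan.FromSeconds(5));
        }
    }
}
```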
Is there any way to prevent websites from blocking my crawler? Solutions like the following would help, but I need to know how to apply them:
- simulating Googlebot or Yahoo! Slurp (see the first sketch below)
- using multiple IP addresses (even fake IP addresses) as the crawler's client IP (see the second sketch below)
Any solution would help.
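For the first idea, I assume it comes down to sending a bot's User-Agent header with each request. This is a minimal sketch of what I have in mind (the User-Agent string is Googlebot's published one; the class and method names are just for illustration):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

class BotUserAgentSketch
{
    // Fetches a page while identifying as Googlebot via the User-Agent header.
    static async Task<string> FetchAsGooglebotAsync(string url)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
        return await client.GetStringAsync(url);
    }
}
```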
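For the second idea, the only way I know to send requests from different IP addresses is to route them through proxies. A rough sketch, assuming I already have a list of proxy addresses from somewhere (the proxyAddress value is hypothetical):

```csharp
using System.Net;
using System.Net.Http;

class ProxySketch
{
    // Builds an HttpClient whose requests go through the given proxy,
    // e.g. "http://123.45.67.89:8080" -- the address would come from my proxy list.
    static HttpClient CreateClientBehindProxy(string proxyAddress)
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true
        };
        return new HttpClient(handler);
    }
}
```

Is switching between several clients like this per request the right way to rotate IPs, or is there a better approach?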