Django URLValidator produces bogus errors

I'm using the Django URLValidator in the following way in a form:

def clean_url(self):
    # verify_exists=True makes the validator contact the URL, not just check its syntax
    validate = URLValidator(verify_exists=True)
    url = self.cleaned_data.get('url')

    try:
        logger.info(url)
        validate(url)
    except ValidationError as e:
        logger.info(e)
        raise forms.ValidationError("That website does not exist. Please try again.")

    return url

It seems to work with some URLs, but it fails for some valid ones. For example, validation fails for http://www.amazon.com/ (which is obviously incorrect), while it passes for http://www.cisco.com/. Is there any reason for the bogus errors?

Sec answered 13/8, 2012 at 18:36

Look at the source for URLValidator; if you specify verify_exists, it makes a HEAD request to the URL to check whether it exists:

req = urllib2.Request(url, None, headers)
req.get_method = lambda: 'HEAD'
...
opener.open(req, timeout=10)

Try making the HEAD request to Amazon yourself, and you'll see the problem:

carl@chaffinch:~$ HEAD http://www.amazon.com
405 MethodNotAllowed
Date: Mon, 13 Aug 2012 18:50:56 GMT
Server: Server
Vary: Accept-Encoding,User-Agent
Allow: POST, GET
...

Amazon rejects HEAD requests with a 405, so the validator reports the link as broken. I can't see a way of solving this other than monkey-patching or otherwise extending URLValidator to use a GET or POST request; before doing so, you should think carefully about whether to use verify_exists at all (without it, this problem should go away). As core/validators.py itself says,

"The URLField verify_exists argument has intractable security and performance issues. Accordingly, it has been deprecated."

You'll find that the in-development version of Django has indeed disposed of this feature completely.
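
If you still want an existence check, the GET fallback mentioned above could be done outside the validator. What follows is only a minimal sketch, not anything Django ships: it assumes Python 3's urllib.request, and the helper name url_exists and its opener parameter are hypothetical, introduced here so the function can be exercised without touching the network.

```python
import http.client
import urllib.error
import urllib.request


def url_exists(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers HEAD, or GET when the server refuses HEAD.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be tested with a fake instead of a real network call.
    """
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            response = opener(request, timeout=timeout)
        except urllib.error.HTTPError as exc:
            # 405 (Method Not Allowed) or 501 (Not Implemented) on HEAD only
            # says the method is refused, so retry once with GET.
            if method == "HEAD" and exc.code in (405, 501):
                continue
            return False
        except (OSError, ValueError, http.client.HTTPException):
            # DNS failure, refused connection, timeout, or malformed URL.
            return False
        response.close()
        return True
    return False
```

In a form you would call URLValidator() without verify_exists to check the syntax, then call something like url_exists(url) and raise forms.ValidationError when it returns False. Bear in mind that the security and performance concerns quoted above apply to any server-side request to a user-supplied URL, this one included.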

Ionization answered 13/8, 2012 at 18:56
