How can I check if a URL exists with Django’s validators?

In Django, I want to check whether a URL exists and, if it does, show something on screen, e.g.:

if URL_THAT_POINTS_TO_SOME_PDF exists 
     SHOW_SOMETHING
Torr answered 3/7, 2010 at 3:41 Comment(0)
N
55

Edit: Please note that this is no longer valid for Django 1.5 and later, where the verify_exists argument has been removed.

I assume you want to check whether the file actually exists, not just whether there is an object (which is a simple if statement).

First, I recommend always looking through Django's source code, because you will find some great code that you can use :)

I assume you want to do this within a template. There is no built-in template tag to validate a URL, but you could use the URLValidator class within a custom template tag to test it. Simply:

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

validate = URLValidator(verify_exists=True)
try:
    validate('http://www.somelink.com/to/my.pdf')
except ValidationError as e:
    print(e)

The URLValidator class raises a ValidationError when it can't open the link. It uses urllib2 to actually open the request, so it is not just basic regex checking (though it does that too).

You can put this into a custom template tag (the Django docs explain how to create one) and off you go.

Hope that is a start for you.
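
On Django versions without verify_exists, the existence check has to happen outside the validator. A minimal sketch using only the standard library; url_exists, the 5 second timeout, and the opener parameter are illustrative names and values, not part of Django:

```python
import http.client
import urllib.error
import urllib.request


def url_exists(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` ends in a 2xx/3xx status.

    `opener` defaults to urllib.request.urlopen and can be swapped out
    (for example in tests) for any callable with the same signature.
    """
    try:
        # Request() raises ValueError for strings without a usable scheme.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # HTTP errors, DNS failures, timeouts and malformed URLs all count
        # as "does not exist" in this sketch.
        return False
```

Call it from the view (not the template) and pass the boolean into the template context, for example `{"pdf_exists": url_exists(pdf_url)}`.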

Naresh answered 3/7, 2010 at 4:0 Comment(3)
verify_exists is deprecated for security reasons and has been removed in Django 1.5. - Tony
Since Django 1.5, the verify_exists argument has been removed, so you can no longer check whether a URL exists; it is now just a regex match for a valid URL. - Analysis
Deprecation warning link in the Internet Archive. - Matildematin
S
11

Problem

The URLValidator from django.core.validators says that www.google.ro is invalid, which is wrong in my view, or at least not sufficient.

How to solve it?

The clue is to look at the source code for models.URLField: you will see that it uses forms.URLField as its form field, which does more than the URLValidator above.

Solution

If I want to validate a url like http://www.google.com or like www.google.ro, I would do the following:

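from django.core.exceptions import ValidationError
from django.forms import URLField

def validate_url(url):
    url_form_field = URLField()
    try:
        # clean() normalizes the value and raises ValidationError if it is invalid
        url_form_field.clean(url)
    except ValidationError:
        return False
    return True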

I found this useful. Maybe it helps someone else.

Scorekeeper answered 19/1, 2017 at 16:24 Comment(0)
P
6

Anything based on the verify_exists parameter to django.core.validators.URLValidator will stop working with Django 1.5 — the documentation helpfully says nothing about this, but the source code reveals that using that mechanism in 1.4 (the latest stable version) leads to a DeprecationWarning (you'll see it has been removed completely in the development version):

if self.verify_exists:
    import warnings
    warnings.warn(
        "The URLField verify_exists argument has intractable security "
        "and performance issues. Accordingly, it has been deprecated.",
        DeprecationWarning
        )

There are also some odd quirks with this method related to the fact that it uses a HEAD request to check URLs — bandwidth-efficient, sure, but some sites (like Amazon) respond with an error (to HEAD, where the equivalent GET would have been fine), and this leads to false negative results from the validator.
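
One way around the HEAD quirk is to retry with GET when the server rejects HEAD. This is a rough standard-library sketch, not what Django's validator did; exists_with_get_fallback and the opener parameter are hypothetical names introduced for illustration:

```python
import urllib.error
import urllib.request


def exists_with_get_fallback(url, timeout=5.0, opener=urllib.request.urlopen):
    """Try HEAD first; if the server answers with an HTTP error, retry with GET."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            with opener(request, timeout=timeout) as response:
                return 200 <= response.status < 400
        except urllib.error.HTTPError:
            # The server responded, but with an error status. For HEAD this
            # may be a false negative, so fall through and try GET once.
            if method == "GET":
                return False
        except (urllib.error.URLError, ValueError, OSError):
            # Network failure or malformed URL: retrying with GET will not help.
            return False
    return False
```

The cost is a second request for URLs that really are missing, so the bandwidth saving of HEAD only applies to the success case.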

I would also (a lot has changed in two years) recommend against doing anything with urllib2 in a template — this is completely the wrong part of the request/response cycle to be triggering potentially long-running operations: consider what happens if the URL does exist, but a DNS problem causes urllib2 to take 10 seconds to work that out. BAM! Instant 10 extra seconds on your page load.

I would say the current best practice for making possibly-long-running tasks like this asynchronous (and thus not blocking page load) is using django-celery; there's a basic tutorial which covers using pycurl to check a website, or you could look into how Simon Willison implemented celery tasks (slides 32-41) for a similar purpose on Lanyrd.
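
Celery needs a broker and workers, so it is not shown here. As a rough stand-in for the same idea (run the check off the request path and let the page read a cached result), here is a standard-library sketch using a thread pool. UrlStatusCache and its status method are hypothetical names, and check is whatever existence test you supply:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UrlStatusCache:
    """Start URL checks in background threads and report results without blocking."""

    def __init__(self, check, max_workers=4):
        self._check = check  # callable: url -> truthy if the URL exists
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def status(self, url):
        """Return True/False once the check has finished, or None while pending."""
        with self._lock:
            future = self._futures.get(url)
            if future is None:
                # First time this URL is seen: schedule the check and move on.
                future = self._executor.submit(self._check, url)
                self._futures[url] = future
        if not future.done():
            return None
        try:
            return bool(future.result())
        except Exception:
            # A check that raised is treated as "does not exist".
            return False
```

A view can call status(url) on every request: the first few responses render without the link (None), and later ones pick up the finished result. Unlike a Celery setup, results here live only in the current process and are never refreshed.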

Perfumery answered 21/8, 2012 at 18:5 Comment(0)
V
2

It took an additional:

from django.core.exceptions import ValidationError

for it to work for me. Just saying ;0)

Variometer answered 29/7, 2010 at 20:24 Comment(0)
P
2
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

validate = URLValidator(verify_exists=True)
# `request` is the current HttpRequest inside a view
value = request.GET.get('url', None)

if value:
    try:
        validate(value)
    except ValidationError as e:
        print(e)

validate(value) fails if the URL is not preceded by a scheme such as http://. I wonder if that is by design.
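
The missing scheme can be detected with the standard library alone: urllib.parse.urlsplit finds no scheme in a value like www.google.ro. A small sketch that prepends one before the value is handed to a validator; with_default_scheme and the http default are assumptions made for illustration, not Django's code:

```python
from urllib.parse import urlsplit


def with_default_scheme(url, default_scheme="http"):
    """Prepend `default_scheme://` when `url` has no scheme of its own."""
    url = url.strip()
    if not urlsplit(url).scheme:
        return f"{default_scheme}://{url}"
    return url
```

With this, `validate(with_default_scheme(value))` would accept bare host names, at the price of guessing the scheme on the user's behalf.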

Posh answered 17/5, 2012 at 5:37 Comment(0)
W
2

I have not seen this answer here. It might be helpful to someone else.

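from django import forms
from django.core.exceptions import ValidationError

f = forms.URLField()
try:
    # clean() raises ValidationError when the value is not a valid URL
    f.clean("http://example.com")
    print("valid url")
except ValidationError:
    print("invalid url")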
Wernick answered 23/11, 2016 at 7:17 Comment(2)
I was going to post a similar answer. You had a good try, but it is not enough. - Scorekeeper
@MihaiZamfir: why is this not enough? - Faxun
S
0

See: http://www.agmweb.ca/2009-04-19-django-urlpatterns---its-more-than-just-urls/

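In Django 1.10 I now use the following to check that a URL resolves within the project:

from django.core.exceptions import PermissionDenied
from django.core.urlresolvers import RegexURLResolver, Resolver404
from django.http import HttpResponseRedirect

# Inside a view function. `urls` is not defined in this snippet; it is
# assumed to be the URLconf to resolve against.
if 'next' in request.GET:
    n = request.GET["next"].strip('/') + "/"
    resolver = RegexURLResolver(r'', urls)
    try:
        # resolve() raises Resolver404 when no URL pattern matches
        callback, callback_args, callback_kwargs = resolver.resolve(n)
        return HttpResponseRedirect(str(request.GET["next"]))
    except Resolver404:
        raise PermissionDenied("This page is not available")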
Shishko answered 3/2, 2017 at 8:27 Comment(0)
P
0

In newer versions of Django, just call URLValidator directly. When an invalid URL is given, it raises django.core.exceptions.ValidationError:

from django.core.validators import URLValidator
validate = URLValidator()
validate('http://www.google.com')
validate('http://www2.google.com')
validate('httpAA://www2.google.com')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/etl/.venv/lib/python3.8/site-packages/django/core/validators.py", line 110, in __call__
    raise ValidationError(self.message, code=self.code, params={'value': value})
django.core.exceptions.ValidationError: ['Enter a valid URL.']
Pforzheim answered 13/9, 2022 at 3:10 Comment(0)
