The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing, regex-based validation of URLs for compliance with the RFC standard. Some details:
- Tested against Python 2.7, 3.4, 3.5, 3.6, 3.7, and 3.8
- No dependencies on Python 3.x, and one conditional dependency on Python 2.x (a drop-in replacement for Python 2.x's buggy `re` module)
- Unit tests that cover 100+ different passing/failing URL patterns, including non-standard characters and the like. This is as close to covering the full range of the RFC standard as I've been able to find.
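For anyone curious what "regex-based validation" looks like in principle, here is a deliberately simplified sketch using only the standard-library `re` module. It is not the library's actual pattern and covers only a small slice of the RFC: `http`, `https`, and `ftp` schemes, an optional `user:password@`, a host that is a domain name, a dotted IPv4 address, or `localhost`, an optional port, and an optional path/query/fragment (no IPv6 hosts, no internationalized names). The function name `looks_like_url` is hypothetical.

```python
import re

# Deliberately simplified: NOT a full RFC 3986 implementation.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"   # 0-255, no leading zeros
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"  # one DNS label

_URL_PATTERN = re.compile(
    r"(?:https?|ftp)://"                  # scheme
    r"(?:[^\s:@/]+(?::[^\s@/]*)?@)?"      # optional user[:password]@
    r"(?:"
    rf"(?:{_OCTET}\.){{3}}{_OCTET}"       # dotted IPv4 address
    rf"|(?:{_LABEL}\.)+[a-z]{{2,63}}"     # domain name with alphabetic TLD
    r"|localhost"
    r")"
    r"(?::\d{1,5})?"                      # optional port (up to 5 digits)
    r"(?:[/?#]\S*)?",                     # optional path, query, fragment
    re.IGNORECASE,
)


def looks_like_url(value: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return _URL_PATTERN.fullmatch(value) is not None
```

Note that this sketch happily accepts `http://10.0.0.1`: it has no notion of "special" IP addresses, which is one of the things a fuller validator adds on top of the pattern itself.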
It's also very easy to use:
```python
from validator_collection import validators, checkers

checkers.is_url('http://www.stackoverflow.com')
# Returns True

checkers.is_url('not a valid url')
# Returns False

value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'

try:
    value = validators.url('not a valid url')
except ValueError:
    # raises a validator_collection.errors.InvalidURLError (which is a ValueError)
    pass

value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'

try:
    value = validators.url('http://10.0.0.1')
except ValueError:
    # raises InvalidURLError: special IPs are rejected by default
    pass

value = validators.url('http://10.0.0.1', allow_special_ips=True)
# value set to 'http://10.0.0.1'
```
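The answer doesn't spell out exactly which addresses count as "special IPs". As a rough illustration only, assuming "special" means non-public ranges, the standard library's `urllib.parse` and `ipaddress` modules can flag URLs whose host is such an address. `has_special_ip_host` is a hypothetical helper, and validator-collection's actual rules may well differ.

```python
import ipaddress
from urllib.parse import urlsplit


def has_special_ip_host(url: str) -> bool:
    """Return True if the URL's host is an IP literal in a non-public range.

    Assumption: "special" = private, loopback, link-local, multicast,
    reserved, or unspecified, as classified by the ipaddress module.
    """
    host = urlsplit(url).hostname  # brackets around IPv6 literals are stripped
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a domain name, not an IP literal
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

Used as a second step after a pattern match, this is enough to reject `http://10.0.0.1` while still accepting `https://123.12.34.56:1234`.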
In addition, Validator-Collection includes 60+ other validators, covering IP addresses (IPv4 and IPv6), domains, and email addresses among others, which folks might also find useful.
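The checkers/validators split shown above (checkers return `True`/`False`, validators return the value or raise a `ValueError`) is easy to mirror for IP addresses using the standard-library `ipaddress` module. This is a hypothetical stdlib-only sketch, not validator-collection's implementation; `validate_ip`, `to_checker`, and `is_ip` are illustrative names.

```python
import ipaddress
from typing import Callable


def validate_ip(value: str) -> str:
    """Validator style: return the normalized address or raise ValueError."""
    # ip_address() raises ValueError for anything that isn't IPv4 or IPv6
    return str(ipaddress.ip_address(value.strip()))


def to_checker(validator: Callable[[str], str]) -> Callable[[str], bool]:
    """Wrap a validator so it returns True/False instead of raising."""
    def checker(value: str) -> bool:
        try:
            validator(value)
        except ValueError:
            return False
        return True
    return checker


is_ip = to_checker(validate_ip)
```

Deriving the boolean checker from the validator keeps the validation rules in one place, so the two variants can't drift out of sync. For example, `validate_ip('2001:DB8::1')` returns the normalized `'2001:db8::1'`, while `is_ip('256.1.1.1')` returns `False`.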