TL;DR: You can't, really. Every answer given so far misses one or more cases.
- String is google.com (invalid since there is no scheme, even though a browser assumes http by default). Urlparse returns an empty scheme and netloc; the whole string ends up in path. So
all([result.scheme, result.netloc, result.path])
seems to work for this case
- String is http://google (invalid since the .com is missing). Urlparse leaves only the path empty. Again
all([result.scheme, result.netloc, result.path])
seems to catch this case
- String is http://google.com/ (correct). Urlparse will populate scheme, netloc and path. So for this case
all([result.scheme, result.netloc, result.path])
works fine
- String is http://google.com (correct). Urlparse leaves only the path empty. So for this case
all([result.scheme, result.netloc, result.path])
gives a false negative
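A quick way to check the four cases above is to print what urlparse returns for each string. This sketch assumes Python 3, where urlparse lives in urllib.parse (on Python 2 it is the urlparse module); naive_check is a hypothetical name for the all([...]) test.

```python
from urllib.parse import urlparse

# The four strings from the list above, paired with whether they should pass.
CASES = [
    ("google.com", False),         # no scheme
    ("http://google", False),      # no .com
    ("http://google.com/", True),  # scheme, netloc and "/" path
    ("http://google.com", True),   # valid, but the path is empty
]

def naive_check(url):
    """The all([scheme, netloc, path]) test discussed above."""
    result = urlparse(url)
    return all([result.scheme, result.netloc, result.path])

for url, expected in CASES:
    result = urlparse(url)
    print(f"{url!r:22} scheme={result.scheme!r} netloc={result.netloc!r} "
          f"path={result.path!r} -> {naive_check(url)} (expected {expected})")
```

The last row is the false negative: scheme and netloc are there, only the path is empty.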
So from the above cases you can see that the check that comes closest to a solution is all([result.scheme, result.netloc, result.path]). But it only works when the URL contains a path (even if that is just the / path).
Even if you try to enforce a path (i.e. urlparse(urljoin(your_url, "/"))), you will still get a false positive in case 2, because http://google simply becomes http://google/.
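To see why forcing a path does not help with case 2, compare what urljoin(your_url, "/") produces. Again a sketch assuming Python 3's urllib.parse; naive_check is the same hypothetical helper for the all([...]) test.

```python
from urllib.parse import urljoin, urlparse

def naive_check(url):
    """The all([scheme, netloc, path]) test from the list above."""
    result = urlparse(url)
    return all([result.scheme, result.netloc, result.path])

for url in ["http://google", "http://google.com"]:
    joined = urljoin(url, "/")  # gives the URL a "/" path
    print(f"{url} -> {joined} -> {naive_check(joined)}")
```

Both lines end in True: http://google turns into http://google/, which now passes even though it is the invalid case.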
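Maybe something more complicated, like:
from urllib.parse import urljoin, urlparse  # Python 3; on Python 2: from urlparse import urljoin, urlparse
final_url = urlparse(urljoin(your_url, "/"))
is_correct = (all([final_url.scheme, final_url.netloc, final_url.path])
              and len(final_url.netloc.split(".")) > 1)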
You may also want to skip the scheme check and assume http when no scheme is given.
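A minimal sketch that wraps the check above in a function and, optionally, assumes http when no scheme is given. The names is_probably_valid_url and assume_http are hypothetical, and treating "no ://" as "no scheme" is a simplifying assumption (it ignores things like mailto: links).

```python
from urllib.parse import urljoin, urlparse

def is_probably_valid_url(your_url, assume_http=False):
    """Heuristic only: scheme, netloc and a forced "/" path must all be
    present, and the netloc must contain at least one dot."""
    # Simplifying assumption: a string without "://" has no scheme.
    if assume_http and "://" not in your_url:
        your_url = "http://" + your_url
    final_url = urlparse(urljoin(your_url, "/"))
    return (all([final_url.scheme, final_url.netloc, final_url.path])
            and len(final_url.netloc.split(".")) > 1)

for url in ["google.com", "http://google", "http://google.com/", "http://google.com"]:
    print(url, is_probably_valid_url(url), is_probably_valid_url(url, assume_http=True))
```

With assume_http=True, google.com passes as well. Note that the dot rule also rejects legitimate hosts such as http://localhost.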
But even this only gets you so far. Although it covers the above cases, it doesn't fully cover URLs that contain an IP address instead of a hostname; for those you also have to validate that the IP address itself is correct. And there are more scenarios as well. See https://en.wikipedia.org/wiki/URL for even more cases to consider.
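For the IP address case, one option is Python's standard ipaddress module, applied to the hostname that urlparse extracts. This is only a sketch: host_looks_valid is a hypothetical name, treating an all-numeric dotted host as "meant to be IPv4" is an assumption made here, and ordinary hostnames are still only checked with the dot heuristic.

```python
import ipaddress
from urllib.parse import urlparse

def host_looks_valid(url):
    """Accept real IPv4/IPv6 addresses, reject malformed dotted-numeric
    hosts, and fall back to the "at least one dot" rule for names."""
    host = urlparse(url).hostname  # lowercased, without port or [] brackets
    if not host:
        return False
    try:
        ipaddress.ip_address(host)  # raises ValueError if not an IP address
        return True
    except ValueError:
        pass
    # Assumption: an all-numeric dotted host was meant to be IPv4, so reject it.
    if all(part.isdigit() for part in host.split(".")):
        return False
    return len(host.split(".")) > 1

for url in ["http://192.168.0.1/", "http://999.1.1.1/", "http://[::1]/", "http://google.com/"]:
    print(url, host_looks_valid(url))
```

Here 999.1.1.1 is rejected because ipaddress refuses an octet above 255, while the bracketed IPv6 loopback is accepted.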
if not url.startswith(('http://', 'https://')):  # don't mangle https:// URLs
    url = 'http://' + url
– Volney