To extract the domain name from a URL using `urllib`:
from urllib.parse import urlparse
surl = "https://www.exam.org/index.html"
urlparsed = urlparse(surl)
# network location from parsed url
print(urlparsed.netloc)
# ParseResult Object
print(urlparsed)
This will give you www.exam.org. If you are after just the exam.org part, you need to decompose this further into the registered domain. Simple splits could be sufficient, but you could also use a third-party library such as tldextract, which knows how to parse subdomains, suffixes and more:
from tldextract import extract
ext = extract(surl)
print(ext.registered_domain)
This will produce:
exam.org
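For the "simple splits" route mentioned above, here is a minimal stdlib-only sketch. The function name `naive_registered_domain` is made up for illustration, and it assumes the public suffix is a single label (.org, .com), so it is not a replacement for tldextract:

```python
from urllib.parse import urlparse

def naive_registered_domain(url):
    """Return the last two labels of the host, e.g. 'exam.org'.

    Assumption: the suffix is a single label such as .org or .com.
    Multi-label suffixes such as .co.uk are handled incorrectly.
    """
    # hostname (unlike netloc) drops any port and user info and is lowercased
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])

print(naive_registered_domain("https://www.exam.org/index.html"))
print(naive_registered_domain("https://www.exam.org:8080/index.html"))
```

Both calls print exam.org. A host like www.exam.co.uk would come back as co.uk, which is the kind of case where a suffix-aware library is the safer choice.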
'.'.join(urlparse('https://www.exeam.org/index.html').netloc.split('.')[1:])
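Note that the one-liner above always drops the first label, so a URL without a subdomain (https://exam.org/...) would be reduced to just org. If the goal is only to get rid of a leading www, a slightly more defensive sketch could look like this (`strip_www` is a hypothetical helper name, not part of urllib):

```python
from urllib.parse import urlparse

def strip_www(url):
    """Return the host with a leading 'www.' removed, if there is one."""
    # hostname is None when the URL has no network location
    host = urlparse(url).hostname or ""
    # only drop the first label when it is literally 'www'
    return host[4:] if host.startswith("www.") else host

print(strip_www("https://www.exam.org/index.html"))
print(strip_www("https://exam.org/index.html"))
```

Both calls print exam.org. Other subdomains (e.g. blog.exam.org) are left untouched, which may or may not be what you want.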
#44113835 – Blintze
Not only the original URL of the Inquiry but also the majority? I am sorry, I do not understand. – Blintze