In a normal URL, you have a protocol, optional subdomains, a domain name, a top-level domain (TLD) and subdirectories.
For example: http://www.google.com/path. Here www is the subdomain, google is the domain name, com is the TLD and path is the subdirectory. Parsing this is a simple programming task.
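To make the simple case concrete, here is a minimal sketch in Python. It assumes the TLD is always the last dot-separated label of the host; `naive_split` is a hypothetical name used only for illustration.

```python
from urllib.parse import urlsplit

def naive_split(url):
    """Split a URL, assuming the TLD is always the last label of the host."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),  # everything before the domain name
        "domain": labels[-2],                # second-to-last label
        "tld": labels[-1],                   # last label
        "path": parts.path,
    }

print(naive_split("http://www.google.com/path"))
```

For http://www.google.com/path this yields www, google and com, which matches the breakdown above.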
But the problem comes when there is more than one TLD. For example: www.google.co.in/path. Here co.in is the TLD. But I see that a website named www.co.in also exists.
My doubts are:
- How many top-level domains can a URL have? If there can be multiple TLDs, how do I find them in a URL?
- In the above example, google.co.in is not a subdomain of co.in, so how come www.co.in is resolving to a different website than google.co.in?
.co.uk, but parliament is www.parliament.uk (not .gov.uk, as a matter of constitutional principle), and parliament.uk works, so there isn't necessarily a www part. The best you'll do is a country-by-country heuristic, I think. – Hippo
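One way to read the "country-by-country heuristic" is as a lookup against a table of known suffixes, trying the longest match first. The sketch below is only an illustration under that assumption: `KNOWN_SUFFIXES` is a tiny hand-picked sample and `split_host` is a hypothetical helper. A real table would be far larger, for example the Public Suffix List maintained at publicsuffix.org.

```python
from urllib.parse import urlsplit

# Assumption: a tiny hand-picked sample; a real suffix table is much larger.
KNOWN_SUFFIXES = {"com", "in", "uk", "co.in", "co.uk", "gov.uk"}

def split_host(host):
    """Return (subdomain, domain, suffix) using the longest known suffix."""
    labels = host.lower().split(".")
    # Candidates run from longest to shortest, so "co.in" is tried before "in".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            if i == 0:
                # The host is itself a suffix; there is no domain label.
                return ("", "", candidate)
            return (".".join(labels[:i - 1]), labels[i - 1], candidate)
    raise ValueError(f"no known suffix for host: {host!r}")

host = urlsplit("http://www.google.co.in/path").hostname
print(split_host(host))             # subdomain www, domain google, suffix co.in
print(split_host("www.co.in"))      # here www is the domain name itself
print(split_host("parliament.uk"))  # no www part is required
```

With this table, www.co.in comes out as a domain named www under the co.in suffix, which would make it a sibling of google.co.in rather than its parent.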