More than one top level domain?

In a normal URL, you have a protocol, optional subdomains, a domain name, a top-level domain (TLD), and subdirectories.

For example: http://www.google.com/path. Here www is the subdomain, google is the domain name, com is the TLD, and path is the subdirectory. Parsing this is a simple programming task.
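A minimal sketch of that simple case in Python, assuming the TLD is always the single last label of the host name. The function name `naive_parts` is made up for illustration:

```python
from urllib.parse import urlsplit

def naive_parts(url):
    """Split a URL, assuming the TLD is always the last dot-separated label."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        "subdomains": labels[:-2],   # everything before the domain name
        "domain": labels[-2],
        "tld": labels[-1],
        "path": parts.path,
    }

print(naive_parts("http://www.google.com/path"))
# {'protocol': 'http', 'subdomains': ['www'], 'domain': 'google', 'tld': 'com', 'path': '/path'}
```

With http://www.google.co.in/path the same function reports co as the domain and in as the TLD, which is exactly where the simple approach stops working.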

But the problem comes when there appears to be more than one TLD. For example: www.google.co.in/path. Here co.in is the TLD. But I see that a website named www.co.in also exists.

My questions are:

  • How many top-level domains can a URL have? If there can be multiple TLDs, how do I find them in a URL?
  • In the above example, google.co.in is not a subdomain of co.in, so how does www.co.in resolve to a different website than google.co.in?
Dworman answered 27/6, 2014 at 8:57 Comment(3)
Actually, only the last part of the domain name is the TLD, always. Some countries do enforce secondary top-level types (like .co.uk), but it's always the last part that is the TLD (.uk in my previous example).Pestiferous
What are the criteria for secondary TLDs? For example, I want to parse "www.google.com" to "google.com", but "code.google.co.uk" to "google.co.uk". Are second-level domains only allowed under country codes?Dworman
The criteria are whatever the TLD registry wants them to be, and may not even be fully consistent. For example in the UK, most businesses are under .co.uk, but parliament is www.parliament.uk (not .gov.uk, as a matter of constitutional principle), and parliament.uk works, so there isn't necessarily a www part. The best you'll do is a country-by-country heuristic, I think.Hippo

If I had to write an algorithm that decides that "www.co.in" belongs to the India top-level domain (TLD) and "www.google.co.in" belongs to an India second-level domain (SLD), I would grab the list from here:

https://wiki.mozilla.org/TLD_List

Then, I would process my URL like this:

  1. Compare the last part of the URL to all TLDs in the list and find a matching one. [www.google.co.in -> in, www.co.in -> in]
  2. If no TLD was found, the URL is invalid.
  3. If a TLD was found and the URL has three parts or fewer, return the TLD as the result and exit.
  4. If a TLD was found and the URL has more than three parts, do a second search in the list of SLDs. Compare the end of the URL against the pattern ".SLD.TLD".
  5. If no entry was found, return the TLD as the result and exit.
  6. If an entry was found, return SLD.TLD as the result and exit.
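The steps above could be sketched in Python roughly as follows. The `TLDS` and `SLDS` tables are tiny hand-written excerpts for illustration only, not the real list, and the function name `find_suffix` is hypothetical:

```python
# Illustrative excerpts only; a real implementation would load the full list.
TLDS = {"com", "in", "uk"}
SLDS = {"in": {"co", "net", "org"}, "uk": {"co", "org"}}

def find_suffix(host):
    """Return the TLD or SLD.TLD of a host name, or None if the TLD is unknown."""
    labels = host.lower().split(".")
    tld = labels[-1]                      # step 1: look at the last part
    if tld not in TLDS:                   # step 2: unknown TLD, invalid
        return None
    if len(labels) <= 3:                  # step 3: three parts or fewer
        return tld
    sld = labels[-2]                      # step 4: check for ".SLD.TLD"
    if sld in SLDS.get(tld, set()):
        return sld + "." + tld            # step 6: SLD entry found
    return tld                            # step 5: no SLD entry

print(find_suffix("www.google.co.in"))    # co.in
print(find_suffix("www.co.in"))           # in
print(find_suffix("www.google.com"))      # com
```

Followed literally, step 3 also returns in for a three-part host such as google.co.in, so the part-count threshold may need adjusting depending on which inputs you expect.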
Rident answered 27/6, 2014 at 9:35 Comment(0)

A very slow yet comprehensive regex you could use (sourced from Wikipedia and Mozilla):

[a-z0-9-]{1,63}(.ab.ca|.bc.ca|.mb.ca|.nb.ca|.nf.ca|.nl.ca|.ns.ca|.nt.ca|.nu.ca|.on.ca|.pe.ca|.qc.ca|.sk.ca|.yk.ca|.co.cc|.com.cd|.net.cd|.org.cd|.co.ck|.ac.cn|.com.cn|.edu.cn|.gov.cn|.net.cn|.org.cn|.ah.cn|.bj.cn|.cq.cn|.fj.cn|.gd.cn|.gs.cn|.gz.cn|.gx.cn|.ha.cn|.hb.cn|.he.cn|.hi.cn|.hl.cn|.hn.cn|.jl.cn|.js.cn|.jx.cn|.ln.cn|.nm.cn|.nx.cn|.qh.cn|.sc.cn|.sd.cn|.sh.cn|.sn.cn|.sx.cn|.tj.cn|.xj.cn|.xz.cn|.yn.cn|.zj.cn|.us.com|.com.cu|.edu.cu|.org.cu|.net.cu|.gov.cu|.inf.cu|.gov.cx|.com.dz|.org.dz|.net.dz|.gov.dz|.edu.dz|.asso.dz|.pol.dz|.art.dz|.com.ec|.info.ec|.net.ec|.fin.ec|.med.ec|.pro.ec|.org.ec|.edu.ec|.gov.ec|.mil.ec|.com.ee|.org.ee|.fie.ee|.pri.ee|.com.es|.nom.es|.org.es|.gob.es|.edu.es|.aland.fi|.tm.fr|.asso.fr|.nom.fr|.prd.fr|.presse.fr|.com.fr|.gouv.fr|.com.ge|.edu.ge|.gov.ge|.org.ge|.mil.ge|.net.ge|.pvt.ge|.co.gg|.net.gg|.org.gg|.com.gi|.ltd.gi|.gov.gi|.mod.gi|.edu.gi|.org.gi|.com.gp|.net.gp|.edu.gp|.asso.gp|.org.gp|.com.gr|.edu.gr|.net.gr|.org.gr|.gov.gr|.com.hk|.edu.hk|.gov.hk|.idv.hk|.net.hk|.org.hk|.com.hn|.edu.hn|.org.hn|.net.hn|.mil.hn|.gob.hn|.iz.hr|.from.hr|.name.hr|.com.hr|.com.ht|.net.ht|.firm.ht|.shop.ht|.info.ht|.pro.ht|.adult.ht|.org.ht|.art.ht|.pol.ht|.rel.ht|.asso.ht|.perso.ht|.coop.ht|.med.ht|.edu.ht|.gouv.ht|.gov.ie|.co.in|.firm.in|.net.in|.org.in|.gen.in|.ind.in|.nic.in|.ac.in|.edu.in|.res.in|.gov.in|.mil.in|.ac.ir|.co.ir|.gov.ir|.net.ir|.org.ir|.sch.ir|.co.je|.net.je|.org.je|.com.jo|.org.jo|.net.jo|.edu.jo|.gov.jo|.mil.jo|.co.kr|.or.kr|.edu.ky|.gov.ky|.com.ky|.org.ky|.net.ky|.gov.lk|.sch.lk|.net.lk|.int.lk|.com.lk|.org.lk|.edu.lk|.ngo.lk|.soc.lk|.web.lk|.ltd.lk|.assn.lk|.grp.lk|.hotel.lk|.gov.lt|.mil.lt|.gov.lu|.mil.lu|.org.lu|.net.lu|.com.lv|.edu.lv|.gov.lv|.org.lv|.mil.lv|.id.lv|.net.lv|.asn.lv|.conf.lv|.com.ly|.net.ly|.gov.ly|.plc.ly|.edu.ly|.sch.ly|.med.ly|.org.ly|.id.ly|.co.ma|.net.ma|.gov.ma|.org.ma|.tm.mc|.asso.mc|.org.mg|.nom.mg|.gov.mg|.prd.mg|.tm.mg|.com.mg|.edu.mg|.mil.mg|.com.mk|.org.mk|.com.mo|.net.mo|.org.mo|.edu.mo|.gov.mo|.or
g.mt|.com.mt|.gov.mt|.edu.mt|.net.mt|.com.mu|.co.mu|.gov.nr|.edu.nr|.biz.nr|.info.nr|.com.nr|.net.nr|.com.pf|.org.pf|.edu.pf|.com.ph|.gov.ph|.com.pk|.net.pk|.edu.pk|.org.pk|.fam.pk|.biz.pk|.web.pk|.gov.pk|.gob.pk|.gok.pk|.gon.pk|.gop.pk|.gos.pk|.com.pl|.biz.pl|.net.pl|.art.pl|.edu.pl|.org.pl|.ngo.pl|.gov.pl|.info.pl|.mil.pl|.waw.pl|.warszawa.pl|.wroc.pl|.wroclaw.pl|.krakow.pl|.poznan.pl|.lodz.pl|.gda.pl|.gdansk.pl|.slupsk.pl|.szczecin.pl|.lublin.pl|.bialystok.pl|.olsztyn.pl.torun.pl|.biz.pr|.com.pr|.edu.pr|.gov.pr|.info.pr|.isla.pr|.name.pr|.net.pr|.org.pr|.pro.pr|.edu.ps|.gov.ps|.sec.ps|.plo.ps|.com.ps|.org.ps|.net.ps|.com.pt|.edu.pt|.gov.pt|.int.pt|.net.pt|.nome.pt|.org.pt|.publ.pt|.com.ro|.org.ro|.tm.ro|.nt.ro|.nom.ro|.info.ro|.rec.ro|.arts.ro|.firm.ro|.store.ro|.www.ro|.com.ru|.net.ru|.org.ru|.pp.ru|.msk.ru|.int.ru|.ac.ru|.gov.rw|.net.rw|.edu.rw|.ac.rw|.com.rw|.co.rw|.int.rw|.mil.rw|.gouv.rw|.com.sc|.gov.sc|.net.sc|.org.sc|.edu.sc|.com.sd|.net.sd|.org.sd|.edu.sd|.med.sd|.tv.sd|.gov.sd|.info.sd|.org.se|.pp.se|.tm.se|.brand.se|.parti.se|.press.se|.komforb.se|.kommunalforbund.se|.komvux.se|.lanarb.se|.lanbib.se|.naturbruksgymn.se|.sshn.se|.fhv.se|.fhsk.se|.fh.se|.mil.se|.ab.se|.c.se|.d.se|.e.se|.f.se|.g.se|.h.se|.i.se|.k.se|.m.se|.n.se|.o.se|.s.se|.t.se|.u.se|.w.se|.x.se|.y.se|.z.se|.ac.se|.bd.se|.com.sg|.net.sg|.org.sg|.gov.sg|.edu.sg|.per.sg|.idn.sg|.ac.tj|.biz.tj|.com.tj|.co.tj|.edu.tj|.int.tj|.name.tj|.net.tj|.org.tj|.web.tj|.gov.tj|.go.tj|.mil.tj|.gov.to|.gov.tp|.co.tt|.com.tt|.org.tt|.net.tt|.biz.tt|.info.tt|.pro.tt|.name.tt|.edu.tt|.gov.tt|.gov.tv|.edu.tw|.gov.tw|.mil.tw|.com.tw|.net.tw|.org.tw|.idv.tw|.game.tw|.ebiz.tw|.club.tw|.com.ua|.gov.ua|.net.ua|.edu.ua|.org.ua|.cherkassy.ua|.ck.ua|.chernigov.ua|.cn.ua|.chernovtsy.ua|.cv.ua|.crimea.ua|.dnepropetrovsk.ua|.dp.ua|.donetsk.ua|.dn.ua|.ivano-frankivsk.ua|.if.ua|.kharkov.ua|.kh.ua|.kherson.ua|.ks.ua|.khmelnitskiy.ua|.km.ua|.kiev.ua|.kv.ua|.kirovograd.ua|.kr.ua|.lugansk.ua|.lg.ua|.lutsk.ua|.lviv.ua|.nikolaev.
ua|.mk.ua|.odessa.ua|.od.ua|.poltava.ua|.pl.ua|.rovno.ua|.rv.ua|.sebastopol.ua|.sumy.ua|.ternopil.ua|.te.ua|.uzhgorod.ua|.vinnica.ua|.vn.ua|.zaporizhzhe.ua|.zp.ua|.zhitomir.ua|.zt.ua|.co.ug|.ac.ug|.sc.ug|.go.ug|.ne.ug|.or.ug|.ak.us|.al.us|.ar.us|.az.us|.ca.us|.co.us|.ct.us|.dc.us|.de.us|.dni.us|.fed.us|.fl.us|.ga.us|.hi.us|.ia.us|.id.us|.il.us|.in.us|.isa.us|.kids.us|.ks.us|.ky.us|.la.us|.ma.us|.md.us|.me.us|.mi.us|.mn.us|.mo.us|.ms.us|.mt.us|.nc.us|.nd.us|.ne.us|.nh.us|.nj.us|.nm.us|.nsn.us|.nv.us|.ny.us|.oh.us|.ok.us|.or.us|.pa.us|.ri.us|.sc.us|.sd.us|.tn.us|.tx.us|.ut.us|.vt.us|.va.us|.wa.us|.wi.us|.wv.us|.wy.us|.com.vi|.org.vi|.edu.vi|.gov.vi|.com.vn|.net.vn|.org.vn|.edu.vn|.gov.vn|.int.vn|.ac.vn|.biz.vn|.info.vn|.name.vn|.pro.vn|.health.vn|.com|.org|.net|.int|.edu|.gov|.mil|.arpa|.ac|.ad|.ae|.af|.ag|.ai|.al|.am|.an|.ao|.aq|.ar|.as|.at|.au|.aw|.ax|.az|.ba|.bb|.bd|.be|.bf|.bg|.bh|.bi|.bj|.bm|.bn|.bo|.br|.bs|.bt|.bw|.by|.bz|.ca|.cc|.cd|.cf|.cg|.ch|.ci|.ck|.cl|.cm|.cn|.co|.cr|.cu|.cv|.cw|.cx|.cy|.cz|.de|.dj|.dk|.dm|.do|.dz|.ec|.ee|.eg|.es|.et|.eu|.fi|.fj|.fk|.fm|.fo|.fr|.ga|.gd|.ge|.gf|.gg|.gh|.gi|.gl|.gm|.gn|.gp|.gq|.gr|.gs|.gt|.gu|.gw|.gy|.hk|.hm|.hn|.hr|.ht|.hu|.id|.ie|.il|.im|.in|.io|.iq|.ir|.is|.it|.je|.jm|.jo|.jp|.ke|.kg|.kh|.ki|.km|.kn|.kp|.kr|.kw|.ky|.kz|.la|.lb|.lc|.li|.lk|.lr|.ls|.lt|.lu|.lv|.ly|.ma|.mc|.md|.me|.mg|.mh|.mk|.ml|.mm|.mn|.mo|.mp|.mq|.mr|.ms|.mt|.mu|.mv|.mw|.mx|.my|.mz|.na|.nc|.ne|.nf|.ng|.ni|.nl|.no|.np|.nr|.nu|.nz|.om|.pa|.pe|.pf|.pg|.ph|.pk|.pl|.pm|.pn|.pr|.ps|.pt|.pw|.py|.qa|.re|.ro|.rs|.ru|.rw|.sa|.sb|.sc|.sd|.se|.sg|.sh|.si|.sk|.sl|.sm|.sn|.so|.sr|.ss|.st|.su|.sv|.sx|.sy|.sz|.tc|.td|.tf|.tg|.th|.tj|.tk|.tl|.tm|.tn|.to|.tr|.tt|.tv|.tw|.tz|.ua|.ug|.us|.uy|.uz|.va|.vc|.ve|.vg|.vi|.vn|.vu|.wf|.ws|.ye|.yt|.za|.zm|.zw|.dz|.am|.bh|.bd|.by|.bg|.cn|.cn|.eg|.eu|.ge|.gr|.hk|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.in|.ir|.iq|.jo|.kz|.mo|.mo|.my|.mr|.mn|.ma|.mk|.om|.pk|.ps|.qa|.ru|.sa|.rs|.sg|.sg|.kr|.lk|.lk|.sd|.sy|.tw|.tw|.th|.tn|
.ua|.ae|.ye|.academy|.accountant|.adult|.aero|.africa|.agency|.apartments|.app|.archi|.associates|.audio|.auto|.bar|.bargains|.bible|.bike|.biz|.black|.blackfriday|.blog|.blue|.builders|.cam|.cam|.camera|.camp|.cancerresearch|.car|.cards|.cars|.center|.cheap|.christmas|.church|.click|.clothing|.cloud|.club|.codes|.coffee|.college|.coop|.country|.dance|.date|.dating|.design|.dev|.diet|.directory|.download|.eco|.education|.email|.events|.exchange|.exposed|.faith|.farm|.flowers|.game|.gdn|.gift|.glass|.global|.gop|.green|.guitars|.guru|.help|.hiphop|.hiv|.holdings|.hosting|.house|.info|.ink|.international|.jobs|.kim|.land|.lgbt|.life|.lighting|.link|.live|.loan|.lol|.love|.map|.market|.med|.meet|.menu|.mobi|.moe|.mom|.movie|.museum|.music|.name|.new|.NGO_and_.ONG|.org_(top-level_domain)|.one|.one|.onl|.ooo|.organic|.pharmacy|.photo|.photos|.pics|.pink|.pizza|.plumbing|.porn|.post|.pro|.properties|.property|.realtor|.rich|.rocks|.sale|.science|.sex|.sexy|.shop|.singles|.social|.solar|.stream|.sucks|.support|.tattoo|.tel|.today|.top|.travel|.ventures|.video|.voting|.wedding|.wiki|.win|.work|.wtf|.xxx|.XYZ|.kaufen|.desi|.shiksha|.moda|.futbol|.juegos|.uno|.africa|.asia|.krd|.taipei|.tokyo|.alsace|.amsterdam|.bcn|.barcelona|.berlin|.brussels|.bzh|.cat|.cymru|.eus|.frl|.gal|.gent|.irish|.istanbul|.istanbul|.london|.paris|.saarland|.scot|.swiss|.wales|.wien|.miami|.nyc|.quebec|.vegas|.kiwi|.melbourne|.sydney|.lat|.rio|.ru|.aaa|.abb|.aeg|.afl|.aig|.airtel|.bbc|.bentley|.example|.invalid|.local|.localhost|.onion|.testa)$
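As a hedged sketch of how a pattern of this shape might be generated rather than maintained by hand, the Python below builds the alternation from a list of suffixes. An unescaped `.` in a regex matches any character, so the sketch runs each suffix through `re.escape` to make the dots literal. `SUFFIXES` is a short illustrative excerpt, and the names `build_suffix_regex` and `registrable_domain` are made up for this example:

```python
import re

# Illustrative excerpt only; not the full list used in the regex above.
SUFFIXES = ["co.in", "co.uk", "in", "uk", "com"]

def build_suffix_regex(suffixes):
    """Compile '<label>.<suffix>$' with every dot matched literally."""
    # Longest suffixes first, so "co.in" is tried before "in".
    ordered = sorted(set(suffixes), key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"([a-z0-9-]{1,63})\.(" + alternation + r")$")

SUFFIX_RE = build_suffix_regex(SUFFIXES)

def registrable_domain(host):
    """Return the label directly left of the suffix plus the suffix, or None."""
    match = SUFFIX_RE.search(host.lower())
    return match.group(0) if match else None

print(registrable_domain("www.google.com"))      # google.com
print(registrable_domain("code.google.co.uk"))   # google.co.uk
print(registrable_domain("www.google.co.in"))    # google.co.in
```

Deduplicating with `set` also removes repeated entries, which keeps the compiled pattern smaller.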
Unpleasantness answered 29/5, 2019 at 13:34 Comment(0)
