I'm working on an email validation regex in PHP and I need to know how long the TLD could possibly be and still be valid. I did a few searches but couldn't find much information on the topic. So how long can a TLD possibly be?
DNS allows for a maximum of 63 characters for an individual label.
The longest TLD currently in existence is 24 characters long, and subject to change. The maximum TLD length specified by RFC 1034 is 63 octets.
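As a rough illustration of the 63-octet bound, here is a minimal JavaScript sketch that checks whether a string could be a TLD label at all. The function name isPlausibleTld and the letters-digits-hyphens rule are assumptions made for illustration; the check says nothing about whether the TLD actually exists in the IANA list.

```javascript
// Hypothetical helper: checks label syntax and length only.
// Assumes the TLD is given in its ASCII (Punycode) form, without a leading dot.
function isPlausibleTld(tld) {
  // 1 to 63 characters: letters, digits, hyphens; no leading or trailing hyphen.
  return /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i.test(tld);
}

console.log(isPlausibleTld('com'));
console.log(isPlausibleTld('xn--vermgensberatung-pwb'));
console.log(isPlausibleTld('a'.repeat(64)));
```

In a larger pattern, the same idea amounts to bounding the TLD part at 63 characters instead of hard-coding today's longest entry.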
To get the length of the longest existing TLD:
wget -qO - http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L
Here's what that command does:
- Get the latest list of actual existing TLDs from IANA
- Strip the first line, which is a long-ish comment
- Run wc -L to print the length of the longest line
Alternative using curl, thanks to Stefan:
curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L
curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L – Ferric
wc -L is also not portable, though it's easy to write an Awk script which subsumes the functionality of tail -n+2 | wc -L; e.g. awk 'NR>1{ if (length($0) <= max) next; longest=$0; max=length($0)} END { print max, longest }' which as an added bonus also prints the longest TLD (or just the first one, if there are several of the same length). Still at 24, with XN--VERMGENSBERATUNG-PWB, aka .vermögensberatung. A quick check which is much easier to type is awk 'NR>1 && length($0)>23' – Idle
-EDIT-
According to RFC 2606, .localhost is a reserved top-level domain name, and it is 9 characters long. That is the longest I am aware of.
-END OF EDIT-
However, I think that you should care about email address length and not only TLD length. Below is a quote from this article. The email address length is 254 characters:
There appears to be some confusion over the maximum valid email address size. Most people believe it to be 320 characters (64 characters for the username + 255 characters for the domain + 1 character for the @ symbol). Other sources suggest 129 (64 + 1 + 64) or 384 (128+1+255, assuming the username doubles in length in the future).
This confusion means you should heed the 'robustness principle' ("developers should carefully write software that adheres closely to extant RFCs but accept and parse input from peers that might not be consistent with those RFCs." - Wikipedia) when writing software that deals with email addresses. Furthermore, some software may be crippled by naive assumptions, e.g. thinking that 50 characters is adequate (examples). Your 200 character email address may be technically valid but that will not help you if most websites or applications reject it.
The actual maximum email length is currently 254 characters:
"The original version of RFC 3696 did indeed say 320 was the maximum length, but John Klensin (ICANN) subsequently accepted this was wrong."
"This arises from the simple arithmetic of maximum length of a domain (255 characters) + maximum length of a mailbox (64 characters) + the @ symbol = 320 characters. Wrong. This canard is actually documented in the original version of RFC3696. It was corrected in the errata. There's actually a restriction from RFC5321 on the path element of an SMTP transaction of 256 characters. But this includes angled brackets around the email address, so the maximum length of an email address is 254 characters."
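The limits quoted above (254 characters overall, 64 for the mailbox part, 63 per domain label) can be sketched as a simple length check in JavaScript. The function name hasPlausibleLengths is hypothetical, and the sketch assumes an ASCII address, so that string length equals octet count; it checks lengths only, not syntax or deliverability.

```javascript
// Hypothetical helper: length limits only, assuming ASCII input.
function hasPlausibleLengths(address) {
  const at = address.lastIndexOf('@');
  // Need a non-empty local part and a non-empty domain.
  if (at < 1 || at === address.length - 1) return false;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  if (address.length > 254) return false; // overall limit from the quote
  if (local.length > 64) return false;    // mailbox (local part) limit
  // Every DNS label must be 1 to 63 characters long.
  return domain.split('.').every(function (label) {
    return label.length >= 1 && label.length <= 63;
  });
}

console.log(hasPlausibleLengths('user@example.com'));
console.log(hasPlausibleLengths('a'.repeat(65) + '@example.com'));
```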
[email protected]. – Grandiloquence
.localhost and friends should probably not show up in a working email address. I assume that's what the validation is for. – Purdy
The longest with Latin letters is .MUSEUM (source), but there are some with special characters. The longest of them is XN--CLCHC0EA0B2G2A9GCD. Also, in a short time, it will be possible to reserve your own TLD for a high price, so longer TLDs will become possible.
Since I'm a .NET developer, here is a JavaScript version that determines the longest TLD currently available. It logs the longest TLD and its length, which you can then use in your regex.
Please try the following code snippet:
    function getTLD() {
        var request = new XMLHttpRequest();
        request.open('GET', 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt', true);
        // Attach the handler before sending so no state change is missed.
        request.onreadystatechange = function () {
            if (request.readyState !== 4 || request.status !== 200) {
                return;
            }
            var type = request.getResponseHeader('Content-Type') || '';
            if (type.indexOf('text') === -1) {
                return; // not a text response
            }
            var tldArr = request.responseText.split('\n');
            tldArr.splice(0, 1); // drop the leading comment line
            var length = 0;
            var longest = '';
            for (var i = 0; i < tldArr.length; i++) {
                var tld = tldArr[i].trim();
                if (tld.length > length) {
                    length = tld.length;
                    longest = tld;
                }
            }
            // The request is asynchronous, so the result is logged, not returned.
            console.log('Longest >> ' + longest + ' >> ' + length);
        };
        request.send(null);
    }
<button onclick="getTLD()">Get TLD</button>
The longest TLD to date is .xn--vermgensberatung-pwb, at 24 characters in Punycode and 17 when decoded [vermögensberatung]. Without Punycode, it would be .northwesternmutual or .travelersinsurance, both at 18 characters.
However, a domain name, the thing that goes before a TLD, can be up to 63 characters long, as seen here: http://www.thelongestdomainnameintheworldandthensomeandthensomemoreandmore.com
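The gap between the Punycode length and the decoded length matters when sizing a regex, because the ASCII form is what appears in DNS. A minimal JavaScript sketch, assuming a runtime with the standard URL class (browsers, Node.js); the helper name asciiTldLength and the placeholder label "example" are made up for illustration:

```javascript
// Hypothetical helper: length of a TLD in its ASCII (Punycode) form.
// Relies on the URL parser to do the IDNA conversion; pass the TLD
// without a leading dot.
function asciiTldLength(tld) {
  var host = new URL('http://example.' + tld).hostname;
  // The last dot-separated label of the host name is the TLD.
  return host.split('.').pop().length;
}

console.log(asciiTldLength('vermögensberatung'));
console.log(asciiTldLength('northwesternmutual'));
```

A pattern that is matched against the Unicode form can use the shorter bound, while one matched against the ASCII form needs the longer one.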
This is PHP code to get an up-to-date, vertical-bar-separated list of UTF-8 TLDs to be used directly in a regular expression:
    <?php
    // Requires the intl extension for idn_to_utf8().
    function getTLDs($separator){
        $tlds=file('http://data.iana.org/TLD/tlds-alpha-by-domain.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        array_shift($tlds); // remove heading comment
        // decode Punycode entries to UTF-8
        $tlds=array_map(function($e){ return idn_to_utf8(strtolower(trim($e))); },$tlds);
        usort($tlds,function($a,$b){ return strlen($b)-strlen($a); }); // sort from longest to shortest
        return implode($separator,$tlds);
    }
    echo getTLDs('|');
    ?>
To match a host name you could use it like this:
    $tlds=getTLDs('|');
    // the i flag also accepts upper-case input
    if (preg_match("{([\da-z\.-]+)\.($tlds)}iu",$address)) {
        // ...
    }
A TLD can be any length at all. New TLDs happen all the time. In the future there will be more TLDs not regulated by the entity currently regulating the majority of TLDs. We also won't use email in the future as we presently do. That said:
You don't need to validate an email address ever. If you want to slow people down and have an idea as to whether they're actually human, include a CAPTCHA. If you need to confirm a working email address, send an email with a validation link they can open. If you aren't throttling submissions that trigger things like verification emails, it won't matter whether you've confirmed that the address is technically valid; the form will be abused regardless.
is_email. Remember, the only way to truly know if an email address is valid is to send a mail to it and require the user to take an action based on the mail's contents. – Quarry