How long can a TLD possibly be?
G

8

90

I'm working on an email validation regex in PHP and I need to know how long the TLD could possibly be and still be valid. I did a few searches but couldn't find much information on the topic. So how long can a TLD possibly be?

Grandiloquence answered 11/2, 2012 at 7:37 Comment(11)
You should read "How to use a regular expression to validate an email address?".Seppala
As noted in many related questions, email validation by regex is dubious at best. See also e.g. stackoverflow.com/questions/201323/…Idle
Please, please, please don't roll your own solution to this well-solved problem. Use PHP's built-in validation or an excellent, well-tested, RFC-compliant third party library, like is_email. Remember, the only way to truly know if an email address is valid is to send a mail to it and require the user to take an action based on the mail's contents.Quarry
Also, don't forget that TLDs change with regularity, and with gTLDs coming into existence soon, anything that validates against known TLDs is going to become obsolete and difficult to maintain.Quarry
@Quarry Where does it say he's using PHP? What other ways are there to validate without "rolling your own solution" given that most solutions can't use PHP (email address validation is almost always client-side).Kiesha
@NickG, this question used to have the PHP tag. I'm not sure why it was edited out, and I have reverted that edit and updated the context of the question to make it more clear how PHP matters to the question.Quarry
@Quarry I think PHP is irrelevant here (hence why it was likely edited out). The max length of a TLD is totally unrelated to which platform you're using.Kiesha
@NickG, this question is an X/Y problem. The user is trying to validate email addresses. The fact that he's asking about domain name length is the wrong question to ask, as there are better ways for this user to actually do the thing they're actually trying to do - validate email addresses.Quarry
@charles No. The title of the question is How long can a TLD possibly be? the answer is 63. See accepted answer if you don't understand these comments.Kiesha
@charles ...and no - obviously TLD length is exactly the same whether you're validating an email address or a web address. And I don't see that you've posted a higher upvoted answer.Kiesha
@Kiesha Did you read the meta question I linked? If you did not, please go read it and then the entire first answer of it. Now, read the question title, then read the very first sentence in this question. Now read the original comment that you were confused by. The user is trying to validate email addresses, using a regex, using PHP. Now, note alllll the timestamps. Mountain. Molehill. I'm done.Quarry
I
89

DNS allows for a maximum of 63 characters for an individual label.
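
As a rough illustration of that limit, a check on the TLD part of an address might look like the sketch below. The function name `isPlausibleTld` is hypothetical, and the character rule (letters, digits and hyphens, with no leading or trailing hyphen) is an assumption based on the usual hostname conventions, not something stated in this answer.

```javascript
// Maximum length of a single DNS label, as stated above.
const MAX_LABEL_LENGTH = 63;

// Hypothetical helper: takes an ASCII (or Punycode) TLD without the leading dot.
function isPlausibleTld(tld) {
  if (tld.length < 1 || tld.length > MAX_LABEL_LENGTH) {
    return false;
  }
  // Assumption: letters, digits and hyphens only, not starting or ending with a hyphen.
  return /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i.test(tld);
}

console.log(isPlausibleTld("com"));          // true
console.log(isPlausibleTld("a".repeat(63))); // true
console.log(isPlausibleTld("a".repeat(64))); // false
```

This only says whether a string could be a TLD under the length limit; it does not say whether that TLD actually exists.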

Idle answered 11/2, 2012 at 9:50 Comment(2)
is it 64? i see 63 here: en.wikipedia.org/wiki/Domain_Name_System#cite_ref-rfc1034_1-2 have i overlooked one?Scapula
Thanks for asking. I don't remember doing extensive research on this one, so I probably just googled it at the time. I will edit the question and leave this thank-you note.Idle
B
81

The longest TLD currently in existence is 24 characters long, and subject to change. The maximum TLD length specified by RFC 1034 is 63 octets.

To get the length of the longest existing TLD:

wget -qO - http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L

Here's what that command does:

  1. Get the latest list of actual existing TLDs from IANA
  2. Strip the first line, which is a long-ish comment
  3. Run wc -L to print the length of the longest line

Alternative using curl thanks to Stefan:

curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L
Brindled answered 26/2, 2014 at 10:28 Comment(3)
it's 24 now :) if you don't have wget here is the curl alternative curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -LFerric
This is not the answer for the question "how long it can possibly be" - it's only an answer to what is the longest one currently in use.Kiesha
wc -L is also not portable, though it's easy to write an Awk script which subsumes the functionality of tail -n+2 | wc -L; e.g. awk 'NR>1{ if (length($0) <= max) next; longest=$0; max=length($0)} END { print max, longest }' which as an added bonus also prints the longest TLD (or just the first one, if there are several of the same length). Still at 24, with XN--VERMGENSBERATUNG-PWB, aka .vermögensberatung. A quick check which is much easier to type is awk 'NR>1 && length($0)>23'Idle
F
9

-EDIT-

According to RFC 2606, .localhost is a reserved domain name, and it is 9 characters long. That is the longest one I am aware of.

-END OF EDIT-

However, I think that you should care about the length of the whole email address and not only the TLD length. Below is a quote from this article; in short, the maximum email address length is 254 characters:

There appears to be some confusion over the maximum valid email address size. Most people believe it to be 320 characters (64 characters for the username + 255 characters for the domain + 1 character for the @ symbol). Other sources suggest 129 (64 + 1 + 64) or 384 (128+1+255, assuming the username doubles in length in the future).

This confusion means you should heed the 'robustness principle' ("developers should carefully write software that adheres closely to extant RFCs but accept and parse input from peers that might not be consistent with those RFCs." - Wikipedia) when writing software that deals with email addresses. Furthermore, some software may be crippled by naive assumptions, e.g. thinking that 50 characters is adequate (examples). Your 200 character email address may be technically valid but that will not help you if most websites or applications reject it.

The actual maximum email length is currently 254 characters:

"The original version of RFC 3696 did indeed say 320 was the maximum length, but John Klensin (ICANN) subsequently accepted this was wrong."

"This arises from the simple arithmetic of maximum length of a domain (255 characters) + maximum length of a mailbox (64 characters) + the @ symbol = 320 characters. Wrong. This canard is actually documented in the original version of RFC3696. It was corrected in the errata. There's actually a restriction from RFC5321 on the path element of an SMTP transaction of 256 characters. But this includes angled brackets around the email address, so the maximum length of an email address is 254 characters."
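
The limits quoted above can be turned into a simple pre-check. This is only a sketch: `withinEmailLengthLimits` is a hypothetical name, splitting on the last @ is an assumption, and it counts JavaScript string length, which matches the octet count only for plain ASCII addresses. It checks lengths and nothing else.

```javascript
const MAX_ADDRESS_LENGTH = 254;   // 256-character SMTP path minus the two angle brackets
const MAX_LOCAL_PART_LENGTH = 64; // the mailbox part before the @

// Hypothetical helper: length limits only, no syntax validation.
function withinEmailLengthLimits(address) {
  const at = address.lastIndexOf("@");
  // Require at least one character on each side of the @.
  if (at < 1 || at === address.length - 1) {
    return false;
  }
  // The index of the @ equals the length of the local part.
  return address.length <= MAX_ADDRESS_LENGTH && at <= MAX_LOCAL_PART_LENGTH;
}

console.log(withinEmailLengthLimits("user@example.com"));               // true
console.log(withinEmailLengthLimits("a".repeat(65) + "@example.com"));  // false
```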

Frug answered 11/2, 2012 at 7:47 Comment(4)
That's a good point and one I had already taken into account (I actually read this article earlier). However, I do believe it is beneficial to check the TLD length to ensure someone doesn't just type [email protected].Grandiloquence
According to RFC 2606 .localhost is reserved domain name and its length is 9 charactersFrug
@aviad: But .localhost and friends should probably not show up in a working email address. I assume that's what the validation is for.Purdy
@Tikhon Jelvis, agreed, however, this is still valid.Frug
F
7

The longest TLD made of Latin letters is .MUSEUM (source), but there are also some with special characters. The longest of those is XN--CLCHC0EA0B2G2A9GCD. Also, it will soon be possible to reserve your own TLD for a high price, so longer ones may appear.

Florous answered 11/2, 2012 at 8:11 Comment(1)
This is no longer true, there are now some brand tlds and generic tlds that are longerMarcelo
B
2

Since I'm a .NET developer, the following is a JavaScript way of determining the longest TLD currently available. It logs the longest TLD and its length, which you can then use in your regex.

Please try the following code snippet:

function getTLD() {
    var length = 0;
    var longest;
    var request = new XMLHttpRequest();

    // Register the handler before sending so no state change is missed.
    request.onreadystatechange = function () {
        if (request.readyState === 4 && request.status === 200) {
            var type = request.getResponseHeader('Content-Type');
            // indexOf returns -1 (not 1) when "text" is absent.
            if (type && type.indexOf("text") !== -1) {
                var tldArr = request.responseText.split('\n');
                tldArr.splice(0, 1); // drop the leading comment line

                for (var i = 0; i < tldArr.length; i++) {
                    if (tldArr[i].length > length) {
                        length = tldArr[i].length;
                        longest = tldArr[i];
                    }
                }

                // The request is asynchronous, so the result is logged rather than returned.
                console.log("Longest >> " + longest + " >> " + length);
            }
        }
    };

    request.open('GET', 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt', true);
    request.send(null);
}
<button onclick="getTLD()">Get TLD</button>
Bijou answered 18/8, 2019 at 9:47 Comment(0)
P
1

The longest TLD to date is .xn--vermgensberatung-pwb, at 24 characters in Punycode and 17 when decoded (vermögensberatung). Without Punycode, it would be .northwesternmutual or .travelersinsurance, both at 18 characters.

However, a domain label, the part that goes before a TLD, can be up to 63 characters long, as seen here: http://www.thelongestdomainnameintheworldandthensomeandthensomemoreandmore.com
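
The two lengths quoted for .vermögensberatung can be compared with a short script. This is a sketch that assumes a Node.js environment, where the built-in `url` module provides `domainToUnicode`; in a browser a different decoder would be needed.

```javascript
// Assumption: running under Node.js, which ships url.domainToUnicode.
const { domainToUnicode } = require("url");

const punycodeTld = "xn--vermgensberatung-pwb";
const decodedTld = domainToUnicode(punycodeTld);

console.log(punycodeTld.length);            // 24
console.log(decodedTld, decodedTld.length); // decoded form and its length
```

For a regex that runs against the ASCII form of an address, the Punycode length is the one that matters.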

Prader answered 15/10, 2021 at 16:53 Comment(0)
B
0

This PHP code fetches an up-to-date list of TLDs, decoded to UTF-8 and separated by vertical bars, that can be used directly in a regular expression:

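<?php
  // Requires the intl extension for idn_to_utf8().
  function getTLDs($separator){
    // Read one TLD per line, without trailing newlines or empty lines.
    $tlds=file('http://data.iana.org/TLD/tlds-alpha-by-domain.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    array_shift($tlds); // remove heading comment
    // Decode Punycode first so the sort below reflects the final strings.
    $tlds=array_map(function($e){ return idn_to_utf8(strtolower(trim($e))); },$tlds);
    usort($tlds,function($a,$b){ return strlen($b)-strlen($a); }); // sort from longest to shortest
    return implode($separator,$tlds);
  }
  echo getTLDs('|');
?>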

To match a host name you could use it like this:

$tlds=getTLDs('|');
if (preg_match("{([\da-z\.-]+)\.($tlds)}u",$address)) {
  ..
}
Bevvy answered 11/2, 2012 at 7:37 Comment(0)
A
0

A TLD can be any length at all. New TLDs happen all the time. In the future there will be more TLDs not regulated by the entity currently regulating the majority of TLDs. We also won't use email in the future as we presently do. That said:

You don't need to validate an email address ever. If you want to slow people down and have an idea as to whether they're actually human, include a CAPTCHA. If you need to confirm working email, send an email with a validation link they can open. If you aren't throttling submissions of things that can generate things like emails being sent for verification, it won't matter whether you're confirming the address is technically valid anyway, it will be abused at that point regardless.

Abert answered 11/2, 2020 at 1:8 Comment(2)
Following this argument we should not do any validation of user input because it might change at some point in the future. This is certainly not good advice as validation has several uses. Think about it, if we make a script that sends an email to a supplied email address and we do not do any validation, an attacker could simply submit the form with bullshit data over and over again and our email server would try and fail at sending those emails. Do enough of those and the email server goes down. A CAPTCHA does help but it's not fail-safe, nor is it free above a certain amount of requests.Javelin
@Javelin No, & I already addressed that in my answer. "it will be abused at that point regardless." An attacker can submit a form with data that is not "bullshit" over & over just as much as data that is "bullshit". Moreover real data will cause more real world problems, unlike fake data. They're separate concerns, & you can only stop abuse by checking for abuse. The data's largely irrelevant. It's only useful in this regard if you want to do a retrospective & decide that multiple users submitting the same data are the same abuser, which again doesn't require any particular data or "validity".Abert

© 2022 - 2024 — McMap. All rights reserved.