Valid characters of a hostname?
D

4

103

What are the valid characters of a hostname? This would be something like a networked computer or a web domain.

To put it in context, I am writing a PC game which connects to a remote server, so I have a field for hostname and a field for port. Obviously the port is a 16-bit number (0 to 65535), but I need to know what all the possible hostname characters are (and any other pattern that might be required - does a hostname need to start with a letter?).

Examples of hostname include localhost or google.com.

Demagogue answered 19/8, 2010 at 14:54 Comment(1)
The Linux sethostname call will accept any bytes.Write
A
110

Check out this wiki, specifically the section Restrictions on valid host names

Hostnames are composed of series of labels concatenated with dots, as are all domain names. For example, "en.wikipedia.org" is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname (including the delimiting dots but not a trailing dot) has a maximum of 253 ASCII characters.

The Internet standards (Requests for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters 'a' through 'z' (in a case-insensitive manner), the digits '0' through '9', and the hyphen ('-'). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or white space are permitted.
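For a hostname field like the one in the question, the rules quoted above could be checked roughly as follows. This is a minimal sketch and not a reference implementation: the function name `is_valid_hostname` is made up for illustration, and it assumes a single trailing dot should be tolerated and not counted toward the 253-character limit.

```python
import re

# One label: 1-63 ASCII letters, digits or hyphens,
# with no hyphen at the start or at the end.
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(hostname: str) -> bool:
    """Return True if hostname follows the RFC 1123 rules quoted above."""
    # Assumption: accept one trailing dot (fully qualified form) and ignore it.
    if hostname.endswith("."):
        hostname = hostname[:-1]
    # The whole name, including the separating dots, is limited to 253 characters.
    if not 1 <= len(hostname) <= 253:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(_LABEL_RE.fullmatch(label) for label in hostname.split("."))
```

Note that a purely numeric string such as an IPv4 address also passes this check, so if the field should accept IP addresses as well, it is probably clearer to test for those separately.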

Axon answered 19/8, 2010 at 14:58 Comment(2)
Make sure you're clear about whether you're looking for a hostname or a fully-qualified domain name or either.Zantos
Also, be clear on whether you are looking for strict DNS interoperability requirements or not. In particular, DNS prohibits underscores in hostname labels, whereas many operating systems permit them for local use. DNS servers tend to be lenient, and the DNS now in practice requires underscores in some record types, such as TXT records (for DKIM etc), just not for hostname labels.Adenosine
B
31

It depends on whether you process IDNs before or after the IDN toASCII algorithm (that is, do you see the domain name παράδειγμα.δοκιμή in Greek or as xn--hxajbheg2az3al.xn--jxalpdlp?).

In the latter case—where you are handling IDNs through the punycode—the old RFC 1123 rules apply:

U+0041 through U+005A (A-Z), U+0061 through U+007A (a-z) case folded as each other, U+0030 through U+0039 (0-9) and U+002D (-).

and U+002E (.) of course; the rules for labels allow the others, with dots between labels.

If you are seeing it in IDN form, the allowed characters are far more varied; see http://unicode.org/reports/tr36/idn-chars.html for a handy chart of all valid characters.

Chances are your network code will deal with the punycode, but your display code (or even just passing strings to and from other layers) with the more human-readable form as nobody running a server on the السعودية. domain wants to see their server listed as being on .xn--mgberp4a5d4ar.
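As a rough illustration of that split between display and network layers, here is a small sketch using Python's built-in "idna" codec. The helper names `to_ascii` and `to_unicode` are my own, not part of any library. Be aware that the built-in codec implements the older IDNA 2003 rules (RFC 3490); a third-party package would be needed for the newer IDNA 2008 rules.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to the ASCII (punycode) form used on the wire."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (punycode) hostname back to Unicode for display."""
    return hostname.encode("ascii").decode("idna")


# "b\u00fccher.example" is "bücher.example" written with an escape sequence.
wire_name = to_ascii("b\u00fccher.example")   # "xn--bcher-kva.example"
display_name = to_unicode(wire_name)          # back to the Unicode form
```

The network code would then only ever see `wire_name`, which can be checked against the plain RFC 1123 rules, while the UI shows `display_name`.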

Bushwhacker answered 19/8, 2010 at 15:15 Comment(1)
The great thing about the way punycode works is that below a certain level (the one where you do the networking) it's all ASCII. It's pretty much a way to pretend an ASCII-only tech can work with other text (including some English words previously not allowed). As well as giving the whole world an upgrade path, it gives you one too: build it to just do as RFC 1123 says, and if you later add IDN support, the network code stays the same and you add support for RFC 3490 on top (libraries are available in many languages to help too).Bushwhacker
M
3

A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period. A host which serves as a GATEWAY should have "-GATEWAY" or "-GW" as part of its name. Hosts which do not serve as Internet gateways should not use "-GATEWAY" and "-GW" as part of their names. A host which is a TAC should have "-TAC" as the last part of its host name, if it is a DoD host. Single character names or nicknames are not allowed.
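For comparison with the later, looser rules, the older restrictions in this passage could be sketched as below. The function name `is_rfc952_name` is hypothetical, and applying the first-character and last-character rules to each dot-separated component is my reading of "domain style names", so treat that part as an assumption. The "-GATEWAY" and "-TAC" naming conventions are not checked.

```python
import re

# One component: starts with a letter, ends with a letter or digit,
# with letters, digits or minus signs in between.
_COMPONENT = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_RFC952_RE = re.compile(rf"{_COMPONENT}(?:\.{_COMPONENT})*")


def is_rfc952_name(name: str) -> bool:
    """Return True if name fits the older rules quoted above."""
    # Up to 24 characters; single-character names are not allowed.
    if not 2 <= len(name) <= 24:
        return False
    return _RFC952_RE.fullmatch(name) is not None
```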

This is provided in http://support.microsoft.com/kb/149044

Myronmyrrh answered 31/7, 2014 at 9:28 Comment(3)
This is quoted verbatim from RFC 952 [tools.ietf.org/html/rfc952]. It is misleading - each label of a hostname can be 1 to 63 characters long, as stated above.Marathon
The link is broken, unfortunately.Obscurity
that link should have been tools.ietf.org/html/rfc952Unreeve
T
0

If you're registering a domain whose TLD (e.g. .com) does not support IDNs, then, as Aaron Hathaway said: Hostnames are composed of series of labels concatenated with dots, as are all domain names. For example, en.wikipedia.org is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname (including the delimiting dots but not a trailing dot) has a maximum of 253 ASCII characters.

The Internet standards (Requests for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen -. The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or white space are permitted.

Later, registries such as Spain's, with its .es, .com.es, .org.es, .nom.es, .gob.es and .edu.es, introduced IDN support. If your TLD is .es or any other that supports IDNs, many more characters can be used, but you can't combine scripts such as Latin, Greek or Cyrillic in one hostname, and the rules about what can't go at the start or at the end still apply.

If you're using non-registered TLDs just for local networking, such as with a local DNS server or with hosts files, you can treat them all as IDN.

Keep in mind that some programs may not handle such names well, especially old, outdated and unpopular ones.

Tan answered 2/6, 2020 at 2:5 Comment(0)
