Regular expression for validating a DNS label (host name)
C

7

24

I would like to validate a hostname using only a regular expression.

Host Names (or 'labels' in DNS jargon) were traditionally defined by RFC 952 and RFC 1123 and may be composed of the following valid characters.

  • A to Z ; upper case characters
  • a to z ; lower case characters
  • 0 to 9 ; numeric characters
  • - ; dash

The rules say:

  • A host name (label) can start or end with a letter or a number
  • A host name (label) MUST NOT start or end with a '-' (dash)
  • A host name (label) MUST NOT consist of all numeric values
  • A host name (label) can be up to 63 characters

How would you write a regular expression to validate a hostname?

Carryingon answered 14/1, 2010 at 9:31 Comment(8)
No minimum character limit?Nationwide
Nope. Blank DNS label in BIND means "same as above"Carryingon
Your question is wrongly phrased: a host name has nothing to do with a DNS label for two reasons: a host name can be a Fully Qualified Domain Name and the syntax for host names is much more restrictive than the syntax for domain names.Gyroscope
Yes Sheldon, I partially agree with you. For most people the host name is the part before the domain, e.g. www.pedantic.com: www = host name, pedantic.com = domain. Not many people have heard of a DNS label. I just wanted to make the question easy to search for.Carryingon
Who is Sheldon? (Not every SO reader watches the stupid US TV serials.)Gyroscope
@Carryingon a blank label only means that in a zone file.Sapling
Label vs. hostname makes a difference here as mentioned by @solidsnack below. A label is allowed to be only numeric values. For example, 1234.com is legal even though "1234" is only numeric values. However, a full hostname may not be only numeric values because then it is an IP address.Crampon
RFC 1035 gives the format as <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] which means that zero-length labels are not allowedOttinger
B
22
^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)$

I used the following testbed written in Python to verify that it works correctly:

tests = [
    ('01010', False),
    ('abc', True),
    ('A0c', True),
    ('A0c-', False),
    ('-A0c', False),
    ('A-0c', True),
    ('o123456701234567012345670123456701234567012345670123456701234567', False),
    ('o12345670123456701234567012345670123456701234567012345670123456', True),
    ('', True),
    ('a', True),
    ('0--0', True),
]

import re
regex = re.compile('^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)$')
for (s, expected) in tests:
    is_match = regex.match(s) is not None
    print(is_match == expected)
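
One caveat with the testbed above: in Python, `$` also matches just before a trailing newline, so `regex.match` accepts `'abc\n'`. A minimal sketch of one way around that, assuming `re.fullmatch` is acceptable in place of the `^`/`$` anchors (the names `LABEL` and `is_label` are illustrative, not part of the original answer):

```python
import re

# Same pattern as above without ^ and $; fullmatch() anchors both ends.
LABEL = re.compile(r'(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)')

def is_label(s):
    """Return True only if the entire string matches the pattern."""
    return LABEL.fullmatch(s) is not None

# The original anchored form, kept for comparison.
anchored = re.compile(r'^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)$')

print(anchored.match('abc\n') is not None)  # True: '$' tolerates the newline
print(is_label('abc\n'))                    # False: fullmatch() does not
```

Replacing `$` with `\Z` in the anchored pattern should have the same effect if `match` has to be kept.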
Busterbustle answered 14/1, 2010 at 9:42 Comment(4)
Use \A and \z in place of ^ and $, respectively, in Ruby since Ruby regular expressions are multi-line by default: \A(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)\z.Crampon
01010 is a valid label (RFC 1123). The empty string is an invalid label (RFC 1035)Ottinger
Read my answer below which requires the addition of underscores to your regex.Hungerford
@CoreyBallou No, underscores are not allowed in hostnames. They are only allowed in domain names, so it all depends on the resource record. _whatever CNAME elsewhere is valid (because the owner of a CNAME is a domain name, not a hostname) but _whatever IN A 192.0.2.42 is not valid because the owner of an A record is a hostname and not a domain name.Gaddy
J
15

JavaScript regex based on Mark's answer:

pattern = /^(?![0-9]+$)(?!.*-$)(?!-)[a-zA-Z0-9-]{1,63}$/;
Jackjackadandy answered 7/9, 2012 at 3:16 Comment(0)
D
7

The k8s API responds with the regex that it uses to validate e.g. an RFC 1123-compliant string:

(⎈ minikube:default)➜  cloud-app git:(mc/72-org-ns-names) ✗ k create ns not-valid1234234$%
The Namespace "not-valid1234234$%" is invalid: metadata.name: 
Invalid value: "not-valid1234234$%": a lowercase RFC 1123 label must consist of lower case 
alphanumeric characters or '-', and must start and end with an alphanumeric character 
(e.g. 'my-name',  or '123-abc', regex used for validation is
 '[a-z0-9]([-a-z0-9]*[a-z0-9])?')
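
The quoted pattern carries no length limit of its own. Here is a minimal Python sketch of applying it, assuming the 63-character cap from the question is checked separately (`K8S_LABEL` and `is_lowercase_rfc1123_label` are names made up for this example, not Kubernetes identifiers):

```python
import re

# Pattern copied from the error message above.
K8S_LABEL = re.compile(r'[a-z0-9]([-a-z0-9]*[a-z0-9])?')

def is_lowercase_rfc1123_label(value, max_len=63):
    """Check the length first, then require the whole string to match."""
    return len(value) <= max_len and K8S_LABEL.fullmatch(value) is not None

print(is_lowercase_rfc1123_label('my-name'))             # True
print(is_lowercase_rfc1123_label('123-abc'))             # True
print(is_lowercase_rfc1123_label('not-valid1234234$%'))  # False
```

Note that this variant accepts all-numeric labels and rejects upper-case letters, unlike the rules listed in the question.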
Dissonance answered 4/5, 2021 at 15:44 Comment(0)
N
5

It is worth noting that DNS labels and hostname components have slightly different rules. Most notably: '_' is not legal in any component of a hostname, but is a standard part of labels used for things like SRV records.

A more readable and portable approach is to require a string to match both of these POSIX ERE's:

^([[:alnum:]][[:alnum:]-]{0,61}[[:alnum:]]|[[:alpha:]])$
^.*[^[:digit:]].*$

Those should be easy to use in any standards-compliant ERE implementation. Perl-style lookaround assertions, as used in the Python example, are widely available, but they do not behave identically in every engine that appears to accept them. Ouch.

It is possible in principle to combine those two lines into a single ERE, but it would be long and unwieldy. The first line handles every rule except the ban on all-digit names; the second rejects strings that consist only of digits.
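
Python's `re` module does not understand POSIX bracket classes such as `[[:alnum:]]`, so the following is only a sketch of the same two-pattern idea with explicit ASCII ranges. The names `SHAPE`, `HAS_NON_DIGIT`, and `is_hostname_component` are invented for illustration.

```python
import re

# First check: allowed characters, length 1..63, no leading or trailing '-'.
# A single-character name must be a letter.
SHAPE = re.compile(r'[A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9]|[A-Za-z]')

# Second check: at least one character that is not a digit.
HAS_NON_DIGIT = re.compile(r'[^0-9]')

def is_hostname_component(s):
    """Require both checks to pass, mirroring the two-ERE approach."""
    return (SHAPE.fullmatch(s) is not None
            and HAS_NON_DIGIT.search(s) is not None)

for candidate in ['abc', 'A-0c', '01010', 'A0c-', '-A0c', '1']:
    print(candidate, is_hostname_component(candidate))
```

Keeping the two checks separate means neither pattern needs lookaround, which is the portability point made above.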

Nicholson answered 20/3, 2012 at 5:13 Comment(2)
I find your first regex matching more than I want. May I suggest following improvement? ^[[:alnum:]][[:alnum:]\-]{0,61}[[:alnum:]]$|^[[:alnum:]]$Defeasance
Thanks! I have made a slightly different improvement that makes the meaning more human-obvious. Note that I've left the second alternative pattern as 'alpha' instead of 'alnum' because all-digit labels are not legal.Nicholson
C
3

In Ruby, ^ and $ match at the start and end of each line by default, so frameworks such as Rails warn against using them for validation. This is Mark's answer with the start- and end-of-string anchors \A and \z instead:

\A(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)\z
Crampon answered 6/9, 2013 at 19:1 Comment(1)
It is actually okay for a label (part of a domain name) to be all numeric. However, for the whole domain name to be all numeric is in practice disallowed, since TLDs are not all numeric, and it is expected that one can distinguish syntactically between IPs and domain names. tools.ietf.org/html/rfc1123#page-13Breana
O
3

A revised regex based on comments here and my own reading of RFCs 1035 & 1123:

Ruby: \A(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\z (tests below)

Python: ^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$ (not tested by me)

JavaScript: pattern = /^(?!-)(?!.*-$)[a-zA-Z0-9-]{1,63}$/; (based on Tom Lime's answer, not tested by me)

Tests:

tests = [
  ['01010', true],
  ['abc', true],
  ['A0c', true],
  ['A0c-', false],
  ['-A0c', false],
  ['A-0c', true],
  ['o123456701234567012345670123456701234567012345670123456701234567', false],
  ['o12345670123456701234567012345670123456701234567012345670123456', true],
  ['', false],
  ['a', true],
  ['0--0', true],
  ["A0c\nA0c", false]
]

regex = /\A(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\z/
tests.each do |label, expected|
  is_match = !!(regex =~ label)
  puts is_match == expected
end
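
Since the Python variant is marked as untested, here is a rough port of the same test table. It is a sketch that assumes `re.fullmatch` is used instead of `^` and `$`, because Python's `$` would otherwise accept a trailing newline. The long test strings are rebuilt with `'a' * n` for readability, and the last case is an addition that was not in the Ruby table.

```python
import re

# The Python pattern from above without ^ and $; fullmatch() anchors both ends.
LABEL = re.compile(r'(?!-)[a-zA-Z0-9-]{1,63}(?<!-)')

tests = [
    ('01010', True),
    ('abc', True),
    ('A0c', True),
    ('A0c-', False),
    ('-A0c', False),
    ('A-0c', True),
    ('a' * 64, False),
    ('a' * 63, True),
    ('', False),
    ('a', True),
    ('0--0', True),
    ('A0c\nA0c', False),
    ('A0c\n', False),  # extra case: trailing newline
]

results = [(LABEL.fullmatch(s) is not None) == expected for s, expected in tests]
for ok in results:
    print(ok)
```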

Notes:

  1. Thanks to Mark Byers for the original code fragment
  2. solidsnack points out that RFC 1123 allows all-numeric labels (https://www.rfc-editor.org/rfc/rfc1123#page-13)
  3. RFC 1035 does not allow zero-length labels (https://www.rfc-editor.org/rfc/rfc1035): <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
  4. I've added a test specifically for Ruby that ensures a new line is not embedded in the label. This is thanks to notes by ssorallen.
  5. This code is available here: https://github.com/Xenapto/domain-label-validation - I'm happy to accept pull requests if you want to update it.
Ottinger answered 6/1, 2014 at 8:36 Comment(0)
H
3

While the accepted answer is correct, RFC 2181 also states under Section 11, "Name Syntax":

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. [...] Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

This in turn means other characters such as underscores should be allowed.
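
If only the length restriction from the quoted passage is enforced, a check could look like the sketch below. It assumes the name is given in plain dotted text form with no escaped dots, counts the 255-octet limit against the wire encoding (one length octet per label plus the terminating root label), and does not handle the empty root name. The function name `fits_dns_length_limits` is invented for this example.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255

def fits_dns_length_limits(name):
    """Check label and full-name lengths only; any characters are accepted."""
    if name.endswith('.'):
        name = name[:-1]  # drop the trailing root dot, if present
    labels = [label.encode('utf-8') for label in name.split('.')]
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    # Wire format: one length octet per label, plus the zero-length root label.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= MAX_NAME_OCTETS

print(fits_dns_length_limits('_sip._tcp.example.com'))  # True
print(fits_dns_length_limits('a' * 64 + '.com'))        # False
```

Whether a name that passes this check is also usable as a hostname is a separate question, as the other answers point out.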

Hungerford answered 2/9, 2015 at 21:7 Comment(0)
