How are Docker image names parsed?

Asked 16/6, 2016 at 14:15 Answered 18/12, 2021 at 4:31

When doing a docker push or when pulling an image, how does Docker determine if there is a registry server in the image name or if it is a path/username on the default registry (e.g. Docker Hub)?

I'm seeing the following from the 1.1 image specification:

Tag

A tag serves to map a descriptive, user-given name to any single image ID. Tag values are limited to the set of characters [a-zA-Z_0-9].

Repository

A collection of tags grouped under a common prefix (the name component before :). For example, in an image tagged with the name my-app:3.1.4, my-app is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.

For the DNS host name, does it need to be fully qualified with dots, or is "my-local-server" a valid registry hostname? For the name components, I'm seeing periods as valid, which implies "team.user/appserver" is a valid image name. If the registry server is running on port 80, and therefore no port number is needed on the hostname in the image name, it seems like there would be ambiguity between the hostname and the path on the registry server. I'm curious how Docker resolves that ambiguity.

Avaunt answered 16/6, 2016 at 14:15 Comment(0)

TL;DR: For the text before the first / to be treated as a registry hostname, it must contain a . (DNS separator) or a : (port separator), or be exactly "localhost". Otherwise the code assumes you want the default registry, Docker Hub.

After some digging through the code, I came across distribution/distribution/reference/reference.go with the following:

// Grammar
//
//  reference                       := name [ ":" tag ] [ "@" digest ]
//  name                            := [hostname '/'] component ['/' component]*
//  hostname                        := hostcomponent ['.' hostcomponent]* [':' port-number]
//  hostcomponent                   := /([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])/
//  port-number                     := /[0-9]+/
//  component                       := alpha-numeric [separator alpha-numeric]*
//  alpha-numeric                   := /[a-z0-9]+/
//  separator                       := /[_.]|__|[-]*/
//
//  tag                             := /[\w][\w.-]{0,127}/
//
//  digest                          := digest-algorithm ":" digest-hex
//  digest-algorithm                := digest-algorithm-component [ digest-algorithm-separator digest-algorithm-component ]
//  digest-algorithm-separator      := /[+.-_]/
//  digest-algorithm-component      := /[A-Za-z][A-Za-z0-9]*/
//  digest-hex                      := /[0-9a-fA-F]{32,}/ ; At least 128 bit digest value

The actual implementation of that is via a regex in distribution/distribution/reference/regexp.go.
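To get a feel for the component rule in the grammar above, here is a minimal sketch that transcribes it by hand into a Go regex. The names componentRe and validComponent are my own and hypothetical; this is not the regex from regexp.go, only an illustration of the alpha-numeric and separator rules as written in the comment block.

```go
package main

import (
	"fmt"
	"regexp"
)

// componentRe is a hand-written transcription of the grammar's "component"
// rule (assumption: for illustration only, not the library's own regex):
//   component     := alpha-numeric [separator alpha-numeric]*
//   alpha-numeric := /[a-z0-9]+/
//   separator     := /[_.]|__|[-]*/
var componentRe = regexp.MustCompile(`^[a-z0-9]+(?:(?:[._]|__|[-]*)[a-z0-9]+)*$`)

// validComponent reports whether s is a single valid name component.
func validComponent(s string) bool {
	return componentRe.MatchString(s)
}

func main() {
	samples := []string{"team.user", "my__app", "a--b", "my___app", "-app", "Team"}
	for _, s := range samples {
		fmt.Printf("%-12q valid=%v\n", s, validComponent(s))
	}
}
```

Under this transcription "team.user" is a valid component on its own, while three underscores in a row, a leading dash, or an uppercase letter are rejected.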

But with some digging and poking, I found that there's another check beyond that regex (e.g. you'll get errors with an uppercase hostname if you don't include a . or :). I tracked down the actual split of the name to the following in distribution/distribution/reference/normalize.go:

// splitDockerDomain splits a repository name to domain and remotename string.
// If no valid domain is found, the default domain is used. Repository name
// needs to be already validated before.
func splitDockerDomain(name string) (domain, remainder string) {
    i := strings.IndexRune(name, '/')
    if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
        domain, remainder = defaultDomain, name
    } else {
        domain, remainder = name[:i], name[i+1:]
    }
    if domain == legacyDefaultDomain {
        domain = defaultDomain
    }
    if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
        remainder = officialRepoName + "/" + remainder
    }
    return
}

The important part of that for me is the check for a ., a :, or the hostname localhost before the first / in the first if statement. When that check passes, the hostname is split off from everything after the first /. When it fails, the entire name is passed to the default registry.
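To see the effect on a few names, here is a small standalone sketch that wraps the function quoted above in a runnable program. The three constant values are what I recall from normalize.go ("docker.io", "index.docker.io", "library"), so treat them as an assumption and check them against the version you are using. The sample names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed values, as recalled from distribution's normalize.go.
const (
	defaultDomain       = "docker.io"
	legacyDefaultDomain = "index.docker.io"
	officialRepoName    = "library"
)

// splitDockerDomain is copied from the snippet quoted above.
func splitDockerDomain(name string) (domain, remainder string) {
	i := strings.IndexRune(name, '/')
	if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
		domain, remainder = defaultDomain, name
	} else {
		domain, remainder = name[:i], name[i+1:]
	}
	if domain == legacyDefaultDomain {
		domain = defaultDomain
	}
	if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
		remainder = officialRepoName + "/" + remainder
	}
	return
}

func main() {
	// Hypothetical sample names, including the two from the question.
	names := []string{
		"busybox",
		"team.user/appserver",
		"my-local-server/app",
		"localhost/app",
		"registry:5000/app",
		"index.docker.io/alpine",
	}
	for _, name := range names {
		domain, remainder := splitDockerDomain(name)
		fmt.Printf("%-24s domain=%-16s remainder=%s\n", name, domain, remainder)
	}
}
```

With this logic, "team.user/appserver" is split with team.user as the registry host because of the dot, while "my-local-server/app" stays on the default registry as the repository my-local-server/app.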

Avaunt answered 16/6, 2016 at 19:34 Comment(12)

according to the image-spec the tag is limited to 127 chars. So I think the tag regex should be /[\w][\w.-]{0,126}/ – Ediva 14/4, 2017 at 20:3

The regex length is 0 to 127 chars, so I think that's right. If not, then that would be a PR to change this: github.com/docker/distribution/blob/master/reference/… – Avaunt 14/4, 2017 at 20:31

Here's the way I figure it... The regex starts [\w][\w.-]{0,127} and there is no | between the [ ]. So it means match a \w and then match up to 127 \w or dot or hyphen. Trying /^([\w][\w.-]{0,4})$/.match('ssss-') in Ruby's irb confirms that 5 characters are consumed... – Ediva 15/4, 2017 at 21:36

My eyes glazed over the first [\w], so you're correct, the regex is longer than the spec. If you hear back on a PR to get the two in sync, let me know and I'll be happy to update this answer. – Avaunt 15/4, 2017 at 21:41

Great answer! I'm glad I found this before digging through the source to find syntax definitions. – Jillene 30/4, 2017 at 20:11

The image-spec at github.com/moby/moby/blob/master/image/spec/v1.1.md has now been updated to say that tags are limited to 128 characters. The PR thread is here github.com/docker/distribution/issues/2248 – Ediva 16/5, 2017 at 17:55

Thanks for following up. Since the change was to the spec rather than the implementation, I don't think there's anything to change in my answer above. If you see otherwise, let me know. – Avaunt 16/5, 2017 at 18:20

The grammar you pasted in from the top of distribution/reference/reference.go has been updated a bit. Also I see the link docker/reference.go is now stale. – Ediva 16/5, 2017 at 18:27

This is great. In short, I'll try and remember it this way: [[host:port/]registry/]component[:tag][@digest]. – Superable 14/3, 2018 at 22:33

The code seems to have moved. I think it all uses the distribution code now: github.com/docker/distribution/blob/master/reference/regexp.go. It's a bit difficult to decipher though :( – Watercourse 28/12, 2018 at 15:51

@AdrianMouat I'm going to need to do some digging to find where the actual parsing happens now. I believe the regex was always there, but not always used. – Avaunt 28/12, 2018 at 16:5

I've updated the stale link to point to the new location. Hopefully this one with the release pin will stay valid for longer. – Avaunt 3/1, 2019 at 20:40

Note: Many URL parsing libraries aren't able to parse docker image references / tags, unless they conform to standardized URL format.

Example Ansible Snippet:

- debug: #(FAILS)
    msg: "{{ 'docker.io/alpine' | urlsplit() }}"
# ^-- This will fail, because the image reference isn't in standard URL format

# If you can convert the docker image reference to standard URL format
# Then most URL parsing libraries will work correctly

- debug: #(WORKS)
    msg: "{{ ('https://' + 'docker.io/alpine') | urlsplit() }}"
# ^-- Example: This becomes standard URL syntax, so it parses correctly

- debug: #(FAILS)
    msg: "{{ ('http://' + 'busybox:1.34.1-glibc') | urlsplit('path') }}"
# ^-- Unfortunately, this trick won't work to turn 100% of images into 
#     Standard URL format for parsing. (This example fails as well)

Based on BMitch's answer, I realized that a simple if-statement algorithm can convert arbitrary docker image references / tags into standardized URL format, which allows most libraries to parse them.

Algorithm in human speak:

1. Look for / in $TAG
2. If / not found
   Then return ("https://docker.io/" + $TAG)
3. If / found, split $TAG into 2 parts by first /
   and test text left of /, to look for ".", ":", or "localhost"
4. If (".", ":", or "localhost" found in text left of 1st /)
   Then return ("https://" + $TAG)
5. If (".", ":", or "localhost" not found in text left of 1st /)
   Then return ("https://docker.io/" + $TAG)

(This logic converts docker tags into standardized URL format 
so they can be processed by URL parsing libraries.)

Algorithm in Bash:
vi docker_tag_to_standardized_url_format.sh
(Copy paste the following)

#!/bin/bash
# This standardizes the naming of docker images
# Basically busybox --------------------> https://docker.io/busybox
#           myregistry.tld/myimage:tag -> https://myregistry.tld/myimage:tag
INPUT=$(cat -)
OUTPUT=""

if echo "$INPUT" | grep -q "/"; then
  # Test the text left of the first / for ".", ":", or the exact name "localhost"
  # Note: inside [ ] grep treats . as a literal dot, so no escaping is needed
  if echo "$INPUT" | cut -d "/" -f1 | grep -Eq '[.:]|^localhost$'; then
    OUTPUT="https://$INPUT"
  else
    OUTPUT="https://docker.io/$INPUT"
  fi
else
  OUTPUT="https://docker.io/$INPUT"
fi

echo "$OUTPUT"

Make it executable:
chmod +x ./docker_tag_to_standardized_url_format.sh

Usage Example:

# Test data, to verify against edge cases
A=docker.io/alpine
B=docker.io/rancher/system-upgrade-controller:v0.8.0
C=busybox:1.34.1-glibc
D=busybox
E=rancher/system-upgrade-controller:v0.8.0
F=localhost:5000/helloworld:latest
G=quay.io/go/go/gadget:arms
####################################
echo $A | ./docker_tag_to_standardized_url_format.sh 
echo $B | ./docker_tag_to_standardized_url_format.sh
echo $C | ./docker_tag_to_standardized_url_format.sh
echo $D | ./docker_tag_to_standardized_url_format.sh
echo $E | ./docker_tag_to_standardized_url_format.sh
echo $F | ./docker_tag_to_standardized_url_format.sh
echo $G | ./docker_tag_to_standardized_url_format.sh

Mangosteen answered 18/12, 2021 at 4:31 Comment(2)

While this is interesting, the question is how docker parses a reference, and not how to save a reference within a url data type. – Avaunt 18/12, 2021 at 11:22

Honestly my answer was more aimed at this question #42116277, but it was closed and marked as a duplicate of this one, so I answered here, since I couldn't post my answer on the closed question. – Mangosteen 18/12, 2021 at 16:24

The image-spec at https://github.com/moby/moby/blob/master/image/spec/v1.1.md has now been updated to say that tags are limited to 128 characters.

The PR thread is here https://github.com/docker/distribution/issues/2248

Some Ruby code is here https://github.com/cyber-dojo/runner/blob/e98bc280c5349cb2919acecb0dfbfefa1ac4e5c3/src/docker/image_name.rb

Some Ruby tests are https://github.com/cyber-dojo/runner/blob/e98bc280c5349cb2919acecb0dfbfefa1ac4e5c3/test_server/image_name_test.rb

Ediva answered 16/5, 2017 at 17:54 Comment(0)
