How are Docker image names parsed?

When doing a docker push or when pulling an image, how does Docker determine if there is a registry server in the image name or if it is a path/username on the default registry (e.g. Docker Hub)?

I'm seeing the following from the 1.1 image specification:

Tag

A tag serves to map a descriptive, user-given name to any single image ID. Tag values are limited to the set of characters [a-zA-Z_0-9].

Repository

A collection of tags grouped under a common prefix (the name component before :). For example, in an image tagged with the name my-app:3.1.4, my-app is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.

For the DNS host name, does it need to be fully qualified with dots, or is "my-local-server" a valid registry hostname? For the name components, I'm seeing periods as valid, which implies "team.user/appserver" is a valid image name. If the registry server is running on port 80, and therefore no port number is needed on the hostname in the image name, it seems like there would be ambiguity between the hostname and the path on the registry server. I'm curious how Docker resolves that ambiguity.

Avaunt answered 16/6, 2016 at 14:15 Comment(0)

TL;DR: The text before the first / is treated as a registry hostname only if it contains a . (DNS separator) or a : (port separator), or is exactly "localhost". Otherwise the code assumes you want the default registry, Docker Hub.


After some digging through the code, I came across distribution/distribution/reference/reference.go with the following:

// Grammar
//
//  reference                       := name [ ":" tag ] [ "@" digest ]
//  name                            := [hostname '/'] component ['/' component]*
//  hostname                        := hostcomponent ['.' hostcomponent]* [':' port-number]
//  hostcomponent                   := /([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])/
//  port-number                     := /[0-9]+/
//  component                       := alpha-numeric [separator alpha-numeric]*
//  alpha-numeric                   := /[a-z0-9]+/
//  separator                       := /[_.]|__|[-]*/
//
//  tag                             := /[\w][\w.-]{0,127}/
//
//  digest                          := digest-algorithm ":" digest-hex
//  digest-algorithm                := digest-algorithm-component [ digest-algorithm-separator digest-algorithm-component ]
//  digest-algorithm-separator      := /[+.-_]/
//  digest-algorithm-component      := /[A-Za-z][A-Za-z0-9]*/
//  digest-hex                      := /[0-9a-fA-F]{32,}/ ; At least 128 bit digest value

The actual implementation of that is via a regex in distribution/distribution/reference/regexp.go.
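
To make the grammar above concrete, here is a minimal sketch that transcribes only the component and tag rules into two Go regular expressions. The names componentRe, tagRe, validComponent, and validTag are made up for this illustration, and the expressions are a simplification: regexp.go composes its patterns differently, so treat this as an approximation of the grammar comment rather than the library's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified patterns transcribed from the grammar comment; these are
// illustrative and are not the expressions used in regexp.go.
var (
	// component := alpha-numeric [separator alpha-numeric]*
	componentRe = regexp.MustCompile(`^[a-z0-9]+(?:(?:[_.]|__|[-]*)[a-z0-9]+)*$`)
	// tag := /[\w][\w.-]{0,127}/  (one leading character plus up to 127 more)
	tagRe = regexp.MustCompile(`^[\w][\w.-]{0,127}$`)
)

// validComponent reports whether s matches the simplified component rule.
func validComponent(s string) bool { return componentRe.MatchString(s) }

// validTag reports whether s matches the simplified tag rule.
func validTag(s string) bool { return tagRe.MatchString(s) }

func main() {
	for _, c := range []string{"my-app", "team.user", "a__b", "a___b", "-app", "MyApp"} {
		fmt.Printf("component %q valid=%v\n", c, validComponent(c))
	}
	for _, n := range []int{1, 128, 129} {
		fmt.Printf("tag of length %d valid=%v\n", n, validTag(strings.Repeat("a", n)))
	}
}
```

Under these patterns a name component such as team.user is accepted, which is why a period alone cannot tell a path component apart from a hostname at the grammar level.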

But with some digging and poking, I found that there's another check beyond that regex (e.g. you'll get errors with an uppercase hostname if you don't include a . or :). I tracked down the actual split of the name to the following in distribution/distribution/reference/normalize.go:

// splitDockerDomain splits a repository name to domain and remotename string.
// If no valid domain is found, the default domain is used. Repository name
// needs to be already validated before.
func splitDockerDomain(name string) (domain, remainder string) {
    i := strings.IndexRune(name, '/')
    if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
        domain, remainder = defaultDomain, name
    } else {
        domain, remainder = name[:i], name[i+1:]
    }
    if domain == legacyDefaultDomain {
        domain = defaultDomain
    }
    if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
        remainder = officialRepoName + "/" + remainder
    }
    return
}

The important part of that for me is the check for the ., :, or the hostname localhost before the first / in the first if statement. With it, the hostname is split out from before the first /, and without it, the entire name is passed to the default registry hostname.
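
As a rough, runnable illustration, the function above can be dropped into a small program and fed the names from the question. The three constants are not shown in the excerpt; the values below are my understanding of what normalize.go defines and should be read as assumptions, and the sample names are only test inputs.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed values for the constants referenced by splitDockerDomain;
// they are not part of the excerpt quoted above.
const (
	defaultDomain       = "docker.io"
	legacyDefaultDomain = "index.docker.io"
	officialRepoName    = "library"
)

// splitDockerDomain is copied from the excerpt above.
func splitDockerDomain(name string) (domain, remainder string) {
	i := strings.IndexRune(name, '/')
	if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
		domain, remainder = defaultDomain, name
	} else {
		domain, remainder = name[:i], name[i+1:]
	}
	if domain == legacyDefaultDomain {
		domain = defaultDomain
	}
	if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
		remainder = officialRepoName + "/" + remainder
	}
	return
}

func main() {
	names := []string{
		"busybox",
		"team.user/appserver",
		"my-local-server/appserver",
		"localhost/appserver",
		"localhost:5000/helloworld",
		"index.docker.io/alpine",
	}
	for _, n := range names {
		d, r := splitDockerDomain(n)
		fmt.Printf("%s -> domain=%s remainder=%s\n", n, d, r)
	}
}
```

With these assumptions, team.user/appserver is split into the registry team.user and the repository appserver, while my-local-server/appserver is sent to the default registry as a two-component repository name. So an unqualified hostname only works as a registry if a port is appended.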

Avaunt answered 16/6, 2016 at 19:34 Comment(12)
According to the image-spec the tag is limited to 127 chars. So I think the tag regex should be /[\w][\w.-]{0,126}/ - Ediva
The regex length is 0 to 127 chars, so I think that's right. If not, then that would be a PR to change this: github.com/docker/distribution/blob/master/reference/… - Avaunt
Here's the way I figure it: the regex starts [\w][\w.-]{0,127} and there is no | between the two bracket expressions, so it means match one \w and then match up to 127 more \w, dot, or hyphen characters. Trying /^([\w][\w.-]{0,4})$/.match('ssss-') in Ruby's irb confirms that 5 characters are consumed... - Ediva
My eyes glazed over the first [\w], so you're correct, the regex is longer than the spec. If you hear back on a PR to get the two in sync, let me know and I'll be happy to update this answer. - Avaunt
Great answer! I'm glad I found this before digging through the source to find syntax definitions. - Jillene
The image-spec at github.com/moby/moby/blob/master/image/spec/v1.1.md has now been updated to say that tags are limited to 128 characters. The PR thread is here: github.com/docker/distribution/issues/2248 - Ediva
Thanks for following up. Since the change was to the spec rather than the implementation, I don't think there's anything to change in my answer above. If you see otherwise, let me know. - Avaunt
The grammar you pasted in from the top of distribution/reference/reference.go has been updated a bit. Also I see the link docker/reference.go is now stale. - Ediva
This is great. In short, I'll try and remember it this way: [[host:port/]registry/]component[:tag][@digest]. - Superable
The code seems to have moved. I think it all uses the distribution code now: github.com/docker/distribution/blob/master/reference/regexp.go. It's a bit difficult to decipher though :( - Watercourse
@AdrianMouat I'm going to need to do some digging to find where the actual parsing happens now. I believe the regex was always there, but not always used. - Avaunt
I've updated the stale link to point to the new location. Hopefully this one with the release pin will stay valid for longer. - Avaunt

Note: Many URL parsing libraries aren't able to parse docker image references / tags, unless they conform to standardized URL format.

Example Ansible Snippet:

- debug: #(FAILS)
    msg: "{{ 'docker.io/alpine' | urlsplit() }}"
# ^-- This will fail, because the image reference isn't in standard URL format

# If you can convert the docker image reference to standard URL format
# Then most URL parsing libraries will work correctly

- debug: #(WORKS)
    msg: "{{ ('https://' + 'docker.io/alpine') | urlsplit() }}"
# ^-- Example: This becomes standard URL syntax, so it parses correctly

- debug: #(FAILS)
    msg: "{{ ('http://' + 'busybox:1.34.1-glibc') | urlsplit('path') }}"
# ^-- Unfortunately, this trick won't work to turn 100% of images into 
#     Standard URL format for parsing. (This example fails as well)

Based on BMitch's answer, I realized that a simple if/else check is enough to convert arbitrary Docker image references / tags into standardized URL format, which allows them to be parsed by most libraries.

Algorithm in human speak:

1. Look for / in $TAG
2. If / not found
   Then return ("https://docker.io/" + $TAG)
3. If / found, split $TAG into 2 parts at the first /
   and test the text left of the /, looking for ".", ":", or "localhost"
4. If (".", ":", or "localhost" found in text left of 1st /)
   Then return ("https://" + $TAG)
5. If (".", ":", or "localhost" not found in text left of 1st /)
   Then return ("https://docker.io/" + $TAG)

(This logic converts docker tags into standardized URL format 
so they can be processed by URL parsing libraries.)

Algorithm in Bash:
vi docker_tag_to_standardized_url_format.sh
(Copy paste the following)

#!/bin/bash
# Standardizes Docker image references into URL form, for example:
#   busybox --------------------> https://docker.io/busybox
#   myregistry.tld/myimage:tag -> https://myregistry.tld/myimage:tag
INPUT=$(cat -)

if [[ "$INPUT" == */* ]]; then
  # Text before the first "/" is the candidate registry hostname
  FIRST="${INPUT%%/*}"
  # Treat it as a hostname only if it contains "." or ":",
  # or is exactly "localhost" (a substring match would be too loose)
  if [[ "$FIRST" == *.* || "$FIRST" == *:* || "$FIRST" == "localhost" ]]; then
    OUTPUT="https://$INPUT"
  else
    OUTPUT="https://docker.io/$INPUT"
  fi
else
  # No "/" at all, so the image lives on the default registry
  OUTPUT="https://docker.io/$INPUT"
fi

echo "$OUTPUT"

Make it executable:
chmod +x ./docker_tag_to_standardized_url_format.sh

Usage Example:

# Test data, to verify against edge cases
A=docker.io/alpine
B=docker.io/rancher/system-upgrade-controller:v0.8.0
C=busybox:1.34.1-glibc
D=busybox
E=rancher/system-upgrade-controller:v0.8.0
F=localhost:5000/helloworld:latest
G=quay.io/go/go/gadget:arms
####################################
echo $A | ./docker_tag_to_standardized_url_format.sh 
echo $B | ./docker_tag_to_standardized_url_format.sh
echo $C | ./docker_tag_to_standardized_url_format.sh
echo $D | ./docker_tag_to_standardized_url_format.sh
echo $E | ./docker_tag_to_standardized_url_format.sh
echo $F | ./docker_tag_to_standardized_url_format.sh
echo $G | ./docker_tag_to_standardized_url_format.sh
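
For comparison, the same conversion can be sketched in Go and handed to the standard net/url parser. The names toURL and hostAndPath are hypothetical helpers invented for this example, and the https:// scheme is only a parsing aid, not a statement about how a registry is actually contacted.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toURL follows the steps listed earlier: keep the reference as-is behind
// "https://" when the text before the first "/" looks like a registry
// hostname, otherwise place it under docker.io.
func toURL(ref string) string {
	i := strings.IndexRune(ref, '/')
	if i == -1 {
		return "https://docker.io/" + ref
	}
	first := ref[:i]
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return "https://" + ref
	}
	return "https://docker.io/" + ref
}

// hostAndPath (hypothetical helper) converts ref and parses the result
// with net/url, returning the host and path portions.
func hostAndPath(ref string) (host, path string, err error) {
	u, err := url.Parse(toURL(ref))
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	refs := []string{
		"docker.io/alpine",
		"busybox:1.34.1-glibc",
		"busybox",
		"rancher/system-upgrade-controller:v0.8.0",
		"localhost:5000/helloworld:latest",
		"quay.io/go/go/gadget:arms",
	}
	for _, ref := range refs {
		host, path, err := hostAndPath(ref)
		if err != nil {
			fmt.Println(ref, "-> error:", err)
			continue
		}
		fmt.Printf("%s -> host=%s path=%s\n", ref, host, path)
	}
}
```

The tag stays in the path portion (for example /busybox:1.34.1-glibc), so it still has to be split off separately if you need it.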
Mangosteen answered 18/12, 2021 at 4:31 Comment(2)
While this is interesting, the question is how docker parses a reference, and not how to save a reference within a url data type. - Avaunt
Honestly my answer was more aimed at this question #42116277, but it was closed and marked as a duplicate of this one, so I answered here, since I couldn't post my answer on the closed question. - Mangosteen
