I'm asking about the format used to prepare a password hash for storage after hashing. The dollar sign ($) notation seems to be widespread. Is it described in a standard somewhere (including the identifiers for algorithms)?

For example, when using Go with golang.org/x/crypto/bcrypt, it gives such an encoded string (playground):

package main

import (
    "fmt"

    "golang.org/x/crypto/bcrypt"
)

func main() {
    h, err := bcrypt.GenerateFromPassword([]byte("foo"), bcrypt.DefaultCost)
    if err != nil {
        panic(err)
    }

    fmt.Printf("%s", h)
    // Output (varies per run, the salt is random): $2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve
}

However, other hashing packages like scrypt (example) and argon2 return just the resulting hash. The argon2 shell command, on the other hand, does return an encoded string:

echo "foo" | argon2 saltsalt
Type:           Argon2i
Iterations:     3
Memory:         4096 KiB
Parallelism:    1
Hash:           d9e4f94546b9e5b0cfb2dbf9dad81d41371845d8b6a8c25ce7caf23e13f1ef72
Encoded:        $argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I
0.005 seconds
Verification ok

I found a Go / argon2 specific blog post explaining this encoding, so far so good.

Variations I found

My trouble lies with the definition of the dollar-separated string, its portability, and the variations I found.

glibc

The man 3 crypt page gives some pointers. There is a table of identifiers:

              ID   Method
              ───────────────────────────────────────────────────────────
              1    MD5
              2a   Blowfish (not in mainline glibc; added in some Linux
                   distributions)
              5    SHA-256 (since glibc 2.7)
              6    SHA-512 (since glibc 2.7)

But this doesn't cover newer types, like argon2i or scrypt.

Then there are the example strings:

$id$salt$encrypted
$id$rounds=yyy$salt$encrypted

The latter is only supported since glibc 2.7.

bcrypt

While bcrypt uses the 2a (Blowfish) identifier from glibc, its encoding seems to be slightly different, as the example above shows:

$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve
$id$cost$<dot separated line of what exactly?>

argon2

Argon2 uses five fields and a full-name identifier such as argon2i:

$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I
$id$version$parameters$salt$encrypted

why?

I want to write a package that hashes and verifies passwords in an algorithm-agnostic way, allowing consumers to change parameters and algorithms without refactoring their code. During verification, the package should therefore be able to determine which algorithm was used when the password was stored. If the stored algorithm, version, or parameters differ from the ones currently in use, the password is re-hashed and a new encoded string is returned.

As a bonus, I would like the package to be able to re-hash "legacy" passwords which might have been stored by older (non-Go) applications, for instance as md5. In order to do all this, I would like to have a deeper understanding of the storage format itself.

what is the standard for password hash string encoding?

There is none.

Hey, that was an easy answer! Clicks "Post Your Answer".

Okay, while the above statement is unfortunately true, thankfully, there are some people who have already gone through the trouble of collecting a lot of information about all of the variations in use.

In particular, the authors of the Passlib library for Python (which does essentially the same thing you want to do) have written up a page about what they call the Modular Crypt Format, which they describe as "a standard that isn’t". Here are some choice quotes from that page [bold italic emphasis mine]:

However, there’s no official specification document describing this format. Nor is there a central registry of identifiers, or actual rules. The modular crypt format is more of an ad-hoc idea rather than a true standard.

[Modular Crypt Format – Overview]

Unfortunately, there is no specification document for this format. Instead, it exists in de facto form only

When MCF was first introduced, most schemes choose a single digit as their identifier (e.g. $1$ for md5_crypt). Because of this, some older systems only look at the first character when attempting to distinguish hashes.

Most modular crypt format hashes follow this convention, though some (like bcrypt) omit the $ separator between the configuration and the digest.

[T]here is no set standard about whether configuration strings should or should not include a trailing $ at the end

[Modular Crypt Format – Requirements]
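To make the bcrypt quirk quoted above concrete, here is a minimal Go sketch that splits the bcrypt string from the question into its parts. It assumes the common 60-character layout, in which the last field is a 22-character salt immediately followed by a 31-character digest, both in bcrypt's own base64 alphabet (which is where the dots come from). The type and function names are hypothetical, and this is an illustration rather than a validating parser.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// bcryptParts holds the fields of a bcrypt string.
// The name is hypothetical, chosen for this illustration.
type bcryptParts struct {
	ID     string // scheme identifier, e.g. "2a"
	Cost   int    // log2 of the iteration count
	Salt   string // assumed: 22 characters
	Digest string // assumed: 31 characters
}

// splitBcrypt splits "$id$cost$<salt><digest>" into its parts.
func splitBcrypt(encoded string) (bcryptParts, error) {
	fields := strings.Split(encoded, "$")
	// The leading "$" yields an empty first field: "", id, cost, salt+digest.
	if len(fields) != 4 || fields[0] != "" {
		return bcryptParts{}, errors.New("not a bcrypt-style string")
	}
	cost, err := strconv.Atoi(fields[2])
	if err != nil {
		return bcryptParts{}, fmt.Errorf("bad cost: %w", err)
	}
	// There is no "$" between salt and digest, so split by length.
	if len(fields[3]) != 53 {
		return bcryptParts{}, errors.New("expected 53 characters of salt and digest")
	}
	return bcryptParts{
		ID:     fields[1],
		Cost:   cost,
		Salt:   fields[3][:22],
		Digest: fields[3][22:],
	}, nil
}

func main() {
	p, err := splitBcrypt("$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

Splitting by length instead of by separator is exactly the kind of per-scheme special case an algorithm-agnostic package has to carry.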

Please note that the Modular Crypt Format is not a specification or a standard. It is a description of the various different formats that are used in the wild. There is an attempt at a specification by the organizers of the Password Hashing Competition (PHC), called the PHC String Format. However, the PHC is not a formal standards organization with any kind of authority. It is just a loose group of cryptographers. While they recommend that every new password hashing function should use the PHC String Format, they can only mandate it for password hashing functions that are submitted to the Password Hashing Competition.

And either way, the PHC String Format only applies to new password hashing functions, not to existing ones.
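As an illustration, the argon2 string from the question follows the PHC layout, and a minimal Go sketch for taking it apart could look like this. It assumes the five-field shape shown in the question ($id$version$parameters$salt$hash) and unpadded standard base64 for salt and hash; the PHC format also allows some fields to be omitted, which this sketch does not handle. The type and function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// phcHash holds the decoded fields of a PHC-style string.
// The name is hypothetical, chosen for this illustration.
type phcHash struct {
	ID      string
	Version string
	Params  map[string]string
	Salt    []byte
	Hash    []byte
}

// parsePHC handles only the five-field shape $id$v=N$k=v,...$salt$hash.
func parsePHC(encoded string) (phcHash, error) {
	fields := strings.Split(encoded, "$")
	if len(fields) != 6 || fields[0] != "" {
		return phcHash{}, errors.New("expected $id$version$params$salt$hash")
	}
	if !strings.HasPrefix(fields[2], "v=") {
		return phcHash{}, errors.New("missing version field")
	}
	out := phcHash{
		ID:      fields[1],
		Version: strings.TrimPrefix(fields[2], "v="),
		Params:  map[string]string{},
	}
	// Parameters are comma-separated key=value pairs, e.g. "m=4096,t=3,p=1".
	for _, kv := range strings.Split(fields[3], ",") {
		pair := strings.SplitN(kv, "=", 2)
		if len(pair) != 2 {
			return phcHash{}, fmt.Errorf("bad parameter %q", kv)
		}
		out.Params[pair[0]] = pair[1]
	}
	var err error
	// Salt and hash are assumed to be standard base64 without "=" padding.
	if out.Salt, err = base64.RawStdEncoding.DecodeString(fields[4]); err != nil {
		return phcHash{}, fmt.Errorf("bad salt: %w", err)
	}
	if out.Hash, err = base64.RawStdEncoding.DecodeString(fields[5]); err != nil {
		return phcHash{}, fmt.Errorf("bad hash: %w", err)
	}
	return out, nil
}

func main() {
	h, err := parsePHC("$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I")
	if err != nil {
		panic(err)
	}
	fmt.Printf("id=%s version=%s params=%v salt=%q hash=%x\n",
		h.ID, h.Version, h.Params, h.Salt, h.Hash)
}
```

Because the parameters are stored next to the hash, a verifier can compare them with its current settings and decide whether a re-hash is due.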

While I strongly suggest that you should use the PHC String Format for any output you generate, you will still have to deal with inputs in all sorts of different formats, including some gems like these:

cta_pbkdf2_sha1 and dlitz_pbkdf2_sha1 both use the same identifier. While there are other internal differences, the two can be quickly distinguished by the fact that cta hashes always end in =, while dlitz hashes contain no = at all.
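For the package described in the question, a first step could be a lookup from the leading identifier to a scheme. The following Go sketch uses only the identifiers that appear in the question (the crypt(3) table plus argon2i). The scheme names, the table, and both function names are assumptions for illustration; a real package would also need per-scheme tie-breakers such as the trailing = check quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// schemes maps the leading identifier to a scheme name.
// Illustrative only: there is no central registry of identifiers.
var schemes = map[string]string{
	"1":       "md5_crypt",
	"2a":      "bcrypt",
	"5":       "sha256_crypt",
	"6":       "sha512_crypt",
	"argon2i": "argon2i",
}

// identify returns the scheme name for a "$id$..." string, if known.
func identify(encoded string) (string, bool) {
	if !strings.HasPrefix(encoded, "$") {
		return "", false
	}
	rest := encoded[1:]
	end := strings.Index(rest, "$")
	if end < 0 {
		return "", false
	}
	name, ok := schemes[rest[:end]]
	return name, ok
}

// needsRehash reports whether a stored hash uses a known scheme other
// than the preferred one. It compares the scheme only, not parameters.
func needsRehash(encoded, preferred string) bool {
	name, ok := identify(encoded)
	return ok && name != preferred
}

func main() {
	stored := []string{
		"$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve",
		"$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I",
	}
	for _, s := range stored {
		name, ok := identify(s)
		fmt.Println(name, ok, needsRehash(s, "argon2i"))
	}
}
```

The re-hash itself can only happen after a successful verification, since that is the only moment the plaintext password is available.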
