Understanding bcrypt salt as used by PHP password_hash

I have some trouble understanding how bcrypt uses the salt. I know what a salt is good for, but I do not understand exactly how the salt value is used.

Problem 1: What is the correct salt length?

All sources I found say that the salt has a length of 22 and that it is stored in the result string together with the algorithm, the cost and the actual hash value.

However, all implementations I found use a salt with a length of 32. For example, FOSUserBundle (used with Symfony) used the following code to create the salt:

$this->salt = base_convert(sha1(uniqid(mt_rand(), true)), 16, 36);

Since a sha1 hash is 32 chars long, the generated salt also has a length of 32. Is this just a lazy implementation that skips trimming the string to 22 chars because bcrypt does that itself? Or are 32 chars necessary for some reason?

Problem 2: Is a salt length of 22 really correct?

In the following example, it seems that only the first 21 chars of the salt are saved in the result string. Passing just these 21 chars as the salt to password_hash results in an error, but padding them with a 0 works:

$s = 'password';
$salt        = 'salt5678901234567890123456789012';
$salt_prefix = 'salt567890123456789010'; // first 21 chars of salt + 0

$h1 = password_hash($s, PASSWORD_BCRYPT, array('salt' => $salt));
$h2 = password_hash($s, PASSWORD_BCRYPT, array('salt' => $salt_prefix));

echo $h1 . PHP_EOL;
echo $h2 . PHP_EOL;

//Result
$2y$10$salt56789012345678901uTWNlUnhu5K/xBrtKYTo7oDy8zMr/csu
$2y$10$salt56789012345678901uTWNlUnhu5K/xBrtKYTo7oDy8zMr/csu

So one needs to pass a salt of at least 22 chars to the algorithm, but the 22nd char seems to be useless. Is that correct? What is the point of the 22nd char if it is not used at all?

Problem 3: Why not specify the salt manually?

In the PHP function password_hash, passing a manual salt is deprecated. Instead, one is encouraged to let password_hash generate the salt automatically, since that is supposed to be safer.

I understand that using a "weak" salt or the same salt for all passwords can lead to risks due to rainbow tables. But why is it safer to use the auto-generated salt in general?

Why is it safer to use the auto-generated salt instead of a manual salt generated like this:

$this->salt = base_convert(sha1(uniqid(mt_rand(), true)), 16, 36);

Problem 4: Is there any replacement for password_hash that still allows the usage of a custom salt?

Due to the implementation of the project I am working on, I need to control the salt that is used to generate a password hash. This can be changed in the future, but right now it is necessary to set the salt manually. Since this feature is deprecated in password_hash, I need some alternative way to generate the hash. How can I do this?

EDIT:

A short explanation of why I need to control the salt: the password is used not only to log in to the web app directly, but also to connect to the app via a REST API. The client requests the salt from the server and uses it (the algorithm and cost are known) to hash the password the user entered on the client side.

The hashed password is then sent back to the server for authentication. The purpose is to avoid sending the password in plain text. To generate the same hash on the client as on the server, the client needs to know which salt the server used.

I know that hashing the password on the client does not add any real security, since the communication already uses HTTPS only. However, this is the way the clients currently operate: authentication is granted if the client sends back the correct password hash.

I cannot change the server side without breaking thousands of existing clients. The clients can be updated sometime in the future, but this will be a long process.

Until that is done, I need to follow the old process, which means I need to be able to tell the clients the salt.

However, I do not need to generate the salt myself. I am totally fine with letting PHP do this in the most secure way. But I do need to get/extract the salt somehow to send it to the clients.

If I understood everything correctly, I could just let password_hash do the work and then extract the 22 salt chars that follow the "$2y$10$" prefix (positions 8-29, counting from 1) from the result string. Is this correct?

Marozas answered 6/12, 2016 at 10:56 Comment(7)
Answer 3: Because PHP doesn't trust you to generate a good random salt by yourself, and seeing how you propose to generate the salt, I'm inclined to agree. ;-) – Clavicembalo
Thanks :-) This is not my implementation but the one FOSUserBundle used. Could you explain why this is a bad solution? – Marozas
It's bad because you're using the uniqid and mt_rand functions, which don't produce cryptographically secure values. They don't use a good source of randomness; password_hash does use a cryptographically secure source of randomness. Like @Clavicembalo said, you created a weak salt, and the PHP maintainers assumed people would do that, so they created a simple function (password_hash) that abstracts the implementation away from you and produces secure output. However, you now want to control that with a weak salt - good luck with that. I've no idea why you need to do that, but it's your project :) – Nut
Even the manual for uniqid says that it doesn't guarantee uniqueness and at best can "increase chances of uniqueness". Even if you attempt to mitigate that with mt_rand, it's still very different from a truly random value generated by a good (P)RNG. Having said that, it'll probably be good enough in practice… but why not simply use a good PRNG? – Clavicembalo
OK, I understand this and will keep it in mind for an update. However, the main goal is still to understand how the salt works at all. If anyone knows the answers to the other questions, that would be really great! – Marozas
You're moving the goalposts on an already veeeery broad question. But putting that aside, your API authentication scheme not only doesn't add actual security - it reduces it. If you expect the hash itself as an input, the hash becomes the password. You now practically store all of your passwords in plaintext. – Worthington
The hash itself is not sent back to the server; there are other steps involved, like adding a nonce and a timestamp, applying sha1 to that, etc. I just didn't want to make the question more complex by adding those details :-) The only important thing is that both the web app and the clients need to be able to use the same salt during the auth process. – Marozas

Problem 1: What is the correct salt length?

All sources I found say that the salt has a length of 22 and that it is stored in the result string together with the algorithm, the cost and the actual hash value.

If all sources say it, there shouldn't be a reason for you to question that ...

There's no universal salt size; it depends on the algorithm, and for bcrypt it is 22 ... although there's a catch. The salt itself is 16 raw bytes (128 bits), which are stored Base64-encoded (*).

When you Base64-encode 16 bytes of data, the result is a 24-character ASCII string whose last 2 characters are "=" padding; trim those 2 irrelevant characters and you get 22.

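Why are they irrelevant? In short, each Base64 character carries 6 bits, so 22 characters already hold 132 bits, 4 more than the 128 bits of the salt; the two padding characters only mark that the last group is incomplete. For the details, read the Wikipedia page for Base64.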

* There are actually a few Base64 "dialects", and the one used by bcrypt (alphabet "./A-Za-z0-9", no "=" padding) is not quite the same as PHP's base64_encode().
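To make the 16-bytes-to-22-characters arithmetic concrete, here is a minimal PHP sketch (PHP 7+, for random_bytes()). The helper name bcrypt_b64_encode() is hypothetical, and it assumes, as in crypt_blowfish, that bcrypt's dialect uses the same bit order as standard Base64 and differs only in its alphabet and the missing padding:

```php
<?php
// Hypothetical helper: encode raw bytes in bcrypt's Base64 dialect.
// Same bit order as standard Base64, but alphabet "./A-Za-z0-9" and no
// "=" padding, so a character-by-character translation is enough.
function bcrypt_b64_encode(string $raw): string
{
    $std    = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $bcrypt = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    return strtr(rtrim(base64_encode($raw), '='), $std, $bcrypt);
}

$raw   = random_bytes(16);            // 16 bytes = 128-bit salt
$plain = base64_encode($raw);         // standard Base64: 24 chars, ends in "=="
$salt  = bcrypt_b64_encode($raw);     // bcrypt dialect: 22 chars

echo strlen($plain), ' ', strlen($salt), PHP_EOL; // 24 22
echo '$2y$10$' . $salt, PHP_EOL;      // a complete bcrypt setting string
```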

However, all implementations I found use a salt with a length of 32. For example, FOSUserBundle (used with Symfony) used the following code to create the salt:

$this->salt = base_convert(sha1(uniqid(mt_rand(), true)), 16, 36);

Since a sha1 hash is 32 chars long, the generated salt also has a length of 32. Is this just a lazy implementation that skips trimming the string to 22 chars because bcrypt does that itself? Or are 32 chars necessary for some reason?

A hex sha1 digest is actually 40 characters long, and that line turns it into a base-36 string of typically 31 characters, not 32, but that's not really relevant. If you provide a longer string, only the necessary part of it will be used; the extra characters at the end are ignored.
You can test this yourself:

php > var_dump(password_hash('foo', PASSWORD_DEFAULT, ['salt' => str_repeat('a', 22).'b']));
string(60) "$2y$10$aaaaaaaaaaaaaaaaaaaaaO8Q0BjhyjLkn5wwHyGGWhEnrex6ji3Qm"
php > var_dump(password_hash('foo', PASSWORD_DEFAULT, ['salt' => str_repeat('a', 22).'c']));
string(60) "$2y$10$aaaaaaaaaaaaaaaaaaaaaO8Q0BjhyjLkn5wwHyGGWhEnrex6ji3Qm"
php > var_dump(password_hash('foo', PASSWORD_DEFAULT, ['salt' => str_repeat('a', 22).'d']));
string(60) "$2y$10$aaaaaaaaaaaaaaaaaaaaaO8Q0BjhyjLkn5wwHyGGWhEnrex6ji3Qm"

(if the extra characters were used, the resulting hashes would differ)
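As of PHP 8.0.0, password_hash() ignores the salt option (with a warning), so the test above no longer reproduces on current PHP. A sketch of the same experiment with crypt(), which still takes a "$2y$" setting string containing the salt; it should print bool(true) because only the 22 salt characters after "$2y$10$" are read:

```php
<?php
// 22 salt characters, followed by one extra character that should be ignored.
$prefix = '$2y$10$' . str_repeat('a', 22);

$hashes = [];
foreach (['b', 'c', 'd'] as $extra) {
    $hashes[] = crypt('foo', $prefix . $extra);
}

// All three calls see the same salt, so they return the same hash.
var_dump(count(array_unique($hashes)) === 1); // bool(true)
echo $hashes[0], PHP_EOL;
```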

I'm not familiar with FOSUserBundle, but yes, it does look like it's just doing something lazy and incorrect.

Problem 2: Is a salt length of 22 really correct?

In the following example, it seems that only the first 21 chars of the salt are saved in the result string. Passing just these 21 chars as the salt to password_hash results in an error, but padding them with a 0 works:

$s = 'password';
$salt        = 'salt5678901234567890123456789012';
$salt_prefix = 'salt567890123456789010'; // first 21 chars of salt + 0

$h1 = password_hash($s, PASSWORD_BCRYPT, array('salt' => $salt));
$h2 = password_hash($s, PASSWORD_BCRYPT, array('salt' => $salt_prefix));

echo $h1 . PHP_EOL;
echo $h2 . PHP_EOL;

//Result
$2y$10$salt56789012345678901uTWNlUnhu5K/xBrtKYTo7oDy8zMr/csu
$2y$10$salt56789012345678901uTWNlUnhu5K/xBrtKYTo7oDy8zMr/csu

So one needs to pass a salt of at least 22 chars to the algorithm, but the 22nd char seems to be useless. Is that correct? What is the point of the 22nd char if it is not used at all?

It's not really irrelevant ... pad it with e.g. an 'A' and you'll see a different result.

This is again caused by how Base64 works. 22 characters at 6 bits each carry 132 bits, but the salt is only 128 bits (16 bytes), so bcrypt uses just the top 2 bits of the 22nd character and discards its low 4 bits. In rough pseudo-code, the result string contains:

bcrypt_base64_encode( bcrypt_base64_decode($salt) ) . bcrypt_base64_encode( $actualHashInBinary )

That is, the salt is decoded to 16 raw bytes, those bytes are used to compute the actual hash, and then the salt and the hash are each encoded on their own (the hash as 31 characters, covering 23 of its 24 bytes). Re-encoding the 16 salt bytes writes the 22nd character back with its low 4 bits set to zero, so characters that differ only in those bits, such as '0', '2' and 'u', give the same hash and are all stored as 'u'. That is exactly what happened in your example. An 'A' has different top bits, hence the different result.
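A small PHP sketch of the masking described above. canonical_last_salt_char() is a hypothetical helper that keeps only the top 2 bits (mask 0x30) of the 22nd character's 6-bit value, and the crypt() calls at the end reuse the two salts from the question:

```php
<?php
// bcrypt's Base64 alphabet, in value order (0 = '.', 63 = '9').
const BCRYPT_ALPHABET = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Only the top 2 bits of the 22nd salt character matter; keeping just
// those bits gives the character that ends up in the result string.
function canonical_last_salt_char(string $c): string
{
    $value = strlen($c) === 1 ? strpos(BCRYPT_ALPHABET, $c) : false;
    if ($value === false) {
        throw new InvalidArgumentException("not a single bcrypt Base64 character: $c");
    }
    return substr(BCRYPT_ALPHABET, $value & 0x30, 1);
}

foreach (['0', '2', 'u', 'A', 'a'] as $c) {
    echo $c, ' -> ', canonical_last_salt_char($c), PHP_EOL;
}
// 0 -> u, 2 -> u, u -> u, A -> ., a -> O

// The question's two salts differ only in the low bits of the 22nd char,
// so crypt() produces identical output for them.
$h1 = crypt('password', '$2y$10$salt567890123456789012');
$h2 = crypt('password', '$2y$10$salt567890123456789010');
var_dump($h1 === $h2); // bool(true)
echo substr($h1, 0, 29), PHP_EOL; // $2y$10$salt56789012345678901u
```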

Problem 3: Why not specify the salt manually?

In the PHP function password_hash, passing a manual salt is deprecated. Instead, one is encouraged to let password_hash generate the salt automatically, since that is supposed to be safer.

I understand that using a "weak" salt or the same salt for all passwords can lead to risks due to rainbow tables. But why is it safer to use the auto-generated salt in general?

Simply put - the salt needs to be cryptographically secure (i.e. unpredictable), and PHP already knows how to do that, while chances are (overwhelmingly) that you don't.

Unless you have an actual hardware CSPRNG (that PHP isn't already configured to use), the best thing you can do is to leave it to PHP to generate the salt automatically. Yet here we are, with you obviously wanting to do the opposite (for whatever reason) and making it less secure in the process; a lot of people do that.
This is why the salt option is deprecated - to protect you from yourself. :)

Why is it safer to use the auto-generated salt instead of a manual salt generated like this:

$this->salt = base_convert(sha1(uniqid(mt_rand(), true)), 16, 36);

As I said, the salt needs to be unpredictable. In this specific example, none of the functions used are unpredictable, not even mt_rand().
Yes, mt_rand() is not actually random, despite what its name implies.
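To illustrate why mt_rand() is "not actually random": it is a seeded PRNG, so the same seed always yields the same sequence. This sketch only demonstrates that property (it is not an attack on the FOSUserBundle code) and contrasts it with random_bytes() (PHP 7+), which reads from the operating system's CSPRNG:

```php
<?php
// mt_rand() is a Mersenne Twister: its whole output sequence is fixed by
// the seed, so anyone who learns or guesses the seed can replay it.
mt_srand(12345);
$first = [mt_rand(), mt_rand(), mt_rand()];

mt_srand(12345);
$second = [mt_rand(), mt_rand(), mt_rand()];

var_dump($first === $second); // bool(true)

// uniqid() is derived from the current time in microseconds,
// which is also guessable to within a small window.
echo uniqid(), PHP_EOL;

// random_bytes() draws from the OS CSPRNG; this is the kind of source
// a salt should come from.
$salt = random_bytes(16);
echo bin2hex($salt), PHP_EOL;
```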

Problem 4: Is there any replacement for password_hash that still allows the usage of a custom salt?

Due to the implementation of the project I am working on, I need to control the salt that is used to generate a password hash. This can be changed in the future, but right now it is necessary to set the salt manually. Since this feature is deprecated in password_hash, I need some alternative way to generate the hash. How can I do this?

You don't.

There's absolutely zero reason for your project to dictate how the password_hash() salt is generated. I don't know why you think it is necessary, but it 100% isn't - it would make no sense.

Ultimately, this is why deprecations are put in place before something is removed. Now you know the salt option will be removed in the future, and you have plenty of time to refactor your application. (Since PHP 8.0.0, password_hash() in fact ignores the salt option with a warning and uses a generated salt instead.)
Use that time wisely and don't try to replicate deprecated functionality. You should be working in the opposite direction: ask how to decouple the two without breaking your application.
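For reference, the direction recommended here looks roughly like this with the standard API (password_hash(), password_verify(), password_needs_rehash()); the password value is only a placeholder:

```php
<?php
// Let password_hash() pick the algorithm and salt, store the whole
// returned string, and check it later with password_verify().
$password = 'correct horse battery staple';

$hash = password_hash($password, PASSWORD_DEFAULT);

var_dump(password_verify($password, $hash));     // bool(true)
var_dump(password_verify('wrong guess', $hash)); // bool(false)

// On a later successful login, upgrade the stored hash if the default
// algorithm or cost has changed since it was created.
if (password_needs_rehash($hash, PASSWORD_DEFAULT)) {
    $hash = password_hash($password, PASSWORD_DEFAULT);
}
```

Because the algorithm, cost and salt all travel inside the stored string, nothing outside password_hash() needs to know the salt.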

Worthington answered 6/12, 2016 at 12:34 Comment(8)
Wow, thank you very much for that great answer! I don't need to control the salt, but I do need to know which salt was used. I edited my post to explain this in detail. Is my assumption correct that extracting the salt from the hash result by copying the 22 chars after the "$2y$10$" prefix should be sufficient? – Marozas
Not really. Without re-implementing bcrypt's Base64 dialect, you won't be able to reliably extract just the salt. There's no easy way around this. – Worthington
@AndreiHerford - While extracting the salt is absolutely not recommended, and you should think about changing your authentication scheme, I couldn't find a case where it didn't work. After all, this is what the underlying crypt() function gets. Be aware, though, that by extracting the salt, your use of password_hash() is no longer future-proof (the algorithm may change), and that incompatible implementations could theoretically exist. Have a look at this answer to get more information about the encoding of the salt. – Undirected
@Undirected I also could not find any case where it didn't work. Of course, this is no proof that no such case exists. What do you mean by "with extracting the salt, password_hash is not future-proof anymore"? When using bcrypt, the result string produced by password_hash will always have the same format and thus contain the salt. After all, password_hash is just a convenience wrapper around the underlying crypt functions, and (no matter what changes future versions of password_hash include) requesting a bcrypt hash will always result in a bcrypt hash, which includes the salt. – Marozas
@AndreiHerford The entire point of password_hash() (and its sibling functions) is to make it easy to migrate to another algorithm in the future. If it were just a crypt() + CRYPT_BLOWFISH wrapper, it would be called "bcrypt". – Worthington
@AndreiHerford - As long as you use the parameter PASSWORD_BCRYPT you are probably right, but the recommended parameter is PASSWORD_DEFAULT, which could change the algorithm should this become necessary. – Undirected
In that case the complete REST authentication would fail anyway, since it is based on using bcrypt at both ends... But I will keep this in mind! Thanks – Marozas
@Worthington FYI, fixing the salt is required e.g. to make configuration-as-code tools deterministic. If you use a randomized salt, every time you hash the same password into the same configuration file, Ansible / Puppet etc. will report a change even though fundamentally there wasn't one. – Gomuti
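Regarding the salt-extraction discussion in these comments, a sketch of what was tested, with the caveat raised above that it only holds for PASSWORD_BCRYPT (not PASSWORD_DEFAULT, which may switch algorithms). It copies the encoded 22-character salt; recovering the raw 16 bytes would still require decoding bcrypt's Base64 dialect:

```php
<?php
// Only meaningful for PASSWORD_BCRYPT results, which have the layout
// "$2y$" + 2-digit cost + "$" + 22-char salt + 31-char hash (60 chars).
$password = 'secret';
$hash = password_hash($password, PASSWORD_BCRYPT);

$setting = substr($hash, 0, 29); // e.g. "$2y$10$" followed by the salt
$salt    = substr($hash, 7, 22); // the 22 encoded salt characters

// Feeding the same setting back into crypt() reproduces the stored hash,
// which is essentially what password_verify() does internally.
var_dump(crypt($password, $setting) === $hash); // bool(true)
echo $salt, PHP_EOL;
```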

You can use crypt() with Blowfish ($2y$). It still accepts a custom salt as of 2023. Using the same salt for passwords is not recommended, but for identifiers such as email addresses it is better than nothing or a checksum algorithm.
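A minimal sketch of that approach, which also covers the deterministic configuration-as-code case from the comments above. bcrypt_with_salt() is a hypothetical helper and assumes the caller supplies exactly 22 characters from bcrypt's alphabet. Unlike password_hash(), which ignores the salt option as of PHP 8.0.0, crypt() uses the given salt, and the result is still an ordinary "$2y$" hash that password_verify() accepts:

```php
<?php
// Hypothetical helper: deterministic bcrypt hash from a caller-chosen salt.
function bcrypt_with_salt(string $input, string $salt, int $cost = 10): string
{
    if (!preg_match('#^[./A-Za-z0-9]{22}\z#', $salt)) {
        throw new InvalidArgumentException('salt must be 22 chars of [./A-Za-z0-9]');
    }
    if ($cost < 4 || $cost > 31) {
        throw new InvalidArgumentException('cost must be between 4 and 31');
    }
    $hash = crypt($input, sprintf('$2y$%02d$%s', $cost, $salt));
    // crypt() signals failure with a short string such as "*0" instead of throwing.
    if (strlen($hash) !== 60) {
        throw new RuntimeException('crypt() failed');
    }
    return $hash;
}

$salt = 'abcdefghijklmnopqrstuu'; // fixed, illustrative salt (22 chars)
$h1 = bcrypt_with_salt('user@example.com', $salt);
$h2 = bcrypt_with_salt('user@example.com', $salt);

var_dump($h1 === $h2);                               // bool(true): same input, same salt
var_dump(password_verify('user@example.com', $h1)); // bool(true)
```

The 22nd salt character here is 'u', whose low 4 bits are already zero, so it appears unchanged in the output.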

Defant answered 12/2, 2023 at 15:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.