How do I properly implement Unicode passwords?

Asked 9/5, 2010 at 19:3 Answered 10/5, 2010 at 17:40

Solved unicode passwords normalization unicode-normalization text-normalization

Supporting Unicode passwords is an important feature that developers should not ignore.

Still, adding support for Unicode in passwords is a tricky job, because the same visible text can be represented by different code point sequences in Unicode and you don't want to prevent people from logging in because of this.

Let's say that you'll store the passwords as UTF-8. Note that this question is not about Unicode encodings; it is about Unicode normalization.

Now the question is: how should you normalize the Unicode data?

You have to be sure that you'll be able to compare it. You also need to be sure that the release of the next version of the Unicode standard will not invalidate your password verification.
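
To make the problem concrete, here is a minimal sketch in Python (the language and the sample string are my own choice, not part of the original question). It shows two spellings of the same visible text that differ as code points and as UTF-8 bytes, and that compare equal only after both are normalized to the same form.

```python
import unicodedata

# Two spellings of the same visible text "café"
precomposed = "caf\u00e9"   # é as the single code point U+00E9
decomposed = "cafe\u0301"   # e followed by the combining acute accent U+0301

# Without normalization the strings, and their UTF-8 bytes, differ
differ_as_text = precomposed != decomposed
precomposed_bytes = precomposed.encode("utf-8")   # b'caf\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")     # b'cafe\xcc\x81'

# After normalizing both to the same form (NFC here) they compare equal
equal_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
```

Hashing the two byte strings directly would give two unrelated digests, which is why the normalization step has to happen before hashing.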

Note: there are still some places where Unicode passwords will probably never be used, but this question is not about why or when to use Unicode passwords; it is about how to implement them properly.

1st update

Is it possible to implement this without using ICU, for example by using the OS to do the normalization?

Natie answered 9/5, 2010 at 19:3 Comment(5)

What difference does it make when another unicode standard is released? You've made the decision to store the password in UTF-8 - so store the password in UTF-8. Committees can release new standards without you being forced to change the way you store your data. – Ms 9/5, 2010 at 19:7

Unicode does not dictate encoding. It's just a list of characters that each has a number associated with it (basically). If you choose UTF-8 I don't see how this encoding can change in the future in a way that breaks compatibility. – Freddafreddi 9/5, 2010 at 19:38

There are multiple ways of encoding the same visual characters, I am assuming that this is what he wants to know how to cope with. – Marchioness 9/5, 2010 at 19:48

Maybe I wasn't clear enough: this is not about Unicode encodings, it's about normalization of Unicode text, a process that is required in order to be able to compare the strings. I modified the question to clarify this. – Natie 9/5, 2010 at 20:54

A very similar question (“What Unicode normalization (and other processing) is appropriate for passwords when hashing?”) has an in-depth answer that is current as of November 2022, citing now-controlling RFCs 8264 and 8265 about the PRECIS framework, which obsoletes StringPrep. – Testator 20/11, 2022 at 2:59

A good start is to read Unicode TR 15: Unicode Normalization Forms. Then you realize that it is a lot of work and prone to strange errors - you probably already know this part since you are asking here. Finally, you download something like ICU and let it do it for you.

IIRC, it is a multistep process. First you decompose the sequence until you cannot decompose it any further; for example é (U+00E9) would become e followed by the combining acute accent (U+0301). Then you reorder the combining marks into a well-defined (canonical) ordering. Finally, you encode the resulting code point sequence using UTF-8 or something similar. The UTF-8 byte stream can be fed into the cryptographic hash algorithm of your choice and stored in a persistent store. When you want to check if a password matches, perform the same procedure and compare the output of the hash algorithm with what is stored in the database.
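
The steps above can be sketched roughly as follows, using only the Python standard library rather than ICU. This is a sketch under assumptions, not a vetted implementation: the function names are made up for illustration, NFD is used because it matches the decompose-then-reorder description (other forms such as NFC or NFKC are also used for passwords), and salted PBKDF2-HMAC-SHA256 with 100,000 iterations is my stand-in for "the cryptographic hash algorithm of your choice".

```python
import hashlib
import hmac
import os
import unicodedata
from typing import Optional, Tuple


def normalize_password(password: str) -> bytes:
    """Decompose and canonically reorder (NFD), then encode as UTF-8."""
    return unicodedata.normalize("NFD", password).encode("utf-8")


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage. Hypothetical helper for illustration."""
    if salt is None:
        salt = os.urandom(16)
    # Assumption: PBKDF2 parameters chosen only for the example
    digest = hashlib.pbkdf2_hmac("sha256", normalize_password(password), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Repeat the same procedure and compare digests in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

With this in place, a password registered as "caf\u00e9" (precomposed) verifies when later typed as "cafe\u0301" (decomposed), because both normalize to the same bytes before hashing.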

Malaguena answered 9/5, 2010 at 19:34 Comment(0)

A question back to you: can you explain why you added "without using ICU"? I see a lot of questions asking for things that ICU does (we* think) pretty well, but "without using ICU". Just curious.

Secondly, you may be interested in StringPrep/NamePrep and not just normalization. StringPrep is a framework for mapping strings so that they can be compared.

Thirdly, you may be interested in UTR#36 (Unicode Security Considerations) and UTR#39 (Unicode Security Mechanisms) for other Unicode security implications.
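
As a rough illustration of what StringPrep-style preparation adds on top of plain normalization, here is a partial sketch loosely modeled on the SASLprep profile (RFC 4013), using Python's standard `stringprep` module. It is not a conforming implementation: the function name `prepare_password` is made up, and the bidirectional-text and unassigned-code-point checks are deliberately omitted. The tables are defined against Unicode 3.2, so the sketch normalizes with `unicodedata.ucd_3_2_0`.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 database, as used by RFC 3454

# Tables whose characters are rejected outright in this sketch
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def prepare_password(password: str) -> str:
    """Map, normalize (NFKC) and check a password. Hypothetical helper."""
    mapped = []
    for ch in password:
        if stringprep.in_table_b1(ch):
            continue  # characters "commonly mapped to nothing", e.g. soft hyphen
        # Non-ASCII spaces are mapped to a plain ASCII space
        mapped.append(" " if stringprep.in_table_c12(ch) else ch)
    normalized = ucd_3_2_0.normalize("NFKC", "".join(mapped))
    for ch in normalized:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError("prohibited character U+%04X" % ord(ch))
    return normalized
```

For example, `prepare_password("I\u00adX")` drops the soft hyphen, and `prepare_password("\u2168")` folds the Roman numeral nine into the two letters "IX", so both inputs end up as the same string before hashing.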

*(disclosure: ICU developer :)

Paronymous answered 10/5, 2010 at 17:40 Comment(2)

I have nothing against ICU but in some cases its size can be a real issue. For this reason you may want to use OS specific API. – Natie 12/5, 2010 at 14:31

If you're only using normalization, you can trim down the size pretty easily (both of code and data). Also, ICU is often installed as a module. Thank you for the response. – Paronymous 12/5, 2010 at 19:51
