Human readable alternative for UUIDs

Asked 27/3, 2018 at 7:3 Answered 14/3, 2021 at 23:11

I am working on a system that makes heavy use of pseudonyms to make privacy-critical data available to researchers. These pseudonyms should have the following properties:

1. They should not contain any information (e.g. time of creation, relation to other pseudonyms, encoded data, …).
2. It should be easy to create unique pseudonyms.
3. They should be human readable. That means they should be easy for humans to compare, copy, and understand when read aloud.

My first idea was to use UUID4. They are quite good on (1) and (2), but not so much on (3).

A variant is to encode UUIDs with a wider alphabet, resulting in shorter strings (see for example shortuuid). But I am not sure whether this actually improves readability.
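
To make the "wider alphabet" idea concrete, here is a minimal sketch using only the standard library. It does not call shortuuid itself; the 57-character alphabet (digits and letters without the look-alikes 0, 1, I, O and l) and the helper names `encode`/`decode` are assumptions chosen for illustration.

```python
import math
import uuid

# Illustrative alphabet (an assumption): 57 characters, no 0/1/I/O/l.
ALPHABET = "23456789" + "ABCDEFGHJKLMNPQRSTUVWXYZ" + "abcdefghijkmnopqrstuvwxyz"


def encode(value: int, bits: int = 128) -> str:
    """Encode a non-negative integer below 2**bits as a fixed-length string."""
    base = len(ALPHABET)
    # Number of characters needed to cover the whole range of `bits` bits.
    length = math.ceil(bits / math.log2(base))
    chars = []
    for _ in range(length):
        value, index = divmod(value, base)
        chars.append(ALPHABET[index])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))


def decode(text: str) -> int:
    """Inverse of encode()."""
    value = 0
    for char in text:
        value = value * len(ALPHABET) + ALPHABET.index(char)
    return value


if __name__ == "__main__":
    pseudonym = uuid.uuid4()
    print(pseudonym, "->", encode(pseudonym.int))
```

With this alphabet a 128-bit UUID shrinks from 36 characters (including hyphens) to 22, which is shorter but arguably no easier to read aloud because of the mixed case.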

Another approach I am currently looking into is a paper from 2005 titled "An optimal code for patient identifiers" which aims to tackle exactly my problem. The algorithm described there creates 8-character pseudonyms with 30 bits of entropy. I would prefer to use a more widely reviewed standard though.

Then there is also the git approach: only display the first few characters of the actual pseudonym. But this would mean that a pseudonym could lose its uniqueness after some time.

So my question is: Is there any widely-used standard for human-readable unique ids?

Baiss answered 27/3, 2018 at 7:3 Comment(2)

See also: https://mcmap.net/q/539706/-human-readable-guid – Cathepsin 19/6, 2020 at 18:20

Does this answer your question? Human Readable GUID – Sorrento 28/6, 2022 at 14:57

Not aware of any widely-used standard for this. Here’s a non-widely-used one:

Proquints

https://arxiv.org/html/0901.4016

https://github.com/dsw/proquint

A UUID4 (128 bit) would be converted into 8 proquints. If that’s too much, you can take the last 64 bits of the UUID4 (= just take 64 random bits). This doesn’t make it magically lose uniqueness; only increases the likelihood of collisions, which was non-zero to begin with, and which you can estimate mathematically to decide if it’s still OK for your purposes.
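
A minimal sketch of the encoding, following the scheme described in the proquint proposal (each 16-bit word becomes a consonant-vowel-consonant-vowel-consonant syllable). The function names and the choice of `secrets` for randomness are my own assumptions, not part of the linked reference implementation.

```python
import secrets

CONSONANTS = "bdfghjklmnprstvz"  # 16 consonants, 4 bits each
VOWELS = "aiou"                  # 4 vowels, 2 bits each


def word_to_proquint(word: int) -> str:
    """Encode one 16-bit value as a five-letter proquint."""
    return (
        CONSONANTS[(word >> 12) & 0xF]
        + VOWELS[(word >> 10) & 0x3]
        + CONSONANTS[(word >> 6) & 0xF]
        + VOWELS[(word >> 4) & 0x3]
        + CONSONANTS[word & 0xF]
    )


def to_proquints(value: int, bits: int) -> str:
    """Encode a `bits`-wide integer as hyphen-separated proquints."""
    if bits % 16 != 0:
        raise ValueError("bits must be a multiple of 16")
    # Split into 16-bit words, most significant word first.
    words = [(value >> shift) & 0xFFFF for shift in range(bits - 16, -1, -16)]
    return "-".join(word_to_proquint(word) for word in words)


def random_pseudonym(bits: int = 64) -> str:
    """Draw `bits` random bits and render them as proquints."""
    return to_proquints(secrets.randbits(bits), bits)


if __name__ == "__main__":
    # 127.0.0.1 as a 32-bit integer, the example from the proposal.
    print(to_proquints(0x7F000001, 32))  # lusab-babad
    print(random_pseudonym(64))          # four proquints
```

Here `random_pseudonym(128)` gives the full eight proquints and `random_pseudonym(64)` the shortened four-proquint variant.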

Antiphlogistic answered 12/5, 2018 at 11:52 Comment(0)

Here you go: UUID Readable

Generate Easy to Remember, Readable UUIDs, that are Shakespearean and Grammatically Correct Sentences

Leisured answered 14/3, 2021 at 23:11 Comment(1)

This is a good one. It generates something like this if anyone wondered: Jacquette Brandt John the Pectus of Barnsdall doubted Glenn Gay Gregg and 12 noisy stoats. Is it readable? Yes. Is it 'rememberable'? No. Fun: Yup. – Puling 8/7, 2021 at 7:26

This article suggests using the first few characters of a SHA-256 hash, similar to what git does. Version 5 UUIDs are based on SHA-1 (version 4 UUIDs are purely random), so this is not all that different. The tradeoff between properties (2) and (3) lies in the number of characters.

With d being the number of hexadecimal digits, you get 2 ** (4 * d) identifiers in total, but by the birthday bound the first collision is expected after only about 2 ** (2 * d) identifiers have been created.
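
To put numbers on this, here is a small sketch that estimates the collision probability with the usual birthday approximation 1 - exp(-n(n-1)/(2N)). It assumes identifiers are drawn uniformly at random from the N = 2 ** (4 * d) possible hex strings; the function name is hypothetical.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Approximate chance of at least one collision among n random d-digit hex ids."""
    space = 2 ** (4 * d)  # total number of distinct identifiers
    # Birthday approximation; expm1 keeps precision when the result is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for digits in (8, 12, 16):
        n = 2 ** (2 * digits)  # the "first collision expected" scale
        print(digits, n, round(collision_probability(n, digits), 3))
```

At n = 2 ** (2 * d) the probability is already close to 40 %, so in practice you want to stay well below that number of pseudonyms for a given length.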

The big question is really not which kind of identifier you use, but how you handle collisions.
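
One simple way to handle collisions is to redraw until an unused identifier turns up. The sketch below assumes an in-memory set as the registry of issued pseudonyms; `random_hex`, `new_pseudonym` and the `draw` parameter are hypothetical names. In a real system the check and the insert would have to be atomic, for example via a unique constraint in the database.

```python
import secrets


def random_hex(digits: int) -> str:
    """Return `digits` random hexadecimal characters."""
    # token_hex(n) yields 2 * n characters, so round up and trim.
    return secrets.token_hex((digits + 1) // 2)[:digits]


def new_pseudonym(taken: set, digits: int = 8, draw=random_hex,
                  max_attempts: int = 100) -> str:
    """Draw identifiers until one is not in `taken`, then register it."""
    for _ in range(max_attempts):
        candidate = draw(digits)
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("identifier space appears exhausted; use more digits")


if __name__ == "__main__":
    issued = set()
    print([new_pseudonym(issued) for _ in range(3)])
```

Because each pseudonym is checked against everything already issued, uniqueness no longer depends on the length; a short length only makes retries more frequent.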

Baiss answered 4/3, 2019 at 10:55 Comment(0)
