Generating UUID based on strings

How can I generate deterministic v3/v5 GUIDs/UUIDs in C# when both the namespace and the name are strings? According to RFC 4122, the namespace must be a GUID and the name a string. I would like to pass two strings instead (one for the namespace, one for the name) and always get the same GUID/UUID for the same pair of strings.

Is hashing the namespace string with MD5/SHA-1 and building a new Guid with the Guid(byte[]) constructor a safe way to accomplish this, so that I can then pass the result to the function below? I am NOT asking about parsing a GUID-like string into a namespace with Guid.TryParse(). I want to convert any string into a namespace GUID for the function below, and that conversion must be deterministic as well.

According to https://github.com/Faithlife/FaithlifeUtility/blob/master/src/Faithlife.Utility/GuidUtility.cs and RFC 4122, this is how you create a GUID from a GUID namespace and a string name:

    using System;
    using System.Linq;
    using System.Security.Cryptography;

    public static class GuidUtility
    {
        /// <summary>
        /// Creates a name-based UUID using the algorithm from RFC 4122 §4.3.
        /// </summary>
        /// <param name="namespaceId">The ID of the namespace.</param>
        /// <param name="nameBytes">The name (within that namespace).</param>
        /// <param name="version">The version number of the UUID to create; this value must be either
        /// 3 (for MD5 hashing) or 5 (for SHA-1 hashing).</param>
        /// <returns>A UUID derived from the namespace and name.</returns>
        public static Guid Create(Guid namespaceId, byte[] nameBytes, int version)
        {
            if (version != 3 && version != 5)
                throw new ArgumentOutOfRangeException(nameof(version), "version must be either 3 or 5.");

            // convert the namespace UUID to network order (step 3)
            byte[] namespaceBytes = namespaceId.ToByteArray();
            SwapByteOrder(namespaceBytes);

            // compute the hash of the namespace ID concatenated with the name (step 4)
            byte[] data = namespaceBytes.Concat(nameBytes).ToArray();
            byte[] hash;
            using (var algorithm = version == 3 ? (HashAlgorithm) MD5.Create() : SHA1.Create())
                hash = algorithm.ComputeHash(data);

            // most bytes from the hash are copied straight to the bytes of the new GUID (steps 5-7, 9, 11-12)
            byte[] newGuid = new byte[16];
            Array.Copy(hash, 0, newGuid, 0, 16);

            // set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the appropriate 4-bit version number from Section 4.1.3 (step 8)
            newGuid[6] = (byte) ((newGuid[6] & 0x0F) | (version << 4));

            // set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively (step 10)
            newGuid[8] = (byte) ((newGuid[8] & 0x3F) | 0x80);

            // convert the resulting UUID to local byte order (step 13)
            SwapByteOrder(newGuid);
            return new Guid(newGuid);
        }

        // Converts a GUID (expressed as a byte array) to/from network order (MSB-first).
        // Helper from the same Faithlife source file; Create does not compile without it.
        internal static void SwapByteOrder(byte[] guid)
        {
            SwapBytes(guid, 0, 3);
            SwapBytes(guid, 1, 2);
            SwapBytes(guid, 4, 5);
            SwapBytes(guid, 6, 7);
        }

        private static void SwapBytes(byte[] guid, int left, int right)
        {
            byte temp = guid[left];
            guid[left] = guid[right];
            guid[right] = temp;
        }
    }
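A minimal sketch comparing the two ways of turning a namespace *string* into a Guid: hashing it straight into `new Guid(byte[])`, as the question proposes, versus deriving it as a version 5 UUID under a registered namespace (DNS here). `NamespaceSketch`, `NaiveNamespace`, `NameBased` and `VersionOf` are hypothetical names, not an existing API, and `NameBased` assumes UTF-8 names. The naive result is deterministic, but its version and variant bits are whatever the hash happens to contain, so it is not guaranteed to be a well-formed RFC 4122 UUID; the v5 result always carries version 5 and the RFC 4122 variant.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class for this comparison; not part of any library.
public static class NamespaceSketch
{
    // The DNS namespace UUID defined in RFC 4122, Appendix C.
    public static readonly Guid DnsNamespace = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // The question's idea: hash the namespace string and feed the 16 bytes to new Guid(byte[]).
    // Deterministic, but the version/variant bits are left as arbitrary hash bits.
    public static Guid NaiveNamespace(string text)
    {
        using (MD5 md5 = MD5.Create())
            return new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(text)));
    }

    // RFC 4122 version 5 (SHA-1) name-based UUID; same steps as the Create method above.
    public static Guid NameBased(Guid namespaceId, string name)
    {
        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns); // .NET byte layout -> network (big-endian) order
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] data = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, data, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, data, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
            hash = sha1.ComputeHash(data);

        byte[] g = new byte[16];
        Array.Copy(hash, g, 16);
        g[6] = (byte)((g[6] & 0x0F) | 0x50); // version 5 in the high nibble
        g[8] = (byte)((g[8] & 0x3F) | 0x80); // RFC 4122 variant bits "10"
        SwapByteOrder(g); // network order -> .NET byte layout
        return new Guid(g);
    }

    // Reads the version nibble from the canonical "D" string: xxxxxxxx-xxxx-Vxxx-xxxx-xxxxxxxxxxxx.
    public static int VersionOf(Guid g) => Convert.ToInt32(g.ToString("D").Substring(14, 1), 16);

    // Swaps the first three fields between little-endian (.NET) and big-endian (RFC) order.
    private static void SwapByteOrder(byte[] g)
    {
        (g[0], g[3]) = (g[3], g[0]);
        (g[1], g[2]) = (g[2], g[1]);
        (g[4], g[5]) = (g[5], g[4]);
        (g[6], g[7]) = (g[7], g[6]);
    }
}
```

As a sanity check, `NameBased(NamespaceSketch.DnsNamespace, "python.org")` should match the value Python's documentation gives for `uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')`.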
Guizot answered 17/4, 2020 at 8:31
This QA should have enough discussion to answer all your concerns: https://mcmap.net/q/22029/-how-to-create-deterministic-guids - Zealot

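No, what you propose is not valid, because it breaks how UUIDs work: a raw MD5/SHA-1 hash copied into a Guid is not guaranteed to carry the version and variant bits that RFC 4122 requires, so the result is not a real UUID. Use a real UUID for your namespace.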

A convenient (and valid) way to accomplish this is hierarchical namespaces. First, use the standard DNS namespace UUID plus your domain name to generate your root namespace (GuidUtility.Create is the method quoted in the question, which takes the name as bytes, so names are encoded as UTF-8):

    Guid nsDNS = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
    Guid nsRoot = GuidUtility.Create(nsDNS, Encoding.UTF8.GetBytes("myapp.example.com"), 5);

Then create a namespace UUID for your string:

    Guid nsFoo = GuidUtility.Create(nsRoot, Encoding.UTF8.GetBytes("Foo"), 5);

Now you're ready to use your new Foo namespace UUID with individual names:

    Guid bar = GuidUtility.Create(nsFoo, Encoding.UTF8.GetBytes("Bar"), 5);

The benefit is that anyone else will get completely different UUIDs from yours, even if all their strings other than the domain are identical to yours. This prevents collisions if your data sets are ever merged, while the scheme stays completely deterministic, logical and self-documenting.
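A self-contained sketch of the hierarchical scheme above, assuming version 5 (SHA-1) and UTF-8 names at every level. `HierarchySketch`, `CreateV5` and `Descend` are hypothetical names, not an existing API; `CreateV5` follows the same RFC 4122 steps as the `Create` method quoted in the question, and `Descend` simply feeds each intermediate UUID in as the namespace for the next name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of hierarchical v5 namespaces; names are illustrative only.
public static class HierarchySketch
{
    // The DNS namespace UUID defined in RFC 4122, Appendix C.
    public static readonly Guid DnsNamespace = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // Walks a path of names, using each intermediate v5 UUID as the namespace of the next name.
    public static Guid Descend(Guid root, params string[] path)
    {
        Guid current = root;
        foreach (string name in path)
            current = CreateV5(current, name);
        return current;
    }

    // RFC 4122 version 5 name-based UUID (SHA-1), with the name encoded as UTF-8.
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns); // .NET byte layout -> network order
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] data = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, data, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, data, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
            hash = sha1.ComputeHash(data);

        byte[] g = new byte[16];
        Array.Copy(hash, g, 16);
        g[6] = (byte)((g[6] & 0x0F) | 0x50); // version 5
        g[8] = (byte)((g[8] & 0x3F) | 0x80); // RFC 4122 variant
        SwapByteOrder(g); // network order -> .NET byte layout
        return new Guid(g);
    }

    // Swaps the first three fields between little-endian (.NET) and big-endian (RFC) order.
    private static void SwapByteOrder(byte[] g)
    {
        (g[0], g[3]) = (g[3], g[0]);
        (g[1], g[2]) = (g[2], g[1]);
        (g[4], g[5]) = (g[5], g[4]);
        (g[6], g[7]) = (g[7], g[6]);
    }
}
```

With this, `Descend(HierarchySketch.DnsNamespace, "myapp.example.com", "Foo", "Bar")` equals the three nested calls, and changing only the domain changes every UUID below it. Each level is an ordinary v5 UUID, so an intermediate value such as the Foo namespace can be published and reused like any other namespace UUID.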

(Note: I've never actually used C#, so if I got the syntax slightly wrong, feel free to edit. I think the pattern is clear regardless.)

Propend answered 25/5, 2020 at 20:42
Would be nice; unfortunately, Guid.Create does not exist. - Aryl

The answer to this question ultimately depends on what your relation to the specific namespace is, but let's start with the basics first.

A deterministic UUID must be defined in terms of a namespace UUID and a name string; that is mandated by the standard. However, the terms "namespace" and "name" don't have to map to concrete namespaces and names used in your code. For example, the C# type System.Guid can be thought of as having the System namespace and the Guid name, but identifying the "name space" of all C# type identifiers as a UUID and using System.Guid as the name is also fine (and perhaps better). Similarly, ISBNs can be identified using the urn:isbn: URI prefix, but why not treat the space of all URIs as one large namespace, when the UUID for that is already standardized?

In this regard, the namespace part could be easily thought of as a format, something that unambiguously defines how to interpret (or produce) whatever comes after it. Importantly, the resulting UUID could also be used as a namespace on its own, as it is as valid as any other UUID.

So how to decide what namespace to use? There are several options:

  • If your UUIDs are produced and consumed in a generally closed system, there is nothing wrong with picking a random (v4) UUID as the namespace UUID and concatenating your namespace and name strings in some unambiguous way (for example, with a length prefix or a separator that cannot occur in the namespace) to produce the UUID name. You can always share the namespace UUID with anyone else who wants to use it.

  • If you want others to be able to find a UUID for your objects without prior communication, you can pick one of the "well-known" namespaces (DNS, URL, OID or X.500), but be aware that this restricts what can be identified to what can be represented in those namespaces. URIs are already rich enough to identify a lot of things, and (for linked-data purposes) you could use your own URI pattern, such as http://example.org/users/1, as the "true" identifier of the resource; your namespace and name could be transformed into such a URI.

  • If your entity is not directly representable in one of the above namespaces, you can still try to devise a "reasonable" hierarchy to reach it. In theory, you could use something like http://www.w3.org/2001/XMLSchema#gYear to represent the namespace for all years in the Gregorian calendar, turning it into c108fbcf-4357-57cd-a8c0-8799e467e87f. It is reasonable to assume the format of a name in such a namespace corresponds to the lexical space of the gYear datatype, so concatenating it with 2021 (yielding abe9231c-2deb-5ae3-a23e-77c2f4657e04) would be a reasonable way to identify the current year.

    In practice, not many people care about being able to represent an entity in this way, but there is still a chance greater than 0 that some would think about this method (just by the sole fact that you read this answer right now).

  • If you are feeling adventurous, you may think about using the nil UUID (00000000-0000-0000-0000-000000000000) as the namespace, using the two-step approach to first treat your namespace as a UUID name in the first step, then using that with your name to get the final UUID in the second step (something that could be repeated for a tuple of any length). This however blatantly defies the purpose of UUIDs themselves, as the most reasonable things UUIDs generated from these tuples could represent are the tuples themselves, at which point you might as well stop using UUIDs altogether and just hash your namespace and name.
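A rough sketch of three of the options above: a private v4 namespace with an unambiguous name encoding, the well-known URL namespace, and the two-step approach starting from the nil UUID (`Guid.Empty`). It assumes version 5 and UTF-8 names throughout. All class and method names are hypothetical, and `MyAppNamespace` is a made-up placeholder value; in practice you would generate your own once with `Guid.NewGuid()` and keep it fixed.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of the namespace options; all names here are illustrative.
public static class NamespaceOptionsSketch
{
    // Placeholder for a private random (v4) namespace, generated once and then hard-coded.
    public static readonly Guid MyAppNamespace = new Guid("3f2a9c4e-8b1d-4e6a-9f0c-2d7b5a1e6c93");

    // The URL namespace UUID defined in RFC 4122, Appendix C.
    public static readonly Guid UrlNamespace = new Guid("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

    // Closed system: fold the namespace string into the name. The length prefix keeps the
    // concatenation unambiguous, so ("ab", "c") and ("a", "bc") yield different names.
    public static Guid ClosedSystem(string ns, string name) =>
        CreateV5(MyAppNamespace, $"{ns.Length}:{ns}{name}");

    // Well-known namespace: identify the entity by a URL you control.
    public static Guid FromUrl(string url) => CreateV5(UrlNamespace, url);

    // Two steps from the nil UUID: namespace string first, then the name.
    public static Guid TwoStepFromNil(string ns, string name) =>
        CreateV5(CreateV5(Guid.Empty, ns), name);

    // RFC 4122 version 5 name-based UUID (SHA-1), with the name encoded as UTF-8.
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns); // .NET byte layout -> network order
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] data = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, data, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, data, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
            hash = sha1.ComputeHash(data);

        byte[] g = new byte[16];
        Array.Copy(hash, g, 16);
        g[6] = (byte)((g[6] & 0x0F) | 0x50); // version 5
        g[8] = (byte)((g[8] & 0x3F) | 0x80); // RFC 4122 variant
        SwapByteOrder(g); // network order -> .NET byte layout
        return new Guid(g);
    }

    // Swaps the first three fields between little-endian (.NET) and big-endian (RFC) order.
    private static void SwapByteOrder(byte[] g)
    {
        (g[0], g[3]) = (g[3], g[0]);
        (g[1], g[2]) = (g[2], g[1]);
        (g[4], g[5]) = (g[5], g[4]);
        (g[6], g[7]) = (g[7], g[6]);
    }
}
```

The length prefix in `ClosedSystem` is only one possible choice; any encoding that never maps two different (namespace, name) pairs to the same string works equally well. The hierarchical option is just nested `CreateV5` calls starting from a registered namespace.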

Faden answered 22/11, 2021 at 16:15

© 2022 - 2024 — McMap. All rights reserved.