Creating a UUID from a string with no dashes
M

11

67

How would I create a java.util.UUID from a string with no dashes?

"5231b533ba17478798a3f2df37de2aD7" => #uuid "5231b533-ba17-4787-98a3-f2df37de2aD7"
Mania answered 24/9, 2013 at 16:8 Comment(4)
Depending on where you put those dashes a new UUID will be created. How do you decide?Herson
What approach are you using right now? Why are you concerned about it?Shoreward
I could add the four dashes in and call the UUID constructor, but I'm not sure if they always follow the same format. Do java.util.UUIDs follow a specific format?Mania
They are a specific format of 8-4-4-4-12 hex digits.Homesick
H
19

Clojure's #uuid tagged literal is a pass-through to java.util.UUID/fromString. And, fromString splits it by the "-" and converts it into two Long values. (The format for UUID is standardized to 8-4-4-4-12 hex digits, but the "-" are really only there for validation and visual identification.)

The straightforward solution is to reinsert the "-" and use java.util.UUID/fromString.

(defn uuid-from-string [data]
  (java.util.UUID/fromString
   (clojure.string/replace data
                           #"(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})"
                           "$1-$2-$3-$4-$5")))

If you want something without regular expressions, you can use a ByteBuffer and DatatypeConverter.

(defn uuid-from-string [data]
  (let [buffer (java.nio.ByteBuffer/wrap 
                 (javax.xml.bind.DatatypeConverter/parseHexBinary data))]
    (java.util.UUID. (.getLong buffer) (.getLong buffer))))
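
For Java readers, here is a rough equivalent of the ByteBuffer idea. Note that javax.xml.bind.DatatypeConverter was removed from the JDK in Java 11, so this sketch swaps in java.util.HexFormat (Java 17+) to decode the hex. The class and method names are made up for illustration, and the length check is my own addition.

```java
import java.nio.ByteBuffer;
import java.util.HexFormat;
import java.util.UUID;

final class ByteBufferUuid {
    // Hypothetical helper; assumes Java 17+ for java.util.HexFormat.
    static UUID fromHex32(String hex) {
        // parseHex accepts upper- and lowercase digits and throws
        // IllegalArgumentException on non-hex input or odd length.
        byte[] bytes = HexFormat.of().parseHex(hex);
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 32 hex digits, got " + hex.length());
        }
        // ByteBuffer is big-endian by default, so the first long is the most significant half.
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        System.out.println(fromHex32("5231b533ba17478798a3f2df37de2aD7"));
    }
}
```

Java evaluates constructor arguments left to right, so the two getLong() calls read the high half first and the low half second.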
Homesick answered 24/9, 2013 at 21:58 Comment(1)
I think a faster regexp would be: (.{8})(.{4})(.{4})(.{4})(.{12}). The . means "any character", so the regexp engine doesn't need to check whether each character belongs to the "word" character class.Tiling
A
68

tl;dr

java.util.UUID.fromString(
    "5231b533ba17478798a3f2df37de2aD7"
    .replaceFirst( 
        "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" 
    )
).toString()

5231b533-ba17-4787-98a3-f2df37de2ad7

Or parse each half of the hexadecimal string as long integer numbers, and pass to constructor of UUID.

UUID uuid = new UUID ( long1 , long2 ) ; 
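
The two longs above are left undefined. A minimal sketch of one way to obtain them, assuming the input is exactly 32 hex digits, is Long.parseUnsignedLong with radix 16 (Java 8+). The class and method names here are hypothetical.

```java
import java.util.UUID;

final class HexHalvesToUuid {
    // Hypothetical helper: parses 32 hex digits as two unsigned 64-bit halves.
    static UUID fromHex32(String hex) {
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits");
        }
        // parseUnsignedLong is needed because a half may have its top bit set,
        // which Long.parseLong would reject as out of range.
        long mostSigBits = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long leastSigBits = Long.parseUnsignedLong(hex.substring(16), 16);
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        // prints 5231b533-ba17-4787-98a3-f2df37de2ad7
        System.out.println(fromHex32("5231b533ba17478798a3f2df37de2aD7"));
    }
}
```

One caveat: parseUnsignedLong tolerates a leading "+" sign, so this is not a strict validator on its own.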

Bits, Not Text

A UUID is a 128-bit value. A UUID is not actually made up of letters and digits, it is made up of bits. You can think of it as describing a very, very large number.

We could display those bits as one hundred and twenty-eight 0 and 1 characters.

0111 0100 1101 0010 0101 0001 0101 0110 0110 0000 1110 0110 0100 0100 0100 1100 1010 0001 0111 0111 1010 1001 0110 1110 0110 0111 1110 1100 1111 1100 0101 1111

Humans do not easily read bits, so for convenience we usually represent the 128-bit value as a hexadecimal string made up of letters and digits.

74d25156-60e6-444c-a177-a96e67ecfc5f

Such a hex string is not the UUID itself, only a human-friendly representation. The hyphens are added per the UUID spec as canonical formatting, but are optional.

74d2515660e6444ca177a96e67ecfc5f

By the way, the UUID spec clearly states that lowercase letters must be used when generating the hex string while uppercase should be tolerated as input. Unfortunately, many implementations violate that lowercase-generation rule, including those from Apple, Microsoft, and others. See my blog post.


The following refers to Java, not Clojure.

In Java 7 (and earlier), you may use the java.util.UUID class to instantiate a UUID based on a hex string with hyphens as input. Example:

java.util.UUID uuidFromHyphens = java.util.UUID.fromString("6f34f25e-0b0d-4426-8ece-a8b3f27f4b63");
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );

However, the fromString method of that UUID class fails when given a hex string without hyphens. This is unfortunate, as the UUID spec does not require the hyphens in a hex string representation. This fails:

java.util.UUID uuidFromNoHyphens = java.util.UUID.fromString("6f34f25e0b0d44268ecea8b3f27f4b63");

Regex

One workaround is to format the hex string to add the canonical hyphens. Here's my attempt at using regex to format the hex string. Beware… This code works, but I'm no regex expert. You should make this code more robust, say checking that the length of the string is 32 characters before formatting and 36 after.

// -----|  With Hyphens  |----------------------
java.util.UUID uuidFromHyphens = java.util.UUID.fromString( "6f34f25e-0b0d-4426-8ece-a8b3f27f4b63" );
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );
System.out.println();

// -----|  Without Hyphens  |----------------------
String hexStringWithoutHyphens = "6f34f25e0b0d44268ecea8b3f27f4b63";
// Use regex to format the hex string by inserting hyphens in the canonical format: 8-4-4-4-12
String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "([0-9a-fA-F]{8})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]+)", "$1-$2-$3-$4-$5" );
System.out.println( "hexStringWithInsertedHyphens: " + hexStringWithInsertedHyphens );
java.util.UUID myUuid = java.util.UUID.fromString( hexStringWithInsertedHyphens );
System.out.println( "myUuid: " + myUuid );
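
As a sketch of the "more robust" variant suggested above: matching the whole input against a precompiled pattern checks both the length and the character set in one step. This version requires exactly 12 digits in the last group instead of the open-ended + used above. The class name, method name, and error message are my own choices.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictHyphenInserter {
    // Compiled once; matches() below requires the whole input to fit 8-4-4-4-12.
    private static final Pattern HEX32 = Pattern.compile(
            "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{12})");

    // Hypothetical helper: rejects anything that is not exactly 32 hex digits.
    static UUID parse(String hex) {
        Matcher m = HEX32.matcher(hex);
        if (!m.matches()) {
            throw new IllegalArgumentException("not 32 hex digits: " + hex);
        }
        String withHyphens = String.join("-",
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
        return UUID.fromString(withHyphens);
    }

    public static void main(String[] args) {
        System.out.println(parse("6f34f25e0b0d44268ecea8b3f27f4b63"));
    }
}
```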

POSIX Notation

You might find this alternative syntax more readable, using a POSIX character class within the regex, where \\p{XDigit} takes the place of [0-9a-fA-F] (see Pattern doc):

String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" );

Complete example.

java.util.UUID uuid =
        java.util.UUID.fromString (
                "5231b533ba17478798a3f2df37de2aD7"
                        .replaceFirst (
                                "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)",
                                "$1-$2-$3-$4-$5"
                        )
        );

System.out.println ( "uuid.toString(): " + uuid );

uuid.toString(): 5231b533-ba17-4787-98a3-f2df37de2ad7

Antepenult answered 16/10, 2013 at 9:30 Comment(0)
C
18

The regexp solution is probably faster, but you can also take a look at this :)

String withoutDashes = "44e128a5-ac7a-4c9a-be4c-224b6bf81b20".replaceAll("-", "");      
BigInteger bi1 = new BigInteger(withoutDashes.substring(0, 16), 16);                
BigInteger bi2 = new BigInteger(withoutDashes.substring(16, 32), 16);
UUID uuid = new UUID(bi1.longValue(), bi2.longValue());
String withDashes = uuid.toString();

By the way, here is the conversion from 16 binary bytes to a UUID:

  InputStream is = ..binary input..;
  byte[] bytes = IOUtils.toByteArray(is);
  ByteBuffer bb = ByteBuffer.wrap(bytes);
  UUID uuidWithDashesObj = new UUID(bb.getLong(), bb.getLong());
  String uuidWithDashes = uuidWithDashesObj.toString();
Carryingon answered 10/6, 2015 at 15:14 Comment(1)
This should be the correct answer for Java. The other solutions propose adding dashes using Regex and then calling fromString, but that just uses the dashes to split the uuid. Makes more sense to directly split it yourself.Divorce
S
13

You could do a goofy regular expression replacement:

String digits = "5231b533ba17478798a3f2df37de2aD7";                         
String uuid = digits.replaceAll(                                            
    "(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})",                            
    "$1-$2-$3-$4-$5");                                                      
System.out.println(uuid); // => 5231b533-ba17-4787-98a3-f2df37de2aD7
Shoreward answered 24/9, 2013 at 16:44 Comment(5)
Please keep in mind that using a compiled regex pattern will give a huge performance benefit over continuous calls. See: String replaceAll() vs. Matcher replaceAll() (Performance differences) for details.Cornwell
@vahapt: yes, of course, but my intent was to simply demonstrate the strategy rather than obscure it with tangentially related things like performance. Consider also that if a program does this activity only once then precompiling the regex will have no effect whatsoever.Shoreward
You are absolutely right, usage of compiled regex is required only for people who need high performance and do continuous calls.Cornwell
@Cornwell Please, could you either edit this answer, or post your own to show how to do the same with compiled regex pattern?Noranorah
@LouisCAD I've created a new answer here https://mcmap.net/q/293806/-creating-a-uuid-from-a-string-with-no-dashesCornwell
P
9

A much (~ 900%) faster solution compared to using regexps and string manipulation is to just parse the hex string into 2 longs and create the UUID instance from those:

(defn uuid-from-string
  "Converts a 32digit hex string into java.util.UUID"
  [hex]
  (java.util.UUID.
    (Long/parseUnsignedLong (subs hex 0 16) 16)
    (Long/parseUnsignedLong (subs hex 16) 16)))
Pissarro answered 15/9, 2015 at 19:34 Comment(3)
This seems wrong. I don't know clojure that well but parseUnsignedLong doesn't parse hex.Myalgia
I got confused by the Clojure syntax and didn't realize it was passing a radix argument.Myalgia
If you want just a barely slightly faster approach (with also less GC) see my answer: https://mcmap.net/q/293806/-creating-a-uuid-from-a-string-with-no-dashesMyalgia
P
8
public static String addUUIDDashes(String idNoDashes) {
    StringBuilder idBuff = new StringBuilder(idNoDashes);
    idBuff.insert(20, '-');
    idBuff.insert(16, '-');
    idBuff.insert(12, '-');
    idBuff.insert(8, '-');
    return idBuff.toString();
}

Maybe someone else can comment on the computational efficiency of this approach. (It wasn't a concern for my application.)

Pertinacity answered 15/9, 2015 at 18:1 Comment(0)
C
6

Optimized version of @maerics's answer:

    String[] digitsList= {
            "daa70a7ffa904841bf9a81a67bdfdb45",
            "529737c950e6428f80c0bac104668b54",
            "5673c26e2e8f4c129906c74ec634b807",
            "dd5a5ee3a3c44e4fb53d2e947eceeda5",
            "faacc25d264d4e9498ade7a994dc612e",
            "9a1d322dc70349c996dc1d5b76b44a0a",
            "5fcfa683af5148a99c1bd900f57ea69c",
            "fd9eae8272394dfd8fd42d2bc2933579",
            "4b14d571dd4a4c9690796da318fc0c3a",
            "d0c88286f24147f4a5d38e6198ee2d18"
    };

    //Use compiled pattern to improve performance of bulk operations
    Pattern pattern = Pattern.compile("(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})");

    for (int i = 0; i < digitsList.length; i++)
    {
        String uuid = pattern.matcher(digitsList[i]).replaceAll("$1-$2-$3-$4-$5");
        System.out.println(uuid);
    }
Cornwell answered 11/11, 2017 at 12:48 Comment(0)
H
3

Another solution would be something similar to Pawel's solution, but without creating new Strings and solving only the question's problem. If performance is a concern, avoid regex/split/replaceAll and UUID.fromString like the plague.

String hyphenlessUuid = in.nextString();
BigInteger bigInteger = new BigInteger(hyphenlessUuid, 16);
UUID uuid = new UUID(bigInteger.shiftRight(64).longValue(), bigInteger.longValue());
Holmes answered 4/12, 2015 at 0:11 Comment(0)
H
3

Here is an example that is faster because it doesn't use regexp.

public class Example1 {
    /**
     * Get a UUID from a 32 char hexadecimal.
     * 
     * @param string a hexadecimal string
     * @return a UUID
     */
    public static UUID toUuid(String string) {

        if (string == null || string.length() != 32) {
            throw new IllegalArgumentException("invalid input string!");
        }

        char[] input = string.toCharArray();
        char[] output = new char[36];

        System.arraycopy(input, 0, output, 0, 8);
        System.arraycopy(input, 8, output, 9, 4);
        System.arraycopy(input, 12, output, 14, 4);
        System.arraycopy(input, 16, output, 19, 4);
        System.arraycopy(input, 20, output, 24, 12);

        output[8] = '-';
        output[13] = '-';
        output[18] = '-';
        output[23] = '-';

        // UUID.fromString takes a String, not a char[], so wrap the array first.
        return UUID.fromString(new String(output));
    }

    public static void main(String[] args) {
        UUID uuid = toUuid("daa70a7ffa904841bf9a81a67bdfdb45");
    }
}

There's a codec in uuid-creator that can do it more efficiently: Base16Codec. Example:

// Parses base16 strings with 32 chars (case insensitive)
UuidCodec<String> codec = new Base16Codec();
UUID uuid = codec.decode("0123456789AB4DEFA123456789ABCDEF");
Huelva answered 21/7, 2020 at 23:59 Comment(0)
M
2

I believe the following is the fastest in terms of performance. It is even slightly faster than the Long.parseUnsignedLong version. It is slightly altered code that comes from java-uuid-generator.

public static UUID from32(String id) {
    if (id == null) {
        throw new NullPointerException();
    }
    if (id.length() != 32) {
        throw new NumberFormatException("UUID has to be 32 char with no hyphens");
    }

    long lo, hi;
    lo = hi = 0;

    for (int i = 0, j = 0; i < 32; ++j) {
        int curr;
        char c = id.charAt(i);

        if (c >= '0' && c <= '9') {
            curr = (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr = (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr = (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        curr = (curr << 4);

        c = id.charAt(++i);

        if (c >= '0' && c <= '9') {
            curr |= (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr |= (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr |= (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        if (j < 8) {
            hi = (hi << 8) | curr;
        }
        else {
            lo = (lo << 8) | curr;
        }
        ++i;
    }
    return new UUID(hi, lo);
}
Myalgia answered 22/5, 2017 at 14:15 Comment(0)
A
-1

Java 17 makes this task easier, because it provides the java.util.HexFormat class:

(import (java.util UUID HexFormat))

(def uuid "5231b533ba17478798a3f2df37de2aD7")
(UUID. (HexFormat/fromHexDigitsToLong uuid 0 16)
       (HexFormat/fromHexDigitsToLong uuid 16 32))
=> #uuid "5231b533-ba17-4787-98a3-f2df37de2ad7"

Of course, that assumes you know that the uuid string is in the right format. If not, you can guard this with a sanity check by using the predicate

(re-matches #"\p{XDigit}{32}" uuid)

But that's a simple regex which doesn't require any capturing groups, so it should be quick.
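
For anyone who wants the same thing in plain Java, here is a minimal sketch, assuming Java 17+. HexFormat.fromHexDigitsToLong is a static method, so no formatter instance is needed. The class and method names are hypothetical, and a plain length check stands in for the regex guard, since fromHexDigitsToLong already rejects non-hex characters with an exception.

```java
import java.util.HexFormat;
import java.util.UUID;

final class HexFormatUuid {
    // Hypothetical helper; assumes Java 17+ for java.util.HexFormat.
    static UUID fromHex32(String hex) {
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits");
        }
        // Each call parses up to 16 hex digits from the given index range
        // and throws an IllegalArgumentException on a non-hex character.
        long mostSigBits = HexFormat.fromHexDigitsToLong(hex, 0, 16);
        long leastSigBits = HexFormat.fromHexDigitsToLong(hex, 16, 32);
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        System.out.println(fromHex32("5231b533ba17478798a3f2df37de2aD7"));
    }
}
```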

Appal answered 30/3, 2024 at 9:7 Comment(0)
