YouTube-like GUID
Asked Answered
L

10

73

Is it possible to generate short GUID like in YouTube (N7Et6c9nL9w)?

How can it be done? I want to use it in web app.

Lebar answered 22/9, 2009 at 6:45 Comment(1)
A
76

You could use Base64:

string base64Guid = Convert.ToBase64String(Guid.NewGuid().ToByteArray());

That generates a string like E1HKfn68Pkms5zsZsvKONw==. Since a GUID is always 128 bits, you can omit the == that you know will always be present at the end and that will give you a 22 character string. This isn't as short as YouTube though.

Alga answered 22/9, 2009 at 6:51 Comment(7)
This method has the drawback that, the generated values can contain the slash (/) sign, which can, when unproperly handled, be inconvenient for ussage in Url'sTypist
While I really like this solution, I agree with Jhonny D. not only '/' can appear but also '+' which will totally break your url. sighPetrous
Just replace the '/' and '+' with URL safe chars, like '-' and '_'. Then, when reading the guid back in, replace them back before decoding.Neritic
The solution to the problem mentioned by Jhonny D. is to replace both problematic characters ('+' and '/') with url-friendly ones ('-' and '_' for example).Prole
So by replacing problematic characters ('+' and '/') with url-friendly ones ('-' and '_' for example), will still guarantee that it is a unique Guid ?Adelric
@Adelric Yes, because those characters will never appear in Base64. It's a 1:1 mapping transformation, so there will be no collision.Polychaete
if size doesn't matter then why not Guid.NewGuid().ToString("N") or just see my answer below.Splendent
P
35

URL Friendly Solution

As mentioned in the accepted answer, base64 is a good solution but it can cause issues if you want to use the GUID in a URL. This is because + and / are valid base64 characters, but have special meaning in URLs.

Luckily, there are unused characters in base64 that are URL friendly. Here is a more complete answer:

public string ToShortString(Guid guid)
{
    var base64Guid = Convert.ToBase64String(guid.ToByteArray());

    // Replace URL unfriendly characters
    base64Guid = base64Guid.Replace('+', '-').Replace('/', '_');

    // Remove the trailing ==
    return base64Guid.Substring(0, base64Guid.Length - 2);
}

public Guid FromShortString(string str)
{
    str = str.Replace('_', '/').Replace('-', '+');
    var byteArray = Convert.FromBase64String(str + "==");
    return new Guid(byteArray);
}

Usage:

Guid guid = Guid.NewGuid();
string shortStr = ToShortString(guid);
// shortStr will look something like 2LP8GcHr-EC4D__QTizUWw
Guid guid2 = FromShortString(shortStr);
Assert.AreEqual(guid, guid2);

EDIT:

Can we do better? (Theoretical limit)

The above yields a 22 character, URL friendly GUID. This is because a GUID uses 128 bits, so representing it in base64 requires log_{64}2^128 characters, which is 21.33, which rounds up to 22.

There are actually 66 URL friendly characters (we aren't using . and ~). So theoretically, we could use base66 to get log_{66}2^128 characters, which is 21.17, which also rounds up to 22.

So this is optimal for a full, valid GUID.

However, GUID uses 6 bits to indicate the version and variant, which in our case are constant. So we technically only need 122 bits, which in both bases rounds to 21 (log_{64}2^122 = 20.33). So with more manipulation, we could remove another character. This requires wrangling the bits out however, so I leave this as an exercise to the reader.

How does youtube do it?

YouTube IDs use 11 characters. How do they do it?

A GUID uses 122 bits, which guarantees collisions are virtually impossible. This means you can generate a random GUID and be certain it is unique without checking. However, we don't need so many bits for just a regular ID.

We could use a smaller ID. If we use 66 bits or less, we have a higher risk of collision, but can represent this ID with 11 characters (even in base64). One could either accept the risk of collision, or test for a collision and regenerate.

With 122 bits (regular GUID), you would have to generate ~10^17 GUIDs to have a 1% chance of collision.

With 66 bits, you would have to generate ~10^9 or 1 billion IDs to have a 1% chance of collision. That is not that many IDs.

My guess is youtube uses 64 bits (which is more memory friendly than 66 bits), and checks for collisions to regenerate the ID if necessary.

If you want to abandon GUIDs in favor of smaller IDs, here is code for that:

class IdFactory
{
    private Random random = new Random();
    public int CharacterCount { get; }
    public IdFactory(int characterCount)
    {
        CharacterCount = characterCount;
    }

    public string Generate()
    {
        // bitCount = characterCount * log (targetBase) / log(2)
        var bitCount = 6 * CharacterCount;
        var byteCount = (int)Math.Ceiling(bitCount / 8f);
        byte[] buffer = new byte[byteCount];
        random.NextBytes(buffer);

        string guid = Convert.ToBase64String(buffer);
        // Replace URL unfriendly characters
        guid = guid.Replace('+', '-').Replace('/', '_');
        // Trim characters to fit the count
        return guid.Substring(0, CharacterCount);
    }
}

Usage:

var factory = new IdFactory(characterCount: 11);
string guid = factory.Generate();
// guid will look like Mh3darwiZhp

This uses 64 characters which is not optimal, but requires much less code (since we can reuse Convert.ToBase64String). You should be a lot more careful of collisions if you use this.

Prole answered 1/12, 2016 at 17:39 Comment(4)
Since you know the length of the Base64 string is always 22 characters without the padding, couldn't you just do base64Guid.Substring(0, 22) instead of base64Guid.Substring(0, base64Guid.Length - 2)?Seepage
Yes, both options are equivalent. I think my version makes the operation a little clearer.Prole
String replacement is a fix over the existing answer then why not Guid.NewGuid().ToString("N") ?Splendent
guid.ToString("N") would return a 32 character string which is not really mini. The replace is for the base64 encode of that guid, since base64 by default uses / and + as characters and we don't want those.Prole
B
11

9 chars is not a GUID. Given that, you could use the hexadecimal representation of an int, which gives you a 8 char string.

You can use an id you might already have. Also you can use .GetHashCode against different simple types and there you have a different int. You can also xor different fields. And if you are into it, you might even use a Random number - hey, you have well above 2.000.000.000+ possible values if you stick to the positives ;)

Brouhaha answered 22/9, 2009 at 7:3 Comment(0)
S
10

It's not a GUID but a random base64 encoded string

Now what about collision ? With the large infrastructure they have and server all around the world where 1000's videos uploaded every minute it's a safe bait. Where in during the post processing of your video into different resolution the system can have this checked.

Please see the following code where I am trying for the same, It uses the TotalMilliseconds from EPOCH to generate a string with a valid set of characters and it's uniqueness is governed by each passing milliseconds. And this is based on the video from Tom Scott

The one other way is to use numeric counters but that is expensive to maintain and will create a series where you can enumerate with + or - values to guess the previous or the next unique string in the system and we don't what that to happen.

Do remember:

  • This will not be globally unique but unique to the instance where it's defined
  • It uses Thread.Sleep() to handle multithreading issue
public static string YoutubeLikeId()
{
    Thread.Sleep(1);
    string characterSet="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    var charSet = characterSet.ToCharArray();   
    int targetBase= charSet.Length;
    long ticks = (long)DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
    
    string output = null;

    do{
        output += charSet[ticks % targetBase];
        ticks = ticks/targetBase;
    }while(ticks > 0);
    
    output = new string(output.Reverse().ToArray());
    
    return Convert.ToBase64String(Encoding.UTF8.GetBytes(output)).Replace("/", "_")
        .Replace("+", "-").Replace("==","");
}

Here we are removing the padding character == and also the non url riendly characters as / & '+" with - & _

So the output will come something like

VFlRTFk4Mw
VFlRTFk4SQ
VFlRTFk4WA
VFlRTFk4bQ
VFlRTFk5Mg
VFlRTFk5SQ
VFlRTFk5WA
VFlRTFk5bg
VFlRTFlBMw
VFlRTFlBSQ

There is also a project called ShortGuid to get a url friendly GUID it can be converted from/to a regular Guid

When I went under the hood I found it works by encoding the Guid to Base64 as the code below:

public static string Encode(Guid guid)
{
    string encoded = Convert.ToBase64String(guid.ToByteArray());

    encoded = encoded
        .Replace("/", "_")
        .Replace("+", "-");
    return encoded.Substring(0, 22);
}

The good thing about it it can be decoded again to get the Guid back with

public static Guid Decode(string value)
{
    // avoid parsing larger strings/blobs
    if (value.Length != 22)
    {
        throw new ArgumentException("A ShortGuid must be exactly 22 characters long. Receive a character string.");
    }

    string base64 = value
        .Replace("_", "/")
        .Replace("-", "+") + "==";

    byte[] blob = Convert.FromBase64String(base64);
    var guid = new Guid(blob);

    var sanityCheck = Encode(guid);
    if (sanityCheck != value)
    {
        throw new FormatException(
            @"Invalid strict ShortGuid encoded string. The string '{value}' is valid URL-safe Base64, " +
            @"but failed a round-trip test expecting '{sanityCheck}'."
        );
    }

    return guid;
}

So a Guid 4039124b-6153-4721-84dc-f56f5b057ac2 will be encoded as SxI5QFNhIUeE3PVvWwV6wg and the Output will look something like.

ANf-MxRHHky2TptaXBxcwA
zpjp-stmVE6ZCbOjbeyzew
jk7P-XYFokmqgGguk_530A
81t6YZtkikGfLglibYkDhQ
qiM2GmqCK0e8wQvOSn-zLA

And as Leonadro mentioned in the comment there is also something called nanoid for the purpose in case you don't want to implement your own.

Splendent answered 24/5, 2019 at 11:0 Comment(1)
I don't like the assumption about YouTube ids being incremental without providing evidence. Also, nanoids solve the OP problem rather nicely.Subclavius
H
7

As others have mentioned, YouTube's VideoId is not technically a GUID since it's not inherently unique.

As per Wikipedia:

The total number of unique keys is 2128 or 3.4×1038. This number is so large that the probability of the same number being generated randomly twice is negligible.

The uniqueness YouTube's VideoId is maintained by their generator algorithm.

You can either write your own algorithm, or you can use some sort of random string generator and utilize the UNIQUE CONSTRAINT constraint in SQL to enforce its uniqueness.

First, create a UNIQUE CONSTRAINT in your database:

ALTER TABLE MyTable
ADD CONSTRAINT UniqueUrlId
UNIQUE (UrlId);

Then, for example, generate a random string (from philipproplesch's answer):

string shortUrl = System.Web.Security.Membership.GeneratePassword(11, 0);

If the generated UrlId is sufficiently random and sufficiently long you should rarely encounter the exception that is thrown when SQL encounters a duplicate UrlId. In such an event, you can easily handle the exception in your web app.

Hierarchize answered 19/10, 2011 at 17:49 Comment(1)
The only problem with the method GeneratePassword is that the 2nd argument is actually for the minimum number of non-alphabetic and non-number characters. When I try with 0 I get several such symbols...Bambara
T
4

Technically it's not a Guid. Youtube has a simple randomized string generator that you can probably whip up in a few minutes using an array of allowed characters and a random number generator.

Toddtoddie answered 22/9, 2009 at 6:50 Comment(0)
S
3

It might be not the best solution, but you can do something like that:

string shortUrl = System.Web.Security.Membership.GeneratePassword(11, 0);
Scup answered 22/9, 2009 at 8:10 Comment(1)
The only problem with this method is that the 2nd argument is actually for the minimum number of non-alphabetic and non-number characters. When I try with 0 I get several such symbols...Bambara
E
2

This id is probably not globally unique. GUID's should be globally unique as they include elements which should not occur elsewhere (the MAC address of the machine generating the ID, the time the ID was generated, etc.)

If what you need is an ID that is unique within your application, use a number fountain - perhaps encoding the value as a hexadecimal number. Every time you need an id, grab it from the number fountain.

If you have multiple servers allocating id's, you could grab a range of numbers (a few tens or thousands depending on how quickly you're allocating ids) and that should do the job. an 8 digit hex number will give you 4 billion ids - but your first id's will be a lot shorter.

Eisegesis answered 22/9, 2009 at 6:55 Comment(1)
Sorry, for the necro, but what's a number fountain? Couldn't find a definition online. Is it just an integer that's atomically incremented each time a new id is requested or is there some deeper logic to it?Sunbow
A
2

Maybe using NanoId will save you from a lot of headaches: https://github.com/codeyu/nanoid-net

You can do something like:

var id = Nanoid.Generate('1234567890abcdef', 10) //=> "4f90d13a42"

And you can check the collision probability here: https://alex7kom.github.io/nano-nanoid-cc/

Adaptation answered 22/9, 2022 at 18:31 Comment(0)
P
0

This answer is based on the one from Gilthans, the code minimizes heap allocations.

ToShortString allocates only 72 bytes when chars.ToString() is executed:
26 + stringLength*2 + bytes_to_make_the_result_divisible_by_8
(see this Jon Skeet's article).

FromShortString is allocation-free.

These methods, just like in the original solution, produce/consume url-friendly strings with a length of 22 characters which have the same collision chance as Guids.

public static string ToShortString(Guid guid)
{
    Span<byte> bytes = stackalloc byte[16];
    guid.TryWriteBytes(bytes);

    Span<char> chars = stackalloc char[24];
    Convert.TryToBase64Chars(bytes, chars, out _);

    chars = chars[..22];
    chars.Replace('+', '-');
    chars.Replace('/', '_');
    return chars.ToString();
}

public static Guid FromShortString(string value)
{
    Span<char> chars = stackalloc char[24];
    value.CopyTo(chars);
    chars.Replace('_', '/');
    chars.Replace('-', '+');
    "==".CopyTo(chars[22..]);

    Span<byte> bytes = stackalloc byte[16];
    Convert.TryFromBase64Chars(chars, bytes, out _);

    return new Guid(bytes);
}

It’s important to recognize that the reduced heap allocation does not imply that these methods perform faster than the original ones.

BenchmarkDotNet test results:

Method Mean Error StdDev Allocated
ToShortStringOriginal 26.18 ns 0.195 ns 0.173 ns 184 B
ToShortStringMinAllocations 39.35 ns 0.132 ns 0.124 ns 72 B
FromShortStringOriginal 36.20 ns 0.117 ns 0.104 ns 112 B
FromShortStringNoAllocations 37.82 ns 0.100 ns 0.093 ns -
Pridemore answered 19/4 at 15:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.