Simple proof that GUID is not unique [closed]
Asked Answered
C

30

323

I'd like to prove that a GUID is not unique in a simple test program. I expected the following code to run for hours, but it's not working. How can I make it work?

BigInteger begin = new BigInteger((long)0);
BigInteger end = new BigInteger("340282366920938463463374607431768211456",10);  //2^128
for(begin; begin<end; begin++)
  Console.WriteLine(System.Guid.NewGuid().ToString());

I'm using C#.

Copolymerize answered 10/11, 2009 at 0:55 Comment(29)
As a software developer, what would you say if a user came to you and said "it's not working"?Workable
Wait several trillion years.Petronel
BigInteger is a Java construct. What are you using?Traceable
emm... I'm guessing GUID has some kind of time stamp in it. So perhaps you can try to have many threads running at the same time generaging GUID and save them all to a database with the GUID as the primary key.... and wait for it to fail?Metabolite
serveral threads is a good ideaCopolymerize
Upmodded because this is the most amusing thing I've seen online today.Darceydarci
@Darceydarci - lol. I'm having troubling finding anything about this question that isn't fundamentally wrong. The longer I look at it, the funnier it gets.Fissi
You forgot to take the protons in your CPU decaying into account.Abloom
It's only globally unique, so it's only unique on our planet. If you want a truly unique id you need to use a universally unique id (UUID). I assume that you're only interested in uniqueness within our universe. :-)Slattern
The more I read Kai's comments, the more I think this is a troll.Retral
Um, what happened to "No question is too trivial or too "newbie"."? We don't have to treat him bad just because we think his question is dumb!Disintegration
Kai asked why his loop wasn't iterating in a comment, and from that i surmised his real question. Please answer that, instead of the question "why does genertaing guids take longer than i expect".Disintegration
RCIX, I don't believe your edit captures Kai's intent. Any way I read it (and read his comments to the answers below), it's clear Kai's expecting to find duplicate GUIDs during a short run of the loop he posted.Another
That's beside the point. If you look at his comments to nathan taylor's answer below, you see he says he can't iterate biginteger.Disintegration
@Disintegration - but, if you look at his comments to rjmunro, it's clear he's also and primarily asking how to make this go faster. It's fundamentally flawed in that the hardware that is available to us is incapable of doing this in any reasonable period of time. He needs to step back and realize that what he wants is not possible. Let's also not forget the fact that he's simply dumping all these GUIDs to the console. Is he going to hand compare them all?Jarvey
First time i loled on this site.Buckels
My point was merely that we should try our best to understand what he's really asking, and that we need not point out asides to a problem.Disintegration
That said, i was transferring my feelings of upsettedness from sonewhere else and i am truly sorry for saying you guys were flaming him.Disintegration
This is a real question. There's a lot for him to learn from some of the responses below such as #1705508 and #1705508 (full disclosure: I also have posted an answer to this question). This question is about programming and no question is too trivial or too "newbie".Intercellular
Hmm, seems correct, it should print 42 after a while...Jeffryjeffy
Even though people are bashing this guy for the question. This question actually generated some interesting answers. Let alone the amusement it has come of it. =DTravancore
Shouldn't that be 340282366920938463463374607431768211457?Investiture
@Nathan Taylor: BigInteger was added in .NET 4 (msdn.microsoft.com/en-us/library/…); however, this code uses a non-existent constructor overload (string, int) so it will not compile.Hanks
People, it's obviously taking too long because C# is such a slow language. He should use C.Cermet
"In theory, theory and practice are the same. In practice, they are not." -- Lawrence Peter BerraAphanite
I remember when, I had to let a program run for days on my 286 IBM-compatible computer. I have to smile because Kai is complaining that his program is not finishing within "hours".Bohs
lol @ptomato. being a C# programmer that still made me laugh.Logorrhea
Simple program that will take longer: Guid test = Guid.NewGuid(); while (true) if test.Equals(Guid.NewGuid()) throw new Exception("Duplicate found!");Sixpenny
I am disappointed by the fact that most of the responses to this question have been harassing Kai for his simple brute force attempt to prove that one GUID could conceivably match another. Statistical unlikeliness aside, I see no one denying the possibility. I believe the spirit of his question was, how could he prove his hypothesis.Alaska
H
407

Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me $0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my paypal account details at your earliest convenience.

using System;
using System.Collections.Generic;
using System.Linq;

namespace GuidCollisionDetector
{
    class Program
    {
        static void Main(string[] args)
        {
            //var reserveSomeRam = new byte[1024 * 1024 * 100];     // This indeed has no effect.

            Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
            // Fill up memory with guids.
            var bigHeapOGuids = new HashSet<Guid>();
            try
            {
                do
                {
                    bigHeapOGuids.Add(Guid.NewGuid());
                } while (true);
            }
            catch (OutOfMemoryException)
            {
                // Release the ram we allocated up front.
                // Actually, these are pointless too.
                //GC.KeepAlive(reserveSomeRam);
                //GC.Collect();
            }
            Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());


            // Spool up some threads to keep checking if there's a match.
            // Keep running until the heat death of the universe.
            for (long k = 0; k < Int64.MaxValue; k++)
            {
                for (long j = 0; j < Int64.MaxValue; j++)
                {
                    Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
                    System.Threading.Tasks.Parallel.For(0, Int32.MaxValue, (i) =>
                    {
                        if (bigHeapOGuids.Contains(Guid.NewGuid()))
                            throw new ApplicationException("Guids collided! Oh my gosh!");
                    }
                    );
                    Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, ((long)Int32.MaxValue) * Environment.ProcessorCount);
                }
            }
            Console.WriteLine("Umm... why hasn't the universe ended yet?");
        }
    }
}

PS: I wanted to try out the Parallel extensions library. That was easy.

And using OutOfMemoryException as control flow just feels wrong.

EDIT

Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.

And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.

EDIT 2 As always, the GC does a better job than I do at managing memory; any previous attempts at doing it myself were doomed to failure.

Hussein answered 10/11, 2009 at 0:55 Comment(21)
That last Console.WriteLine made me laugh real hard. I think you should throw a CommonlyAcceptedCosmologicTheoriesWrongException instead.Gamekeeper
does marking this as Accepted also means that @Copolymerize accepts the terms stipulated by @ligos?Victim
Setting reserveSomeRam = null; doesn't actually accomplish anything.Underset
@devinb please explain? it looks like it is freeing the bytes it was previously allocated so that the GC can Collect() it. Why doesn't it accomplish anything?Expendable
@ligos: You need to put a GC.KeepAlive(reserveSomeRam) somewhere--probably after the try/catch--in order to prevent collection of the array. Hopefully you're offering eternal customer support with your licensing terms...Stave
@ligos: a few comments: code you put in SO answers must be public domain; I believe you mean "universe" and not "univese"; I ran your program in the future; I ran it in reverse time so you'd have to pay me money; surprisingly it didn't find a collision and the reason that the universe has not ended has something to do with highly charged recursive damma fluctuation cores.Femininity
@mythz: The JIT will detect that the array referenced by reserveSomeRam is never used (i.e., the value in reserveSomeRam is never read), so it will be GCed long before the variable is nulled out. As @Curt Nichols mentions, a call to GC.KeepAlive would need to be added to prevent this.Hanks
@BradleyGrainger I see it now - Good Catch :)Expendable
@yairchu: executing my program at any time other than now is not in the EULA, my legal team will be contacting you last Thursday.Hussein
@ligos, now the memory will never be deleted, plus, the compiler might be too smart for the suggested solution to work correctly anyway (realize the only way out of the loop is through the catch and thus, sees that the original assignment is still never used)Concertino
"And using OutOfMemoryException as control flow just feels wrong". If it was written in VB it would be common practice.Roseleeroselia
GuidCollisionDetector. The name has potentialNeuro
Thanks for this code. I have put it in my GUID generator code for making cross database foreign keys and it's been a great help in preventing all the collisions we've been experiencing. However, now my application freezes up and I can't see why?! Must be a .Net framework bug or something.Hexagon
7 hours on clock and no collision. I give up!Onceover
i'm suing for my money back... I ran this program for 3,654,5334 years before realizing that many of the guid's were not in the original hashset! in fact, every time there was no collision, a future collision was lost! I even bought a machine with 64 exabytes of ram only to realize that .net 4 can only utilize about 9 exabytes at most! this program is a sham! All those lost guids wandering aimlessly in cyberspace..i want my $100 trillion back!Philologian
I hate to point out the obvious, but it's perfectly practical to iterate over every 64-bit integer, given a few years and a few powerful machines. We certainly don't require until the end of the universe.Knitting
@Nick - You realise there's two loops right? It should iterate (2^63)^63 times. Of course, because I'm running things in parallel, it will be much faster ;-)Hussein
@Philologian - I'll just get the quantum version out of that box I put my cat in the other day...Hussein
@Hussein I missed that. Three, actually, since the parallel for is another one.Knitting
Coming late to the party, but OutOfMemoryException cannot be caught with try/catch. It is an asynchronous exception.Libby
The code won't work! The HashSet will reach its limits which are AFAIK 2^32 elements. Even when writting your own HashSet the code is crashing when 2^64 elements are reached, which is the actual limit of C# Arrays.Floor
M
226

This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years. Which is about 83 billion times longer than the age of the universe.

Assuming Moores law holds, it would be a lot quicker to not run this program, wait several hundred years and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until the CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).

Matador answered 10/11, 2009 at 0:55 Comment(17)
damn - so maybe serveral threads generating guids is a better idea?Copolymerize
4 threads on a quad core processor would make it run in 20 billion times the age of the universe - so yeah, that would help a lot.Matador
I am suspicious that this is a troll, but on the off chance that it isn't: threads are not magical. If you can do a billion operations per second on one thread, then going to ten threads means that each one runs 1/10th as often. Each thread does 100 M operations per second; the total number of operations per second is not increased. The way to increase number of operations per second is to buy more computers. Suppose you bought a billion more computers. That would reduce the problem to only taking 10790283070806 years, which is still more than four hours.Is
I think rjmunro is assuming each thread would run on a separate core; 83 billion universes / 4 cores indeed approximately equals 20 billion universes. Time to buy Intel stock!Abloom
Would change my downvote to an upvote but it says the vote is too old unless the answer is edited... :(Disintegration
aren't you all going off on one? GUIDs are inheriently non random. They are based on the MAC address of the TCP/IP adaptor in yr machine. The Time they were generated and a random number. So given this weighting they are a lot easier to predict if you have the original machine and no when it was generated. See below.Kirsti
Why don't we just use 83 billion processors? =)Plainlaid
@Tony Lambert - That was one of the GUID generation algorithms, but isn't widely used nowadays because of privacy concerns.Teakettle
@Erik 83 billion processors means that you will be able to do it in about the amount of time the universe has existed so far. So even that's not nearly enough.Matador
@Eric: Your answer would only be correct on a single-core machine, right?Peri
@Steven. Yes. I suppose I should have said "buy more cores".Is
@Eric: Yes, roughly 83 billion more...Peri
If I remember correctly generating GUIDs on this platform is serialized, so don't waste your time refactoring your code for multiple cores.Stave
Moores law does not in fact talk about CPU speeds. It talks about the number of transistors per CPU. Which actually means that threads will become very, very relevant to the argument of waiting for faster hardware.Zora
At your rate of 1 billion GUIDs per second it would only take 634 years to have a 50% chance of a collision.Sixpenny
Hmm. I would recommend putting some HTC to work with some smart parallelisation (alternatively GPU). @Michael Borgwardt: Wasn't it about of number of transistors per inch squared (alternativly cm^2?Elburt
@Maciej: actually, no. Moore's original statement was about "number of components per integrated circuit" and "I believe that such a large circuit can be built on a single wafer."Zora
F
170

A GUID is theoretically non-unique. Here's your proof:

  • GUID is a 128 bit number
  • You cannot generate 2^128 + 1 or more GUIDs without re-using old GUIDs

However, if the entire power output of the sun was directed at performing this task, it would go cold long before it finished.

GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.

Fissi answered 10/11, 2009 at 0:55 Comment(8)
Pigeonhole Principle to the rescue!Static
+1 for the sun going cold comment. There was an interesting comment somewhere about the pointlessness of encryption keys > 256 bits. To iterate over all possible key values would take more energy than the entire universe holds. Toggling a bit in the CPU requires a small amount energy (it's what generates the heat) which, when multiplied up 2^256 times is a really massive number exceeding the energy stored in the universe, using E=mc2 the universe would need mass of 2^227kg, our sun is 2^101kg so thats 2^126 suns!Megaphone
@Skizz: This is true for brute force attacks only. When an encryption scheme is "broken", it means that it can be solved in less time than brute force, but the solve time remains proportional to the key size.Peri
@StevenSudit: proportional to exponent of the key size (unless P==NP)Psalterium
@Orlangur Proportional to the key size measured in bits.Peri
@StevenSudit No. If you double number of bits it will not double the solve time. 2048 bit keys require orders of magnitude more time to break than 1024 bit ones.Psalterium
@Orlangur Each bit doubles the search space.Peri
Holy Mother of Jebus. Quit arguing. xkcd.com/386Fissi
M
137

Of course GUIDs can collide. Since GUIDs are 128-bits, just generate 2^128 + 1 of them and by the pigeonhole principle there must be a collision.

But when we say that a GUID is a unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).

If you generate a sequence of n GUIDs randomly, then the probability of at least one collision is approximately p(n) = 1 - exp(-n^2 / 2 * 2^128) (this is the birthday problem with the number of possible birthdays being 2^128).

   n     p(n)
2^30 1.69e-21
2^40 1.77e-15
2^50 1.86e-10
2^60 1.95e-03

To make these numbers concrete, 2^60 = 1.15e+18. So, if you generate one billion GUIDs per second, it will take you 36 years to generate 2^60 random GUIDs and even then the probability that you have a collision is still 1.95e-03. You're more likely to be murdered at some point in your life (4.76e-03) than you are to find a collision over the next 36 years. Good luck.

Mareah answered 10/11, 2009 at 0:55 Comment(7)
If you're murdered at some point in your life, odds are it'll be at the end.Lisabethlisan
@mmyers: Excellent point. That means that my chances of being murdered right now are absurdly low, since this isn't the end of my life. Oh, wait...Peri
Also, if two GUIDs happen to be created within a short period, the chances they are used within the same system is slight. Therefore, this increases uniqueness.Bohs
These numbers and reference to the birthday problem are meaningless. GUID generation algorithms don't generate values over the entire range with equal probability. In fact IIRC the original algorithm used the MAC address of the generating PC + the current time as part of the result - which reduces the risk of collision with Guids generated on other PCs, but of course reduces the key space.Congruence
@Joe: Who says you have to use an off-the-shelf generation algorithm?Intercellular
You're assuming that the probability of being murdered is a constant for all human beings. But clearly people who write snide remarks in forum posts are the kind of people who are more likely to be murdered than the average person.Parsons
@Joe: The standard algorithm used nowadays is a random number, with a few bits set to indicate the generation algorithm. So there are slightly less than 128 random bits. You're right that the original algorithm used a timestamp and MAC code, but there was a big fuss in the press about "Microsoft inserting the computer's identity [MAC address] in a hidden place [in a GUID] in MS office documents," so they switched to the random scheme.Baerl
M
61

If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.

Manic answered 10/11, 2009 at 0:55 Comment(4)
Cool - how much for the complete set, from 0 to (2^128)-1 ?Hornstone
On sale, $0.01 per 1k GUIDs. I'll throw in some bamboo wind chimes if you order in the next 60 minutes.Manic
My set is more exclusive and of higher quality. They are double checked and verified which makes them worth the $1 per GUID. You can even buy them in batches if you don't want to make the full investment in one go. I will have to charge an extra $10 per batch though.Hyperion
I'll set you up on a monthly plan and give you unlimited guids for the right price. ^ Those guys are trying to scam you and sell you overpriced guids. I'll sell you quality guids made in China!Nowadays
B
47

Personally, I think the "Big Bang" was caused when two GUIDs collided.

Bohs answered 10/11, 2009 at 0:55 Comment(4)
Just remember It takes a "special" kind of programmer to do that...Kirsti
I'd like to hear your reasoning to your theory. I think we could start a new religion based on this and recruit T.Cruise!Nowadays
@ErocM; See "Brane cosmology" (en.wikipedia.org/wiki/Brane_cosmology) and "Membrane (M-Theory)" (en.wikipedia.org/wiki/Membrane_(M-Theory)). The idea is if two branes touch a new universe is created. Therefore, you can infer that if two GUIDs touch, a new universe is thus created.Bohs
If Timecop taught us anything is that the same matter cannot occupy the same space at any given time. So if two GUIDs where to collide, they would consume each other and the resulting implosion would generate a black hole, gobbling up the entire universe. So in reality, it wouldn't create a Universe, it'll destroy it.Susansusana
G
42

You can show that in O(1) time with a variant of the quantum bogosort algorithm.

Guid g1 = Guid.NewGuid();
Guid g2 = Guid.NewGuid();
if(g1 != g2) Universe.Current.Destroy();
Gamekeeper answered 10/11, 2009 at 0:55 Comment(10)
I'm getting an exception when calling Destroy(). Based on the text, I think my computer lacks the necessary hardware to destroy the current universe. Do you know where I might be able to obtain it?Peri
@Steven: Nah, some management guys got too worried about how bad that API would look to the public, and dictated it to always fail for "security reasons". If you look at the method's source there's just that one line: throw new MundaneHardwareException();. Anyway, I heard the guys at CERN have some kind of Big Hadron Thingy that might do the trick...Gamekeeper
@Martinho: Ah, ok. I'll look into replacing Universe.Current.Destroy() with Cern.Lhc.DestroyThisUniverse().Peri
I knew there was a reason I programmed in Haskell. These side effects are getting scary.Ushijima
"There is a theory which states that if ever anyone ever discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarrely inexplicable. There is another theory that states that this has already happened." -- Douglas Adams, The Hitchhiker's Guide to the GalaxyArrogance
I am confused. Shouldn't it be if(g1 == g2)?Bohs
@AMissico: I believe the point is to destroy all possible universes in which you don't find a match so that in the surviving universe(s) you do have a match. Of course, I wonder if the assignments would need to be something like Guid g1 = Universe.Quantum.Branch(Guid.NewGuid()); Otherwise, the deterministic nature of pseudorandom generation of Guid.NewGuid() would tend to produce the same result in all universes branching from the same initial execution.Microcyte
@Rob: surely if we can destroy universes, we can have truly random (i.e., quantum-based) generators :PGamekeeper
@Martinho: Sure, but that wouldn't be in the built-in implementation of Guid.NewGuid() would it? But my code example above is probably incorrect; you would need to use your truly-random source or a more exact Universe.Quantum.Branch.ChooseBytes(16) (returns a byte array of specified length with a 1-for-1 distribution among the necessary branches generated) to then feed to the Guid creation (or set the value into a blank Guid instance). Or you could chain 128 calls to Universe.Quantum.Branch.Fork() to build up the bits (wich is what ChooseBytes(int) does under the covers).Microcyte
@MikePirnat: My theory is that, as soon as this mechanism was discovered, it was replaced by some much mire bizarrely inexplicable mechanism. (And, no, a theory that this had already happened would not make sense.)Initiatory
B
28

Any two GUIDs are very likely unique (not equal).

See this SO entry, and from Wikipedia

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.

So probably you have to wait for many more billion of years, and hope that you hit one before the universe as we know it comes to an end.

Banas answered 10/11, 2009 at 0:55 Comment(5)
so is 2^128 not the correct number of possible guids?Copolymerize
It is. Why do you think 2^128 is a small number?Darceydarci
Yes, 2^128 is the correct number of possible guids.Banas
That's a hell of a number. $ irb >> 2**128 => 340282366920938463463374607431768211456Tommie
@Infinity - Even to you?Bibbs
I
27

[Update:] As the comments below point out, newer MS GUIDs are V4 and do not use the MAC address as part of the GUID generation (I haven't seen any indication of a V5 implementation from MS though, so if anyone has a link confirming that let me know). WIth V4 though, time is still a factor though, and the odds against duplication of GUIDs remains so small as to be irrelevant for any practical usage. You certainly would not be likely to ever generate a duplicate GUID from just a single system test such as the OP was trying to do.

Most of these answers are missing one vital point about Microsoft's GUID implementation. The first part of the GUID is based on a timestamp and another part is based on the MAC address of the network card (or a random number if no NIC is installed).

If I understand this correctly, it means that the only reliable way to duplicate a GUID would be to run simultainous GUID generations on multiple machines where the MAC addresses were the same AND where the clocks on both systems were at the same exact time when the generation occured (the timestamp is based on milliseconds if I understand it correctly).... even then there are a lot of other bits in the number that are random, so the odds are still vanishingly small.

For all practical purposes the GUIDs are universally unique.

There is a pretty good description of the MS GUID over at "The Old New Thing" blog

Inelegant answered 10/11, 2009 at 0:55 Comment(5)
Thats actually doable when using virtualization. You can and you do get duplicate guids.Meyeroff
Raymond is outdated on the MAC Address part though, Microsoft doesn't use these anymore. See en.wikipedia.org/wiki/GUID#Algorithm for the difference between V1 and V4 Guids.Contextual
This is no longer the case. The current V5 scheme is just 128 bits of pure pseudorandom goodness.Ushijima
funny how you state everything I did a month later than me and you get 16 points and I still have 0?Kirsti
Ya Tony, there is something odd with that. Back when I answered the post, there were only 3 or 4 answers, and I didn't remember seeing yours... if I had, I'd just have upvoted it. I typically don't answer questions when there are other answers already that cover it well enough (which is why I have a rather low overall rep probably).Inelegant
E
23

Here's a nifty little extension method that you can use if you want to check guid uniqueness in many places in your code.

internal static class GuidExt
{
    public static bool IsUnique(this Guid guid)
    {
        while (guid != Guid.NewGuid())
        { }
        return false;
    }
}

To call it, simply call Guid.IsUnique whenever you generate a new guid...

Guid g = Guid.NewGuid();
if (!g.IsUnique())
{
    throw new GuidIsNotUniqueException();
}

...heck, I'd even recommend calling it twice to make sure it got it right in the first round.

Eliseoelish answered 10/11, 2009 at 0:55 Comment(1)
How does this ensure that this guid has never been generated anywhere else in this world? :p Heck we need a world guid pool. :)Laundryman
H
19

Counting to 2^128 - ambitious.

Lets imagine that we can count 2^32 IDs per second per machine - not that ambitious, since it's not even 4.3 billion per second. Lets dedicate 2^32 machines to that task. Furthermore, lets get 2^32 civilisations to each dedicate the same resources to the task.

So far, we can count 2^96 IDs per second, meaning we will be counting for 2^32 seconds (a little over 136 years).

Now, all we need is to get 4,294,967,296 civilisations to each dedicate 4,294,967,296 machines, each machine capable of counting 4,294,967,296 IDs per second, purely to this task for the next 136 years or so - I suggest we get started on this essential task right now ;-)

Hornstone answered 10/11, 2009 at 0:55 Comment(0)
S
17

Well if the running time of 83 billion years does not scare you, think that you will also need to store the generated GUIDs somewhere to check if you have a duplicate; storing 2^128 16-byte numbers would only require you to allocate 4951760157141521099596496896 terabytes of RAM upfront, so imagining you have a computer which could fit all that and that you somehow find a place to buy terabyte DIMMs at 10 grams each, combined they will weigh more than 8 Earth masses, so you can seriously shift it off the current orbit, before you even press "Run". Think twice!

Spherulite answered 10/11, 2009 at 0:55 Comment(0)
T
12
for(begin; begin<end; begin)
    Console.WriteLine(System.Guid.NewGuid().ToString());

You aren't incrementing begin so the condition begin < end is always true.

Traceable answered 10/11, 2009 at 0:55 Comment(4)
no - cause i can't iterate bigintCopolymerize
Does it really matter if he loops forever versus looping 340282366920938463463374607431768211456 times?Parsons
so... would you rather be punched 340282366920938463463374607431768211456 times or forever!?!?!?Nowadays
actually this is what really answers to the question! and no votes at all :pLaundryman
T
11

If GUID collisions are a concern, I would recommend using the ScottGuID instead.

Tybie answered 10/11, 2009 at 0:55 Comment(0)
N
9

Presumably you have reason to be believe that the algorithm for producing Guids is not producing truly random numbers, but is in fact cycling with a period << 2^128.

e.g. RFC4122 method used to derive GUIDs which fixes the values of some bits.

Proof of cycling is going to depend upon the possible size of the period.

For small periods, hash table of hash(GUID) -> GUID with replacement on collision if GUIDs do not match (terminate if they do) might be an approach. Consider also only doing the replacement a random fraction of the time.

Ultimately if the maximum period between collisions is large enough (and isn't known in advance) any method is only going to yield a probability that the collision would be found if it existed.

Note that if the method of generating Guids is clock based (see the RFC), then it may not be possible to determine if collisions exist because either (a) you won't be able to wait long enough for the clock to wrap round, or (b) you can't request enough Guids within a clock tick to force a collision.

Alternatively you might be able to show a statistical relationship between the bits in the Guid, or a correlation of bits between Guids. Such a relationship might make it highly probable that the algorithm is flawed without necessarily being able to find an actual collision.

Of course, if you just want to prove that Guids can collide, then a mathematical proof, not a program, is the answer.

Nonsmoker answered 10/11, 2009 at 0:55 Comment(0)
S
8

But do you have to be sure you have a duplicate, or do you only care if there can be a duplicate. To be sure that you have two people with the same birthday, you need 366 people (not counting leap year). For there to be a greater than 50% chance of having two people with the same birthday you only need 23 people. That's the birthday problem.

If you have 32 bits, you only need 77,163 values to have a greater than 50% chance of a duplicate. Try it out:

Random baseRandom = new Random(0);

int DuplicateIntegerTest(int interations)
{
    Random r = new Random(baseRandom.Next());
    int[] ints = new int[interations];
    for (int i = 0; i < ints.Length; i++)
    {
        ints[i] = r.Next();
    }
    Array.Sort(ints);
    for (int i = 1; i < ints.Length; i++)
    {
        if (ints[i] == ints[i - 1])
            return 1;
    }
    return 0;
}

void DoTest()
{
    baseRandom = new Random(0);
    int count = 0;
    int duplicates = 0;
    for (int i = 0; i < 1000; i++)
    {
        count++;
        duplicates += DuplicateIntegerTest(77163);
    }
    Console.WriteLine("{0} iterations had {1} with duplicates", count, duplicates);
}

1000 iterations had 737 with duplicates

Now 128 bits is a lot, so you are still talking a large number of items still giving you a low chance of collision. You would need the following number of records for the given odds using an approximation:

  • 0.8 billion billion for a 1/1000 chance of a collision occurring
  • 21.7 billion billion for 50% chance of a collision occurring
  • 39.6 billion billion for 90% chance of a collision occurring

There are about 1E14 emails sent per year so it would be about 400,000 years at this level before you would have a 90% chance of having two with the same GUID, but that is a lot different than saying you need to run a computer 83 billion times the age of the universe or that the sun would go cold before finding a duplicate.

Sixpenny answered 10/11, 2009 at 0:55 Comment(0)
J
8

I don't understand why no one has mentioned upgrading your graphics card... Surely if you got a high-end NVIDIA Quadro FX 4800 or something (192 CUDA cores) this would go faster...

Of course if you could afford a few NVIDIA Qadro Plex 2200 S4s (at 960 CUDA cores each), this calculation would really scream. Perhaps NVIDIA would be willing to loan you a few for a "Technology Demonstration" as a PR stunt?

Surely they'd want to be part of this historic calculation...

Jdavie answered 10/11, 2009 at 0:55 Comment(1)
hmmmm..... I could run it on our 10,000 node grid at work.Kirsti
K
7

Aren't you all missing a major point?

I thought GUIDs were generated using two things which make the chances of them being Globally unique quite high. One is they are seeded with the MAC address of the machine that you are on and two they use the time that they were generated plus a random number.

So unless you run it on the actual machine and run all you guesses within the smallest amount of time that the machine uses to represent a time in the GUID you will never generate the same number no matter how many guesses you take using the system call.

I guess if you know the actual way a GUID is made would actually shorten the time to guess quite substantially.

Tony

Kirsti answered 10/11, 2009 at 0:55 Comment(7)
Not all GUIDs are created this way. Even if they were, Kai need only wait until the timestamp used to create the GUID wraps around enough times for one he used to create a GUID is used again.Abloom
Guids have not been based on the mac address since 2000 or 2001. As of one of the service packs for NT4 and/or Win2k they changed the algorithm altogether. They are now generated by a random number generator, minus a few bits that identify what kind of guid it is.Eliseoelish
not all GUIDs come from windows platforms...Kirsti
OP mentions C#, so it's Windows. Besides, are V4 GUID's a Windows-only thing?Peri
@Steven: OP mentions C#, so it's... one of several platforms that you can currently compile C# for (.NET, Mono, dotGNU, MonoTouch, ...)Gamekeeper
@Martinho: Ah, but Mono's unit test for Guid, in GuidTest.cs, contains a method which creates two new GUID's and checks them for equality, failing if they are equal. As Mono builds successfully, we can be absolutely certain that its GUIDs are unique! :-)Peri
@Martinho: Humor aside, Mono uses a PRNG. search.koders.com/csharp/…Peri
F
6
  1. Go to the cryogenics lab in the New York City.
  2. Freeze yourself for (roughly) 1990 years.
  3. Get a job at Planet Express.
  4. Buy a brand-new CPU. Build a computer, run the program, and place it in the safe place with an pseudo-perpetual motion machine like the doomsday machine.
  5. Wait until the time machine is invented.
  6. Jump to the future using the time machine. If you bought 1YHz 128bit CPU, go to 3,938,453,320 days 20 hours 15 minutes 38 seconds 463 ms 463 μs 374 ns 607 ps after when you started to run the program.
  7. ...?
  8. PROFIT!!!

... It takes at least 10,783,127 years even if you had 1YHz CPU which is 1,000,000,000,000,000 (or 1,125,899,906,842,624 if you prefer to use binary prefix) times faster than 1GHz CPU.

So rather than waiting for the compute finished, it would be better to feed pigeons which lost their home because other n pigeons took their home. :(

Or, you can wait until 128-bit quantum computer is invented. Then you may prove that GUID is not unique, by using your program in reasonable time(maybe).

Fellowship answered 10/11, 2009 at 0:55 Comment(1)
I was waiting for a super hero reference in this answer - fail by poster : p - awesome none the less.Quirita
Q
6

GUIDs are 124 bits because 4 bits hold the version number.

Quartus answered 10/11, 2009 at 0:55 Comment(2)
the reason for not adding this as a comment:no one mentioned it, and i don't know who i should tell this to.:)Quartus
Hooooraaaay i did it.In some "real" app i wrote, I got a Guid collision in a table with ~260k Rows. (MSSQL 2008 R2 Express).Quartus
C
6

You could hash the GUIDs. That way, you should get a result much faster.

Oh, of course, running multiple threads at the same time is also a good idea, that way you'll increase the chance of a race condition generating the same GUID twice on different threads.

Contextual answered 10/11, 2009 at 0:55 Comment(0)
H
4

If the number of UUID being generated follows Moore's law, the impression of never running out of GUID in the foreseeable future is false.

With 2 ^ 128 UUIDs, it will only take 18 months * Log2(2^128) ~= 192 years, before we run out of all UUIDs.

And I believe (with no statistical proof what-so-ever) in the past few years since mass adoption of UUID, the speed we are generating UUID is increasing way faster than Moore's law dictates. In other words, we probably have less than 192 years until we have to deal with UUID crisis, that's a lot sooner than end of universe.

But since we definitely won't be running them out by the end of 2012, we'll leave it to other species to worry about the problem.

Hexagonal answered 10/11, 2009 at 0:55 Comment(0)
D
4

Have you tried begin = begin + new BigInteger((long)1) in place of begin++?

Disintegration answered 10/11, 2009 at 0:55 Comment(1)
nobody has voted for the answer that really answers the question :PLaundryman
C
3

The odds of a bug in the GUID generating code are much higher than the odds of the algorithm generating a collision. The odds of a bug in your code to test the GUIDs is even greater. Give up.

Cultured answered 10/11, 2009 at 0:55 Comment(0)
W
2

Not to p**s on the bonfire here, but it does actually happen, and yes, I understand the joking you have been giving this guy, but the GUID is unique only in principle, I bumped into this thread because there is a bug in the WP7 emulator which means every time it boots it gives out the SAME GUID the first time it is called! So, where in theory you cannot have a conflict, if there is a problem generating said GUI, then you can get duplicates

http://forums.create.msdn.com/forums/p/92086/597310.aspx#597310

Winded answered 10/11, 2009 at 0:55 Comment(0)
O
2

The program, albeit its errors, shows proof that a GUID is not unique. Those that try to prove the contrary are missing the point. This statement just proves the weak implementation of some of the GUID variations.

A GUID is not necessary unique by definition, it is highly unique by definition. You just refined the meaning of highly. Depending on the version, the implementator (MS or others), use of VM's, etc your definition of highly changes. (see link in earlier post)

You can shorten your 128 bit table to prove your point. The best solution is to use a hash formula to shorten your table with duplicates, and then use the full value once the hash collides and based on that re-generate a GUID. If running from different locations, you would be storing your hash/full key pairs in a central location.

Ps: If the goal is just to generate x number of different values, create a hash table of this width and just check on the hash value.

Outdistance answered 10/11, 2009 at 0:55 Comment(0)
S
1

Since part of Guid generation is based on the current machine's time, my theory to get a duplicate Guid is:

  1. Perform a clean installation of Windows
  2. Create a startup script that resets the time to 2010-01-01 12:00:00 just as Windows boots up.
  3. Just after the startup script, it triggers your application to generate a Guid.
  4. Clone this Windows installation, so that you rule out any subtle differences that may occur in subsequent boot-ups.
  5. Re-image the hard drive with this image and boot-up the machine a few times.
Sloe answered 10/11, 2009 at 0:55 Comment(0)
L
0

The only solution to prove GUIDs are not unique would be by having a World GUID Pool. Each time a GUID is generated somewhere, it should be registered to the organization. Or heck, we might include a standardization that all GUID generators needs to register it automatically and for that it needs an active internet connection!

Laundryman answered 10/11, 2009 at 0:55 Comment(0)
P
0

Here's a solution, too:

int main()
{
  QUuid uuid;
  while ( (uuid = QUuid::createUuid()) != QUuid::createUuid() ) { }
  std::cout << "Aha! I've found one! " << qPrintable( uuid.toString() ) << std::endl;
}

Note: requires Qt, but I guarantee that if you let it run long enough, it might find one.

(Note note: actually, now that I'm looking at it, there may be something about the generation algorithm that prevents two subsequently generated uuids that collide--but I kinda doubt it).

Peirce answered 10/11, 2009 at 0:55 Comment(0)
G
0

For me.. the time it takes for a single core to generate a UUIDv1 guarantees it will be unique. Even in a multi core situation if the UUID generator only allows one UUID to be generated at a time for your specific resource (keep in mind that multiple resources can totally utilize the same UUIDs however unlikely since the resource inherently part of the address) then you will have more than enough UUIDs to last you until the timestamp burns out. At which point I really doubt you would care.

Graaf answered 10/11, 2009 at 0:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.