.NET unique object identifier
Asked Answered
M

11

141

Is there a way of getting a unique identifier of an instance?

GetHashCode() is the same for the two references pointing to the same instance. However, two different instances can (quite easily) get the same hash code:

Hashtable hashCodesSeen = new Hashtable();
LinkedList<object> l = new LinkedList<object>();
int n = 0;
while (true)
{
    object o = new object();
    // Remember objects so that they don't get collected.
    // This does not make any difference though :(
    l.AddFirst(o);
    int hashCode = o.GetHashCode();
    n++;
    if (hashCodesSeen.ContainsKey(hashCode))
    {
        // Same hashCode seen twice for DIFFERENT objects (n is as low as 5322).
        Console.WriteLine("Hashcode seen twice: " + n + " (" + hashCode + ")");
        break;
    }
    hashCodesSeen.Add(hashCode, null);
}

I'm writing a debugging addin, and I need to get some kind of ID for a reference which is unique during the run of the program.

I already managed to get internal ADDRESS of the instance, which is unique until the garbage collector (GC) compacts the heap (= moves the objects = changes the addresses).

Stack Overflow question Default implementation for Object.GetHashCode() might be related.

The objects are not under my control as I am accessing objects in a program being debugged using the debugger API. If I was in control of the objects, adding my own unique identifiers would be trivial.

I wanted the unique ID for building a hashtable ID -> object, to be able to lookup already seen objects. For now I solved it like this:

Build a hashtable: 'hashCode' -> (list of objects with hash code == 'hashCode')
Find if object seen(o) {
    candidates = hashtable[o.GetHashCode()] // Objects with the same hashCode.
    If no candidates, the object is new
    If some candidates, compare their addresses to o.Address
        If no address is equal (the hash code was just a coincidence) -> o is new
        If some address equal, o already seen
}
Maidie answered 15/4, 2009 at 9:39 Comment(0)
F
48

The reference is the unique identifier for the object. I don't know of any way of converting this into anything like a string etc. The value of the reference will change during compaction (as you've seen), but every previous value A will be changed to value B, so as far as safe code is concerned it's still a unique ID.

If the objects involved are under your control, you could create a mapping using weak references (to avoid preventing garbage collection) from a reference to an ID of your choosing (GUID, integer, whatever). That would add a certain amount of overhead and complexity, however.

Fisher answered 15/4, 2009 at 9:44 Comment(20)
I guess for lookups you'd have to iterate over all the references you track: WeakReference to the same object are not equal to each other, so you can't really do much else.Endaendall
There could be some usefulness to having each object assigned a unique 64-bit ID, especially if such IDs were issued sequentially. I'm not sure the usefulness would justify the cost, but such a thing could be helpful if one compares two distinct immutable objects and finds them equal; if one when possible overwrites the reference to the newer one with a reference to the older one, one can avoid having many redundant references to identical but distinct objects.Triste
“Identifier.” I do not think that word means what you think it means.Marimaria
@Slipp: who was that addressed to? Please give more details about what you mean.Fisher
@JonSkeet You. Look up the word “identifier” in a good English-language dictionary.Marimaria
@Slipp: If you dislike my answer, I suggest you add your own better one. It's still not really clear to me what you're objecting to though... The reference identifies the instance in my view. Why would there have to be a string representation?Fisher
@JonSkeet: Outside of the scope of programming, an “identifier” is a thing that provides a label to distinguish a unique object or class of objects— a 1-to-1 relation. In programming, an “object” is specific chunk of memory holding the state of an object of a correlated type, and a “reference” is a means by which to refer to or link to a given object— a many-to-one relation. So following the word semantics and logic deduction, a “programming reference” cannot be an “identifier”, much less an identifier explicitly reinforced to be unique. Your opening statement is false.Marimaria
@SlippD.Thompson: No, it's still a 1-to-1 relation. There's only a single reference value which refers to any given object. That value may appear many times in memory (e.g. as the value of multiple variables), but it's still a single value. It's like a house address: I can write down my home address on multiple on many pieces of paper, but that's still the identifier for my house. Any two non-identical reference values must refer to different objects - at least in C#.Fisher
@SlippD.Thompson: An .NET object's identity isn't encapsulated in a reference; an object's identity is encapsulated by the whereabouts of all references which exist to that same object throughout the .NET universe in which it resides. If only one reference exists to an object, that reference will not encapsulate any meaningful identity. Because .NET doesn't even try to track down all the references that may exist to an object (once it's identified one rooted reference, that's good enough), there's no way to convert an object's identity into any sort of concise format.Triste
@supercat: I think that depends on what you mean by "a reference" here - if two variables both have values which refer to the same object, I'd call those the same references (the values will have the same bit pattern). In that sense, the identity is encapsulated in the reference - if you compare two references bitwise, that will tell you whether or not they refer to the same object.Fisher
@JonSkeet: I don't think there's any requirement that all references to a particular object have the same bit pattern. In present implementations they happen to do so, but it would be conceivable that in e.g. some future concurrent GC they might not. If a future processor included a "load object address" instruction and had registers which could a trap if certain values were loaded thereby, a concurrent GC could relocate objects while other threads were running, provided that it set traps for the old and new addresses. Code which used "load object address" to fetch the references...Triste
...would see them as the same [since the trap code could examine the references and update the old one to match the new one] but code which examined the memory containing reference-type fields might see the values as different. My point was that given two snapshots of the system state, it will not not in general possible to determine with certainty that an object in one snapshot is the same as an object in the other, unless in the second snapshot there exists a reference which one knows has pointed to that object at all times since the first was taken.Triste
@supercat: I definitely take your point around compaction. However, the references would at least need to still compare equal under the ceq IL. I suspect this sort of subtle issue isn't what Slipp was talking about though. I personally like to keep at least the simpler conceptual model, even if clever stuff goes on behind the scenes :)Fisher
@JonSkeet: Certainly they'd have to compare as equal under "ceq"; my point was that if there are two objects of the same class which have identical field contents, given the same identity-hash value, and sit in the same GC generation, and if references "a" and "b" exist to one, and references "c" and "d" exist to the other, the only difference between the objects would be that one of them would be referred to by "a" and "b", and the other by "c" and "d". If one were to simultaneously store a reference to the first object into "c" and "d", and one to the second into "a" and "b"...Triste
@JonSkeet: ...such action would have no observable effect on the program's execution. The variables would still appear to identify the same objects as they did before the swap.Triste
@supercat: Right. So a and b are equal, and c and d are equal. Therefore the references act as identity in that if two references are equal, they refer to the same object and if they're not, they refer to different objects.Fisher
@JonSkeet: Right. My point is that the only information which is encapsulated by "a" that isn't encapsulated by "c", is the fact that "b" references the same object; to me, that implies that the "identities" encapsulated by references "a" and "c" are not stored in those variables, nor in the objects themselves, but are also stored in part in references "b" and "d".Triste
@supercat: I think we may differ in our understanding of "identities being encapsulated" - but I think we're also probably not helping anyone to go any further than we already have :) Just one of the topics we should discuss at length if we ever meet in person...Fisher
when you say "reference" you are talking about GetHashCode()?Cytochrome
@Gerry: No, I mean the reference. The hash code is entirely different.Fisher
C
78

.NET 4 and later only

Good news, everyone!

The perfect tool for this job is built in .NET 4 and it's called ConditionalWeakTable<TKey, TValue>. This class:

  • can be used to associate arbitrary data with managed object instances much like a dictionary (although it is not a dictionary)
  • does not depend on memory addresses, so is immune to the GC compacting the heap
  • does not keep objects alive just because they have been entered as keys into the table, so it can be used without making every object in your process live forever
  • uses reference equality to determine object identity; moveover, class authors cannot modify this behavior so it can be used consistently on objects of any type
  • can be populated on the fly, so does not require that you inject code inside object constructors
Caprice answered 20/3, 2012 at 15:6 Comment(2)
Just for completeness: ConditionalWeakTable relies on RuntimeHelpers.GetHashCode and object.ReferenceEquals to do its inner workings. The behavior is the same as building an IEqualityComparer<T> that uses these two methods. If you need performance, I actually suggest to do this, since ConditionalWeakTable has a lock around all its operations to make it thread safe.Ipsus
@StefandeBruijn: A ConditionalWeakTable holds a reference to each Value which is only as strong as the reference held elsewhere to the corresponding Key. An object to which a ConditionalWeakTable holds the only extant reference anywhere in the universe will automatically cease to exist when the key does.Triste
H
51

Checked out the ObjectIDGenerator class? This does what you're attempting to do, and what Marc Gravell describes.

The ObjectIDGenerator keeps track of previously identified objects. When you ask for the ID of an object, the ObjectIDGenerator knows whether to return the existing ID, or generate and remember a new ID.

The IDs are unique for the life of the ObjectIDGenerator instance. Generally, a ObjectIDGenerator life lasts as long as the Formatter that created it. Object IDs have meaning only within a given serialized stream, and are used for tracking which objects have references to others within the serialized object graph.

Using a hash table, the ObjectIDGenerator retains which ID is assigned to which object. The object references, which uniquely identify each object, are addresses in the runtime garbage-collected heap. Object reference values can change during serialization, but the table is updated automatically so the information is correct.

Object IDs are 64-bit numbers. Allocation starts from one, so zero is never a valid object ID. A formatter can choose a zero value to represent an object reference whose value is a null reference (Nothing in Visual Basic).

Hydroquinone answered 15/4, 2009 at 10:58 Comment(6)
Reflector tells me that ObjectIDGenerator is a hashtable relying on the default GetHashCode implementation (i.e. it does not use user overloads).Footpace
Probably the best solution when printable unique IDs are required.Endaendall
ObjectIDGenerator isn't implemented on the phone either.Durian
I don't understand exactly what ObjectIDGenerator is doing but it seems to work, even when it is using RuntimeHelpers.GetHashCode. I tested both and only RuntimeHelpers.GetHashCode fails in my case.Businessman
+1 -- Works pretty slick (on the desktop, at least).Diandiana
Now obsolete in .NET 8. Any idea of a good replacement? Just use RuntimeHelpers.GetHashCode()?Hunk
F
48

The reference is the unique identifier for the object. I don't know of any way of converting this into anything like a string etc. The value of the reference will change during compaction (as you've seen), but every previous value A will be changed to value B, so as far as safe code is concerned it's still a unique ID.

If the objects involved are under your control, you could create a mapping using weak references (to avoid preventing garbage collection) from a reference to an ID of your choosing (GUID, integer, whatever). That would add a certain amount of overhead and complexity, however.

Fisher answered 15/4, 2009 at 9:44 Comment(20)
I guess for lookups you'd have to iterate over all the references you track: WeakReference to the same object are not equal to each other, so you can't really do much else.Endaendall
There could be some usefulness to having each object assigned a unique 64-bit ID, especially if such IDs were issued sequentially. I'm not sure the usefulness would justify the cost, but such a thing could be helpful if one compares two distinct immutable objects and finds them equal; if one when possible overwrites the reference to the newer one with a reference to the older one, one can avoid having many redundant references to identical but distinct objects.Triste
“Identifier.” I do not think that word means what you think it means.Marimaria
@Slipp: who was that addressed to? Please give more details about what you mean.Fisher
@JonSkeet You. Look up the word “identifier” in a good English-language dictionary.Marimaria
@Slipp: If you dislike my answer, I suggest you add your own better one. It's still not really clear to me what you're objecting to though... The reference identifies the instance in my view. Why would there have to be a string representation?Fisher
@JonSkeet: Outside of the scope of programming, an “identifier” is a thing that provides a label to distinguish a unique object or class of objects— a 1-to-1 relation. In programming, an “object” is specific chunk of memory holding the state of an object of a correlated type, and a “reference” is a means by which to refer to or link to a given object— a many-to-one relation. So following the word semantics and logic deduction, a “programming reference” cannot be an “identifier”, much less an identifier explicitly reinforced to be unique. Your opening statement is false.Marimaria
@SlippD.Thompson: No, it's still a 1-to-1 relation. There's only a single reference value which refers to any given object. That value may appear many times in memory (e.g. as the value of multiple variables), but it's still a single value. It's like a house address: I can write down my home address on multiple on many pieces of paper, but that's still the identifier for my house. Any two non-identical reference values must refer to different objects - at least in C#.Fisher
@SlippD.Thompson: An .NET object's identity isn't encapsulated in a reference; an object's identity is encapsulated by the whereabouts of all references which exist to that same object throughout the .NET universe in which it resides. If only one reference exists to an object, that reference will not encapsulate any meaningful identity. Because .NET doesn't even try to track down all the references that may exist to an object (once it's identified one rooted reference, that's good enough), there's no way to convert an object's identity into any sort of concise format.Triste
@supercat: I think that depends on what you mean by "a reference" here - if two variables both have values which refer to the same object, I'd call those the same references (the values will have the same bit pattern). In that sense, the identity is encapsulated in the reference - if you compare two references bitwise, that will tell you whether or not they refer to the same object.Fisher
@JonSkeet: I don't think there's any requirement that all references to a particular object have the same bit pattern. In present implementations they happen to do so, but it would be conceivable that in e.g. some future concurrent GC they might not. If a future processor included a "load object address" instruction and had registers which could a trap if certain values were loaded thereby, a concurrent GC could relocate objects while other threads were running, provided that it set traps for the old and new addresses. Code which used "load object address" to fetch the references...Triste
...would see them as the same [since the trap code could examine the references and update the old one to match the new one] but code which examined the memory containing reference-type fields might see the values as different. My point was that given two snapshots of the system state, it will not not in general possible to determine with certainty that an object in one snapshot is the same as an object in the other, unless in the second snapshot there exists a reference which one knows has pointed to that object at all times since the first was taken.Triste
@supercat: I definitely take your point around compaction. However, the references would at least need to still compare equal under the ceq IL. I suspect this sort of subtle issue isn't what Slipp was talking about though. I personally like to keep at least the simpler conceptual model, even if clever stuff goes on behind the scenes :)Fisher
@JonSkeet: Certainly they'd have to compare as equal under "ceq"; my point was that if there are two objects of the same class which have identical field contents, given the same identity-hash value, and sit in the same GC generation, and if references "a" and "b" exist to one, and references "c" and "d" exist to the other, the only difference between the objects would be that one of them would be referred to by "a" and "b", and the other by "c" and "d". If one were to simultaneously store a reference to the first object into "c" and "d", and one to the second into "a" and "b"...Triste
@JonSkeet: ...such action would have no observable effect on the program's execution. The variables would still appear to identify the same objects as they did before the swap.Triste
@supercat: Right. So a and b are equal, and c and d are equal. Therefore the references act as identity in that if two references are equal, they refer to the same object and if they're not, they refer to different objects.Fisher
@JonSkeet: Right. My point is that the only information which is encapsulated by "a" that isn't encapsulated by "c", is the fact that "b" references the same object; to me, that implies that the "identities" encapsulated by references "a" and "c" are not stored in those variables, nor in the objects themselves, but are also stored in part in references "b" and "d".Triste
@supercat: I think we may differ in our understanding of "identities being encapsulated" - but I think we're also probably not helping anyone to go any further than we already have :) Just one of the topics we should discuss at length if we ever meet in person...Fisher
when you say "reference" you are talking about GetHashCode()?Cytochrome
@Gerry: No, I mean the reference. The hash code is entirely different.Fisher
P
43

RuntimeHelpers.GetHashCode() may help (MSDN).

Plutonium answered 15/4, 2009 at 9:44 Comment(8)
That may well help, but with a cost - IIRC, using the base object.GetHashCode() needs to allocate a sync block, which isn't free. Nice idea though - +1 from me.Fisher
Thanks, I didn't know this method. However, it does not produce unique hash code either (behaves exactly the same as the sample code in the question). Will be useful though if the user overrides hash code, to call the default version.Maidie
You can use GCHandle if you don't need too many of them (see below).Footpace
A book on .NET by a highly respected author states that RuntimeHelpers.GetHashCode() will produce a code that is unique within an AppDomain, and that Microsoft could have named the method GetUniqueObjectID. This is simply wrong. In testing, I found that I would usually get a duplicate by the time I had created 10,000 instance of an object (a WinForms TextBox), and could never get past 30,000. Code relying on the supposed uniqueness was causing intermittent crashes in a production system after creating no more than 1/10 that many objects.Whittling
@JonSkeet: FYI, GetHashCode doesn't allocate a sync block. Instead, the sync block field of the object header uses a couple bits to indicate whether it represents an offset into the sync block table, holds a GetHashCode value, or neither.Triste
@supercat: So is the hash code copied into the actual sync block when the sync block is allocated? I could have sworn I'd read some details of a version of .NET which allocated a sync block on the first hash code call. Hmm. Wish I could remember where.Fisher
@JonSkeet: I wouldn't be surprised if .net 1.0 created a new sync block for a hash code, but later versions started storing it in ths syncblock-offset word (if a hash code is created before a sync block is needed for some other reason, the hash code would get copied from the syncblock-offset word to the newly-created sync block). Given that testing whether a word is "negative" is no more expensive than testing whether it's zero, that would be an easy optimization.Triste
@supercat: Aha - have just found some evidence, from 2003, which was from .NET 1.0 and 1.1. Looks like they were planning to change for .NET 2: blogs.msdn.com/b/brada/archive/2003/09/30/50396.aspxFisher
O
7

You can develop your own thing in a second. For instance:

   class Program
    {
        static void Main(string[] args)
        {
            var a = new object();
            var b = new object();
            Console.WriteLine("", a.GetId(), b.GetId());
        }
    }

    public static class MyExtensions
    {
        //this dictionary should use weak key references
        static Dictionary<object, int> d = new Dictionary<object,int>();
        static int gid = 0;

        public static int GetId(this object o)
        {
            if (d.ContainsKey(o)) return d[o];
            return d[o] = gid++;
        }
    }   

You can choose what you will like to have as unique ID on your own, for instance, System.Guid.NewGuid() or simply integer for fastest access.

Ostmark answered 15/4, 2009 at 11:31 Comment(4)
Won't help if what you need this for is Dispose bugs, because this would prevent any kind of disposal.Endaendall
This doesn't quite work as the dictionary uses equality instead of identity, collapsing objects that return the same values for object.EqualsDurian
This will keep the object alive though.Exergue
@MartinLottering what if he uses ConditionalWeakTable<object, idType>?Brewmaster
P
7

How about this method:

Set a field in the first object to a new value. If the same field in the second object has the same value, it's probably the same instance. Otherwise, exit as different.

Now set the field in the first object to a different new value. If the same field in the second object has changed to the different value, it's definitely the same instance.

Don't forget to set field in the first object back to it's original value on exit.

Problems?

Pelag answered 22/10, 2011 at 22:3 Comment(0)
M
5

It is possible to make a unique object identifier in Visual Studio: In the watch window, right-click the object variable and choose Make Object ID from the context menu.

Unfortunately, this is a manual step, and I don't believe the identifier can be accessed via code.

Meraree answered 22/7, 2011 at 9:42 Comment(1)
What versions of Visual Studio have this feature? For example, the Express versions?Gratitude
N
3

You would have to assign such an identifier yourself, manually - either inside the instance, or externally.

For records related to a database, the primary key may be useful (but you can still get duplicates). Alternatively, either use a Guid, or keep your own counter, allocating using Interlocked.Increment (and make it large enough that it isn't likely to overflow).

Northwestward answered 15/4, 2009 at 9:44 Comment(0)
R
2

I know that this has been answered, but it's at least useful to note that you can use:

http://msdn.microsoft.com/en-us/library/system.object.referenceequals.aspx

Which will not give you a "unique id" directly, but combined with WeakReferences (and a hashset?) could give you a pretty easy way of tracking various instances.

Rascal answered 20/10, 2010 at 18:32 Comment(0)
R
2

If you are writing a module in your own code for a specific usage, majkinetor's method MIGHT have worked. But there are some problems.

First, the official document does NOT guarantee that the GetHashCode() returns an unique identifier (see Object.GetHashCode Method ()):

You should not assume that equal hash codes imply object equality.

Second, assume you have a very small amount of objects so that GetHashCode() will work in most cases, this method can be overridden by some types.
For example, you are using some class C and it overrides GetHashCode() to always return 0. Then every object of C will get the same hash code. Unfortunately, Dictionary, HashTable and some other associative containers will make use this method:

A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary<TKey, TValue> class, the Hashtable class, or a type derived from the DictionaryBase class. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.

So, this approach has great limitations.

And even more, what if you want to build a general purpose library? Not only are you not able to modify the source code of the used classes, but their behavior is also unpredictable.

I appreciate that Jon and Simon have posted their answers, and I will post a code example and a suggestion on performance below.

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.Serialization;
using System.Collections.Generic;


namespace ObjectSet
{
    public interface IObjectSet
    {
        /// <summary> check the existence of an object. </summary>
        /// <returns> true if object is exist, false otherwise. </returns>
        bool IsExist(object obj);

        /// <summary> if the object is not in the set, add it in. else do nothing. </summary>
        /// <returns> true if successfully added, false otherwise. </returns>
        bool Add(object obj);
    }

    public sealed class ObjectSetUsingConditionalWeakTable : IObjectSet
    {
        /// <summary> unit test on object set. </summary>
        internal static void Main() {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            ObjectSetUsingConditionalWeakTable objSet = new ObjectSetUsingConditionalWeakTable();
            for (int i = 0; i < 10000000; ++i) {
                object obj = new object();
                if (objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
                if (!objSet.Add(obj)) { Console.WriteLine("bug!!!"); }
                if (!objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }


        public bool IsExist(object obj) {
            return objectSet.TryGetValue(obj, out tryGetValue_out0);
        }

        public bool Add(object obj) {
            if (IsExist(obj)) {
                return false;
            } else {
                objectSet.Add(obj, null);
                return true;
            }
        }

        /// <summary> internal representation of the set. (only use the key) </summary>
        private ConditionalWeakTable<object, object> objectSet = new ConditionalWeakTable<object, object>();

        /// <summary> used to fill the out parameter of ConditionalWeakTable.TryGetValue(). </summary>
        private static object tryGetValue_out0 = null;
    }

    [Obsolete("It will crash if there are too many objects and ObjectSetUsingConditionalWeakTable get a better performance.")]
    public sealed class ObjectSetUsingObjectIDGenerator : IObjectSet
    {
        /// <summary> unit test on object set. </summary>
        internal static void Main() {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            ObjectSetUsingObjectIDGenerator objSet = new ObjectSetUsingObjectIDGenerator();
            for (int i = 0; i < 10000000; ++i) {
                object obj = new object();
                if (objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
                if (!objSet.Add(obj)) { Console.WriteLine("bug!!!"); }
                if (!objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }


        public bool IsExist(object obj) {
            bool firstTime;
            idGenerator.HasId(obj, out firstTime);
            return !firstTime;
        }

        public bool Add(object obj) {
            bool firstTime;
            idGenerator.GetId(obj, out firstTime);
            return firstTime;
        }


        /// <summary> internal representation of the set. </summary>
        private ObjectIDGenerator idGenerator = new ObjectIDGenerator();
    }
}

In my test, the ObjectIDGenerator will throw an exception to complain that there are too many objects when creating 10,000,000 objects (10x than in the code above) in the for loop.

Also, the benchmark result is that the ConditionalWeakTable implementation is 1.8x faster than the ObjectIDGenerator implementation.

Reflux answered 7/7, 2015 at 12:19 Comment(0)
I
1

The information I give here is not new, I just added this for completeness.

The idea of this code is quite simple:

  • Objects need a unique ID, which isn't there by default. Instead, we have to rely on the next best thing, which is RuntimeHelpers.GetHashCode to get us a sort-of unique ID
  • To check uniqueness, this implies we need to use object.ReferenceEquals
  • However, we would still like to have a unique ID, so I added a GUID, which is by definition unique.
  • Because I don't like locking everything if I don't have to, I don't use ConditionalWeakTable.

Combined, that will give you the following code:

public class UniqueIdMapper
{
    private class ObjectEqualityComparer : IEqualityComparer<object>
    {
        public bool Equals(object x, object y)
        {
            return object.ReferenceEquals(x, y);
        }

        public int GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private Dictionary<object, Guid> dict = new Dictionary<object, Guid>(new ObjectEqualityComparer());
    public Guid GetUniqueId(object o)
    {
        Guid id;
        if (!dict.TryGetValue(o, out id))
        {
            id = Guid.NewGuid();
            dict.Add(o, id);
        }
        return id;
    }
}

To use it, create an instance of the UniqueIdMapper and use the GUID's it returns for the objects.


Addendum

So, there's a bit more going on here; let me write a bit down about ConditionalWeakTable.

ConditionalWeakTable does a couple of things. The most important thing is that it doens't care about the garbage collector, that is: the objects that you reference in this table will be collected regardless. If you lookup an object, it basically works the same as the dictionary above.

Curious no? After all, when an object is being collected by the GC, it checks if there are references to the object, and if there are, it collects them. So if there's an object from the ConditionalWeakTable, why will the referenced object be collected then?

ConditionalWeakTable uses a small trick, which some other .NET structures also use: instead of storing a reference to the object, it actually stores an IntPtr. Because that's not a real reference, the object can be collected.

So, at this point there are 2 problems to address. First, objects can be moved on the heap, so what will we use as IntPtr? And second, how do we know that objects have an active reference?

  • The object can be pinned on the heap, and its real pointer can be stored. When the GC hits the object for removal, it unpins it and collects it. However, that would mean we get a pinned resource, which isn't a good idea if you have a lot of objects (due to memory fragmentation issues). This is probably not how it works.
  • When the GC moves an object, it calls back, which can then update the references. This might be how it's implemented judging by the external calls in DependentHandle - but I believe it's slightly more sophisticated.
  • Not the pointer to the object itself, but a pointer in the list of all objects from the GC is stored. The IntPtr is either an index or a pointer in this list. The list only changes when an object changes generations, at which point a simple callback can update the pointers. If you remember how Mark & Sweep works, this makes more sense. There's no pinning, and removal is as it was before. I believe this is how it works in DependentHandle.

This last solution does require that the runtime doesn't re-use the list buckets until they are explicitly freed, and it also requires that all objects are retrieved by a call to the runtime.

If we assume they use this solution, we can also address the second problem. The Mark & Sweep algorithm keeps track of which objects have been collected; as soon as it has been collected, we know at this point. Once the object checks if the object is there, it calls 'Free', which removes the pointer and the list entry. The object is really gone.

One important thing to note at this point is that things go horribly wrong if ConditionalWeakTable is updated in multiple threads and if it isn't thread safe. The result would be a memory leak. This is why all calls in ConditionalWeakTable do a simple 'lock' which ensures this doesn't happen.

Another thing to note is that cleaning up entries has to happen once in a while. While the actual objects will be cleaned up by the GC, the entries are not. This is why ConditionalWeakTable only grows in size. Once it hits a certain limit (determined by collision chance in the hash), it triggers a Resize, which checks if objects have to be cleaned up -- if they do, free is called in the GC process, removing the IntPtr handle.

I believe this is also why DependentHandle is not exposed directly - you don't want to mess with things and get a memory leak as a result. The next best thing for that is a WeakReference (which also stores an IntPtr instead of an object) - but unfortunately doesn't include the 'dependency' aspect.

What remains is for you to toy around with the mechanics, so that you can see the dependency in action. Be sure to start it multiple times and watch the results:

class DependentObject
{
    public class MyKey : IDisposable
    {
        public MyKey(bool iskey)
        {
            this.iskey = iskey;
        }

        private bool disposed = false;
        private bool iskey;

        public void Dispose()
        {
            if (!disposed)
            {
                disposed = true;
                Console.WriteLine("Cleanup {0}", iskey);
            }
        }

        ~MyKey()
        {
            Dispose();
        }
    }

    static void Main(string[] args)
    {
        var dep = new MyKey(true); // also try passing this to cwt.Add

        ConditionalWeakTable<MyKey, MyKey> cwt = new ConditionalWeakTable<MyKey, MyKey>();
        cwt.Add(new MyKey(true), dep); // try doing this 5 times f.ex.

        GC.Collect(GC.MaxGeneration);
        GC.WaitForFullGCComplete();

        Console.WriteLine("Wait");
        Console.ReadLine(); // Put a breakpoint here and inspect cwt to see that the IntPtr is still there
    }
Ipsus answered 7/1, 2014 at 10:21 Comment(8)
A ConditionalWeakTable might be better, since it would only persist the representations for objects while references existed to them. Also, I'd suggest that an Int64 might be better than a GUID, since it would allow objects to be given a persistent rank. Such things may be useful in locking scenarios (e.g. one may avoid deadlock if one all code which will need to acquire multiple locks does so in some defined order, but for that to work there must be a defined order).Triste
@Triste Sure about the longs; it depends on your scenario - in f.ex. distributed systems it's sometimes more useful to work with GUIDs. As for ConditionalWeakTable: you're right; DependentHandle checks for aliveness (NOTE: only when the thing resizes!), which can be useful here. Still, if you need performance the locking can become an issue there, so in that case it might be interesting to use this... to be honest I personally dislike the implementation of ConditionalWeakTable, which probably leads to my bias of using a simple Dictionary - even though you're correct.Ipsus
I've long been curious about how ConditionalWeakTable actually works. The fact that it only allows items to be added makes me think that it's designed to minimize concurrency-related overhead, but I have no idea how it works internally. I do find it curious that there's no simple DependentHandle wrapper which doesn't use a table, since there are definitely times when it's important to ensure that one object is kept alive for the lifetime of another, but the latter object has no room for a reference to the first.Triste
@Triste I'll post an addendum on how I think it works.Ipsus
The ConditionalWeakTable does not allow entries which have been stored in the table to be modified. As such, I would think that it could be implemented safely using memory barriers but not locks. The only problematic situation would be if two threads tried to add the same key simultaneously; that could be resolved by having the "add" method perform a memory barrier after an item is added, and then scanning to ensure that exactly one item has that key. If multiple items have the same key, one of them will be identifiable as "first", so it will be possible to eliminate the others.Triste
@Triste Actually they could have simply used a ConcurrentDictionary or used a hash with a linked list of buckets and add new entries using Interlocked.Exchange (just to name two possibilities). Only the resize operations on the bucket list needs a lock. The point that I don't like the ConditionalWeakTable has a lot to do with their hash implementation, even though I can understand why they did it like this. Still, we're drifting... does this give you some insights on your question on how ConditionalWeakTable works?Ipsus
I would have expected that .NET 4 included a "magical" ephemeron which took care of any necessary pinning itself. Are you saying CWT has to do that?Triste
@Triste CWT takes care of the free calls of the ephemeron handles; the GC takes care of the object garbage collection (usually before the handles are freed since that only happens during the resize calls). I think only the handles are "pinned" (actually, I think they're pointers in some runtime table like the GC); the objects themselves don't have to be pinned. The free calls themselves can be found in DependentHandle btw and are called during the resize phase.Ipsus

© 2022 - 2024 — McMap. All rights reserved.