Is it possible to create a truely weak-keyed dictionary in C#?
Asked Answered
G

1

27

I'm trying to nut out the details for a true WeakKeyedDictionary<,> for C#... but I'm running into difficulties.

I realise this is a non-trivial task, but the seeming inability to declare a WeakKeyedKeyValuePair<,> (where the GC only follows the value reference if the key is reachable) makes it seemingly impossible.

There are two main problems I see:

  1. Every implementation I've so far seen does not trim values after keys have been collected. Think about that - one of the main reasons for using such a Dictionary is to prevent those values being kept around (not just the keys!) as they're unreachable, but here they are left pointed to by strong references.

    Yes, add/remove from the Dictionary enough and they'll eventually be replaced, but what if you don't?

  2. Without a hypothetical WeakKeyedKeyValuePair<,> (or another means of telling the GC to only mark the value if the key is reachable) any value that refers to it's key would never be collected. This is a problem when storing arbitrary values.

Problem 1 could be tackled in a fairly non-ideal/hackish way : use GC Notifications to wait for a full GC to complete, and then go along and prune the dictionary in another thread. This one I'm semi-ok with.

But problem 2 has me stumped. I realise this is easily countered by a "so don't do that", but it has me wondering - is this problem even possible to solve?

Gesualdo answered 9/12, 2011 at 4:26 Comment(0)
A
33

Have a look at the ConditionalWeakTable<TKey, TValue> Class.

Enables compilers to dynamically attach object fields to managed objects.

It's essentially a dictionary where both the key and the value are a WeakReference, and the value is kept alive as long as the key is alive.

Note! This class does not use GetHashCode and Equals to do equality comparisons, it uses ReferenceEquals.

Aslant answered 9/12, 2011 at 4:45 Comment(29)
Nice find, most intriguing! How is it implemented I wonder? Without knowing this the problem becomes "is it possible to create a truly WeakValuedDictionary<,>". I'll dig in to reflector and see if I can figure it out...Gesualdo
What a shame, it appears it depends on an internal .NET-magic "DependentHandle" for it's implementation.. additionally it ignores .GetHashCode and .Equals, making it a substandard Dictionary at best :(. Additionally without access to DependentHandle, the problem has now moved to defining a WeakValuedDictionary<,>. I suppose this may be as close as we can get..Gesualdo
@Gesualdo The DependentHandle is the CLR implementation of ephemerons which is impossible to implement without GC cooperation. It should have been made public like GCHandle. If you don't mind reflection trickery you can turn the static CLR methods into delegates and implement your own DependentHandle and WeakValuedDictionary. Be careful to study the .NET reference source (which is public), or use some decompiler, because it has tricky race conditions.Gish
@Zarat: If one wants to create a single "life-dependency-link" between two objects X and Y (such that the link will be considered as a strong reference of X only as long as a strong reference to Y exists), is there a nicer way to do that than creating a single-item ConditionalWeakTable?Phytosociology
@supercat: Yes, that's basically what ephemerons and the DependentHandle does, just create an instance with X and Y as parameters, however whether it is a 'nice' way depends on your opinion on reflection into the .NET implementation. (Of course thats not allowed in partially trusted applications.)Gish
@Zarat: Are you saying that DependentHandle is the native implementation of an ephemeron, trusted code might gain efficiency by using that, but untrusted code should just create a ConditionalWeakTable with one or two items in it (the desired linkage, plus possibly a linkage from the upstream side of the desired linkage table itself)? Also, I was curious what is guaranteed with respect to the relative behavior of ConditionalWeakTable and "fReachable" objects? Will objects which are freachable (but not strongly reachable) on one side of a ConditionalWeakTable be likewise on the other?Phytosociology
@supercat: Yes that's what I meant. Not sure what you mean with 'fReachable', to me ephemerons are just a way to tell the GC that an object is conditionally reachable: IF A is (strongly) reachable THEN B must also be marked (strongly) reachable. [I hope I didn't get that wrong, it's a while ago since I looked at that stuff.]Gish
@Zarat: An object is freachable if no strongly rooted references to it exist but it either has a registered finalizer, or it is reachable from an object with a registered finalizer. If an object is freachable, then in the next GC cycle it will not be eligible for collection, but if it has a finalizer it will be placed in the "freachable queue", which is a strongly-rooted list of objects whose finalizers should be run at earliest convenience. One of the constructors for WeakReference contains a bool parameter which indicates whether the WeakReference should be invalidated when...Phytosociology
...its target becomes freachable, or only when its target ceases to exist. Generally, once an object becomes freachable, one should endeavor to replace it rather than reuse it, so having a WeakReference become invalid at that point is desirable. On the other hand, situations might arise where one needs to expedite the cleanup of an freachable object (e.g. because one wishes to create a new object using the same hardware resource). For that, one would use a "long" WeakReference. It would be interesting to know which behaviors ephemerons support.Phytosociology
@supercat: Ah, I see, just didn't know this under the term 'freachable'. A quick test shows that DependentHandle and ConditionalWeakTable are implemented as long weak references, they are not reset during the first collection pass. (My test: resurrect the object from the finalizer, it's still accessible in ConditionalWeakTable and long weak references, but the short weak reference has been cleared.)Gish
@Zarat: Which object were you resurrecting--the 'origin' or 'target' side of the link? Also, what sort of reference existed to the ConditionalWeakTable itself? The corner cases I'm interested in would be things like what happens if table Tab holds links from A to Tab and B to C (and there are no other references to the table). If A becomes freachable, but B is strongly reachable, what is the status of C? If a WeakReference becomes freachable, its target will be invalidated regardless of whether it's a long or short reference, but I don't know about a CWT.Phytosociology
@Zarat: BTW, one thing I'd like to see as a framework class would be an InternTable<T> (with an IEqualityComparer<T> as a construction parameter). Given an instance of T, determine whether the table already holds an entry which would compare equal. If so, compare that instance; otherwise, store the supplied instance in the table and return it. Such a table, if implemented well, could greatly improve the performance of nested immutable types where hash codes can pretty reliably distinguish unequal instances, but where comparing equal instances is expensive.Phytosociology
@Zarat: Using the InternTable<T> would ensure that references to instances with equal contents would be replaced with references to the same instances. If one wishes to perform many comparisons on things like trees and subtrees which are built separately but will have many equal parts, such replacements may lead to tremendous efficiency improvements. Note that there is no benefit to keeping an object interned after all other references have disappeared, even when equivalent objects exist.Phytosociology
@supercat: (1) I resurrected the key. (2) The reachability of the table is irrelevant, except if it gets finalized it is cleared, otherwise it just acts as a list. (3) A weak reference gets cleared when freachable because its finalizer is run. Here the finalizer is on the table so it has no effect on ephemerons unless the table is collected and cleared. (4) InternTable can probably be implemented with a CWT, but more efficiently if you use reflection and ephemerons directly.Gish
@supercat: Thinking we're getting a bit off topic here :) If you want to discuss further you can do by mail, or on my blog. I posted my reflection trickery to reimplement DependentHandle and a new Ephemeron class analogue to what WeakReference is to GCHandle.Gish
I posted a question awhile ago about an intern table at #10924182 so if you'd like to discuss anything there, I'd like more thoughts on the subject. I would think that a weak-interning collection could greatly enhance the value of many immutable types. Comparing two trees which contain the same items but were built separately, for example, takes time O(n) every time they're compared. If the trees implement a good GetHashCode() function, however, and if code which builds trees interns them...Phytosociology
...(including subtrees), then the time for comparison would be O(1) [since each tree would not just hold identical subtrees, but the same subtrees]. I'm not sure what the most efficient way to build a WeakInterning class would be, however, and so would welcome your thoughts on that other post.Phytosociology
Perhaps we could chat about such things?Phytosociology
I've been following this discussion with interest. I suggest you continue the discussion somewhere else and post a self-answered question with your findings. Please provide a link here so I can continue to follow this topic. Thanks!Aslant
If identity comparison cannot be used then ConditionalWeakTable is not an option. In this case I would suggest our implementation WeakTable.cs, and our description in the blog WeakTable.Marybelle
@VladimirNesterovsky I have checked your code and you access concurrent dictionary in a finalizer. This exposes you to a rare race condition that can result in a deadlock.Majorette
@Majorette I'm not sure you're correct in your reasoning. Finalizer is called inside dedicated thread (it's sometimes called FinalizerWorker). This thread is not different from any other thread, and does not prevent any other thread to progress. I guess you have confused FinalizerWorker thread with Garbage Collector thread that stops all other threads. If one would be able to lock something in GC thread then it could lead to dead lock.Marybelle
@VladimirNesterovsky If you take a look at ConcurrentDictionary.TryRemove, you will see it uses lock. Let's say you have only single thread in your app, and you try to remove something from WeakTable. GC kicks in while you are inside that TryRemove lock. In your destructor, you block on that lock, and since your main thread is suspended, it will never unlock and block forever.Majorette
@Majorette my point is that the FinilizerWorker and GC are two different threads. Nothing is stopped or blocked while FinilizerWorker is working. FinilizerWorker is a thread where all Finalizers are queued to execute by GC thread. In contrast GC thead may decide to "stop the world" to trace graph of objects for the collection.Marybelle
@VladimirNesterovsky It seems you are correct, although the internet is full of people screaming to never lock in a finalizer. I guess you can't lock onto anything to cause GC thread (not finalizer thread) deadlock, can you?Majorette
@Majorette That's my understanding. GC thread is implemented by the runtime environment, and no user code is run in its context. In fact it can be possible that there is no dedicated GC thread at all, as runtime can peek anyone, and stop all others.Marybelle
@VladimirNesterovsky I feel like I found another race condition - please let me know what you think: I call Remove("asdf") on thread 1, and the thread gets just before the line states.Remove(state.key); where it gets raced by thread 2, whcih calls GetOrAdd("asdf"). Because thread 1 did not remove the key from ConditionalWeakTable yet, and thread 2 will attempt to bind another object to the same key using ConditionalWeakTable, you will get an exception.Majorette
@Paya, race condition is not an evil, the only problem is to keep invariants. When thread 1 gets to the line states.Remove(state.key) the state.initialized == 2 (parked). Thread 2 in GetOrAdd will either a) find and return old value; b) return a new value. If a new state is uninitialized (state.initialized == 0), then it's marked as initialized (state.initialized == 1) and added to states. If it's observed that the state is been parked (state.initialized == 2) after the states.Add(key, state) then state is removed from states. In your case GetOrAdd will return the old value.Marybelle
Let us continue this discussion in chat.Marybelle

© 2022 - 2024 — McMap. All rights reserved.