When to use weak references in Python?
Asked Answered
H

3

42

Can anyone explain usage of weak references?

The documentation doesn't explain it precisely, it just says that the GC can destroy the object linked to via a weak reference at any time. Then what's the point of having an object that can disappear at any time? What if I need to use it right after it disappeared?

Can you please explain them with some good examples?

Thanks

Hartzel answered 12/3, 2010 at 22:25 Comment(0)
O
28

The typical use for weak references is if A has a reference to B and B has a reference to A. Without a proper cycle-detecting garbage collector, those two objects would never get GC'd even if there are no references to either from the "outside". However if one of the references is "weak", the objects will get properly GC'd.

However, Python does have a cycle-detecting garbage collector (since 2.0!), so that doesn't count :)

Another use for weak references is for caches. It's mentioned in the weakref documentation:

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.

If the GC decides to destroy one of those objects, and you need it, you can just recalculate / refetch the data.

Obannon answered 12/3, 2010 at 22:36 Comment(3)
Hmm... I didn't realize that you mentioned caches... I suppose I should delete my answer and read more carefully next time :-).Astragal
Well, it was just a one-liner mention of caches. I then expanded the answer, after you had posted yours :)Disincentive
Ok, well that at least I'm not crazy :-). It doesn't say your post was edited though. Either way.. I liked your answer after I read it again... and I don't want to come off as reputation-hungry :-).Astragal
C
48

Events are a common scenario for weak references.


Problem

Consider a pair of objects: Emitter and Receiver. The receiver has shorter lifetime than the emitter.

You could try an implementation like this:

class Emitter(object):

    def __init__(self):
        self.listeners = set()

    def emit(self):
        for listener in self.listeners:
            # Notify
            listener('hello')


class Receiver(object):

    def __init__(self, emitter):

        emitter.listeners.add(self.callback)

    def callback(self, msg):
        print 'Message received:', msg


e = Emitter()
l = Receiver(e)
e.emit() # Message received: hello

However, in this case, the Emitter keeps a reference to a bound method callback that keeps a reference to the Receiver. So the Emitter keeps the Receiver alive:

# ...continued...

del l
e.emit() # Message received: hello

This is sometimes troublesome. Imagine that Emitter is a part of some data model that indicates when data changes and Receiver was created by a dialog window that listens to that changes to update some UI controls.

Through the application's lifetime, multiple dialogs can be spawned and we don't want their receivers to be still registered inside the Emitter long after the window had been closed. That would be a memory leak.

Removing the callbacks manually is one option (just as troublesome), using weak references is another.


Solution

There's a nice class WeakSet that looks like a normal set but stores its members using weak references and no longer stores them when they are freed.

Excellent! Let's use it:

def __init__(self):
    self.listeners = weakref.WeakSet()

and run again:

e = Emitter()
l = Receiver(e)
e.emit()
del l
e.emit()

Oh, nothing happens at all! That's because the bound method (a specific receiver's callback) is orphaned now - neither the Emitter nor the Receiver hold a strong reference to it. Hence it's garbage collected immediately.

Let's make the Receiver (not the Emitter this time) keep a strong reference to this callback:

class Receiver(object):

    def __init__(self, emitter):

        # Create the bound method object
        cb = self.callback

        # Register it
        emitter.listeners.add(cb)
        # But also create an own strong reference to keep it alive
        self._callbacks = set([cb])

Now we can observe the expected behaviour: the Emitter only keeps the callback as long as the Receiver lives.

e = Emitter()
l = Receiver(e)
assert len(e.listeners) == 1

del l
import gc; gc.collect()
assert len(e.listeners) == 0

Under the hood

Note that I've had to put a gc.collect() here to make sure that the receiver is really cleaned up immediately. It's needed here because now there's a cycle of strong references: the bound method refers to the receiver and vice versa.

This isn't very bad; this only means that the receiver's cleanup will be deferred until the next garbage collector run. Cyclic references can't be cleaned up by the simple reference counting mechanism.

If you really want, you could remove the strong reference cycle by replacing the bound method with a custom function object that would keep its self as a weak reference too.

def __init__(self, emitter):

    # Create the bound method object
    weakself = weakref.ref(self)
    def cb(msg):
        self = weakself()
        self.callback(msg)

    # Register it
    emitter.listeners.add(cb)
    # But also create an own strong reference to keep it alive
    self._callbacks = set([cb])

Let's put that logic into a helper function:

def weak_bind(instancemethod):

    weakref_self = weakref.ref(instancemethod.im_self)
    func = instancemethod.im_func

    def callback(*args, **kwargs):
        self = weakref_self()
        bound = func.__get__(self)
        return bound(*args, **kwargs)

    return callback

class Receiver(object):

    def __init__(self, emitter):

        cb = weak_bind(self.callback)

        # Register it
        emitter.listeners.add(cb)
        # But also create an own strong reference to keep it alive
        self._callbacks = set([cb])

Now there's no cycle of strong references, so when Receiver is freed, the callback function will also be freed (and removed from the Emitter's WeakSet) immediately, without the need for a full GC cycle.

Carioca answered 17/2, 2013 at 14:30 Comment(5)
I don't understand the rationale behind Listener._callbacks set. Won't the callback be kept alive throughout the lifetime of Listener object anyway? It's a method to that object, after all.Cahan
Turns out it isn't. Listener.callback is a function, but l.callback calls yourmethod.__get__(l, Listener) that returns an instancemethod callable which remembers its self. It's always recreated when a method is looked up. Good reading hereCarioca
You probably don't want to keep references to this transient instancemetiond object then, but to the Listener object itself (Java-style).Cahan
After our discussion I've renamed Listener to Receiver, since the instancemethod object (or the callback from weak_bind) is actually what performs the Listener role here.Carioca
In "the Receiver keeps a reference ...", just after the first example, shouldn't that be "the Emitter keeps a reference ..."? +1, by the way.Velamen
O
28

The typical use for weak references is if A has a reference to B and B has a reference to A. Without a proper cycle-detecting garbage collector, those two objects would never get GC'd even if there are no references to either from the "outside". However if one of the references is "weak", the objects will get properly GC'd.

However, Python does have a cycle-detecting garbage collector (since 2.0!), so that doesn't count :)

Another use for weak references is for caches. It's mentioned in the weakref documentation:

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.

If the GC decides to destroy one of those objects, and you need it, you can just recalculate / refetch the data.

Obannon answered 12/3, 2010 at 22:36 Comment(3)
Hmm... I didn't realize that you mentioned caches... I suppose I should delete my answer and read more carefully next time :-).Astragal
Well, it was just a one-liner mention of caches. I then expanded the answer, after you had posted yours :)Disincentive
Ok, well that at least I'm not crazy :-). It doesn't say your post was edited though. Either way.. I liked your answer after I read it again... and I don't want to come off as reputation-hungry :-).Astragal
P
1
  • Weak references is an important concept in python, which is missing in languages likes Java(java 1.5).
  • In Observer design pattern, generally Observable Object must maintain weak references to the Observer object.

    eg. A emits an event done() and B registers with A that, it want to listen to event done(). Thus, whenever done() is emitted, B is notified. But If B isn't required in application, then A must not become an hinderance in the garbage collection in A(since A hold the reference to B). Thus, if A has hold weak reference to B, and when all the references to A are away, then B will be garbage collected.

  • It's also very useful in implementing caches.
Pandemonium answered 30/10, 2016 at 16:58 Comment(1)
java.lang.ref.WeakReference was implemented in Java 1.2 to be precise. In fact, I'm a Java developer and I'm here because looking for a weak reference cache implementation in Python, a very common case for weak references BTW, that I couldn't find so far apparently because of Phyton's garbage collection model and default configuration.Rickyrico

© 2022 - 2024 — McMap. All rights reserved.