Python copy-on-write behavior

I'm working on a problem where I'm instantiating many instances of an object. Most of the time the instantiated objects are identical. To reduce memory overhead, I'd like to have all the identical objects point to the same address. When I modify the object, though, I'd like a new instance to be created--essentially copy-on-write behavior. What is the best way to achieve this in Python?

The Flyweight Pattern comes close. An example (from http://codesnipers.com/?q=python-flyweights):

import weakref

class Card(object):
    _CardPool = weakref.WeakValueDictionary()
    def __new__(cls, value, suit):
        obj = Card._CardPool.get(value + suit, None)
        if not obj:
            obj = object.__new__(cls)
            Card._CardPool[value + suit] = obj
            obj.value, obj.suit = value, suit
        return obj

This behaves as follows:

>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
's'
>>> id(c1) == id(c2)
True

The desired behavior would be:

>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
'd'
>>> id(c1) == id(c2)
False

Update: I came across the Flyweight Pattern and it seemed to almost fit the bill. However, I'm open to other approaches.

Placate answered 10/9, 2012 at 20:58 Comment(4)
I don't think it's possible with your exact example. c1 and c2 are the same object. When you set an attribute on one, there is no way to make it turn into another object, without you fetching a new instance and letting the class give you a new copy. Might involve a slightly different approach, involving a bunch of __setattr__ magic. - Lyssa
Use a wrapper around the card. Read operations look at the current card, write operations will change the card referenced. You would need to mess with a lot of operations for the syntax to be like you want. The closest analogue I can think of is that you are trying to implement a pointer to a pointer. - Nelidanelie
@Lyssa That's what I suspected. Has anyone done this (or something very similar)? Are there examples available? - Placate
Perhaps you can just use a more efficient data structure, such as an array of packed integers or interned strings. - Blanchard
P
7

Do you actually need id(c1) == id(c2) to be true, or is that just a demonstration, where the real objective is to avoid creating duplicate objects?

One approach would be to have each object be distinct, but hold an internal reference to the 'real' object like you have above. Then, on any __setattr__ call, change the internal reference.

I've never done __setattr__ stuff before, but I think it would look like this:

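class MyObj(object):
    def __init__(self, value, suit):
        # Store via object.__setattr__ so our own __setattr__ is not triggered.
        object.__setattr__(self, '_internal', Card(value, suit))

    def __setattr__(self, name, new_value):
        # Never mutate the shared Card; point at a different pooled Card instead.
        current = self._internal
        if name == 'suit':
            new_card = Card(current.value, new_value)
        elif name == 'value':
            new_card = Card(new_value, current.suit)
        else:
            raise AttributeError(name)
        object.__setattr__(self, '_internal', new_card)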

And similarly, expose the attributes through __getattr__, delegating reads to self._internal.

You'd still have lots of duplicated objects, but only one copy of the 'real' backing object behind them. So this would help if each object is massive, and wouldn't help if they are lightweight, but you have millions of them.
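A minimal, self-contained sketch of this wrapper idea follows. The names CardRef and _card are hypothetical (not from the original answer), the pool is keyed on a (value, suit) tuple rather than on string concatenation, and __slots__ is an added assumption to keep each wrapper small. It is one possible way to wire up __getattr__ and __setattr__, not a definitive implementation.

```python
import weakref


class Card(object):
    """Shared flyweight, as in the question; treated as read-only here."""
    _pool = weakref.WeakValueDictionary()

    def __new__(cls, value, suit):
        key = (value, suit)
        obj = cls._pool.get(key)
        if obj is None:
            obj = object.__new__(cls)
            obj.value, obj.suit = value, suit
            cls._pool[key] = obj
        return obj


class CardRef(object):
    """Hypothetical wrapper: one small object per reference, one Card per state."""
    __slots__ = ('_card',)

    def __init__(self, value, suit):
        # Bypass our own __setattr__ while storing the shared Card.
        object.__setattr__(self, '_card', Card(value, suit))

    def __getattr__(self, name):
        # Only called when normal lookup fails, so reads fall through to the Card.
        return getattr(self._card, name)

    def __setattr__(self, name, new_value):
        if name not in ('value', 'suit'):
            raise AttributeError(name)
        # "Copy on write": rebind to another pooled Card, never mutate the shared one.
        fields = {'value': self._card.value, 'suit': self._card.suit}
        fields[name] = new_value
        object.__setattr__(self, '_card', Card(**fields))


c1 = CardRef('10', 'd')
c2 = CardRef('10', 'd')
shared_before = c1._card is c2._card
c2.suit = 's'
print(shared_before, c1.suit, c2.suit, c1._card is c2._card)
# True d s False
```

Note that id(c1) == id(c2) is never true with this approach; only the backing Card is shared, which is the trade-off described above.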

Pustule answered 10/9, 2012 at 21:18 Comment(4)
I think this follows the main comment made by @StephenGarle. Probably would work. - Lyssa
I also believe this approach could lead to a clean, sane implementation, without resorting to any class black magic. (I kind of resent not needing black magic :-) ) - Outlast
Here is a suggestion for updating this example: pastebin.com/dqCTh9LA . Shows use of __getattr__ and a little more flexible __setattr__ - Lyssa
Take a look at github.com/diffoperator/pycow for how it can be done for list, set and dict. - Nubbin
E
3

Impossible.

id(c1) == id(c2)

says that c1 and c2 are references to the exact same object. So

c2.suit = 's' is exactly the same as saying c1.suit = 's'.

Python has no way of distinguishing the two (unless you allow introspection of prior call frames, which leads to a dirty hack.)

Since the two assignments are identical, there is no way for Python to know that c2.suit = 's' should cause the name c2 to reference a different object.
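To make the point concrete, here is a small sketch using a hypothetical Probe class (not part of the question) whose __setattr__ logs everything it receives. The hook is handed the object, the attribute name, and the value, but never the variable name used at the call site, so the two assignments produce identical log entries.

```python
class Probe(object):
    """Hypothetical class that records what each attribute assignment receives."""
    seen = []

    def __setattr__(self, name, value):
        # The hook gets the object itself, not the name it was reached through.
        Probe.seen.append((id(self), name, value))
        object.__setattr__(self, name, value)


c1 = Probe()
c2 = c1  # two names, one object

c1.suit = 's'
c2.suit = 's'

print(c1 is c2)                         # True
print(Probe.seen[0] == Probe.seen[1])   # True: the two calls are indistinguishable
```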


To give you an idea of what the dirty hack would look like,

import traceback
import re
import sys
import weakref

class Card(object):
    _CardPool = weakref.WeakValueDictionary()
    def __new__(cls, value, suit):
        obj = Card._CardPool.get(value + suit, None)
        if not obj:
            obj = object.__new__(cls)
            Card._CardPool[value + suit] = obj
            obj._value, obj._suit = value, suit
        return obj
    @property
    def suit(self):
        return self._suit
    @suit.setter
    def suit(self, suit):
        filename,line_number,function_name,text=traceback.extract_stack()[-2]
        name = text[:text.find('.suit')]
        setattr(sys.modules['__main__'], name, Card(self._value, suit))

c1 = Card('10', 'd')
c2 = Card('10', 'd')
assert id(c1) == id(c2)

c2.suit = 's'
print(c1.suit)
# 'd'

assert id(c1) != id(c2)

This use of traceback only works with those implementations of Python that use frames, such as CPython, but not Jython or IronPython.

Another problem is that

name = text[:text.find('.suit')]

is extremely fragile, and would screw up, for example, if the assignment were to look like

if True: c2.suit = 's'

or

c2.suit = (
    's')

or

setattr(c2, 'suit', 's')

Yet another problem is that it assumes the name c2 is global. It could just as easily be a local variable (say, inside a function), or an attribute (obj.c2.suit = 's').

I do not know a way to address all the ways the assignment could be made.

In any of these cases, the dirty hack would fail.

Conclusion: Don't use it. :)

Ectoderm answered 10/9, 2012 at 21:14 Comment(1)
Thanks for the thorough answer. It would have taken me a little while to hack that together, and I wouldn't have known all the cases where it fails. It looks like the solution proposed by @BrendenBrown gets closest to what I'm looking for. - Placate
S
0

This is impossible in its current form. A name (c1 and c2 in your example) is a reference, and you cannot change what a name refers to from inside __setattr__, not to mention all the other references to the same object.

The only way this would be possible is something like this:

c1 = c1.changesuit("s")

Where c1.changesuit returns a reference to the (newly created) object. But this only works if each object is referenced by only one name. Alternatively you might be able to do some magic with locals() and stuff like that, but please - don't.
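A rough sketch of the explicit-rebinding approach, assuming an immutable Card built on collections.namedtuple with a plain dict as the pool (both are assumptions for illustration; a plain dict means pooled cards are never freed). The changesuit method mirrors the name used above and returns a pooled instance instead of modifying the card.

```python
import collections


class Card(collections.namedtuple('Card', ['value', 'suit'])):
    """Hypothetical immutable card; equal cards share one pooled instance."""
    __slots__ = ()
    _pool = {}  # assumption: plain dict, so entries live as long as the class

    def __new__(cls, value, suit):
        key = (value, suit)
        obj = cls._pool.get(key)
        if obj is None:
            obj = super(Card, cls).__new__(cls, value, suit)
            cls._pool[key] = obj
        return obj

    def changesuit(self, suit):
        # Return another pooled card rather than modifying this one.
        return Card(self.value, suit)


c1 = Card('10', 'd')
c2 = Card('10', 'd')
same_before = c1 is c2

c2 = c2.changesuit('s')  # the caller rebinds the name explicitly
print(same_before, c1.suit, c2.suit, c1 is c2)
# True d s False
```

Because the cards cannot be modified in place, sharing them between many names is safe; the cost is that every "write" has to be spelled as a rebinding.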

Selfpropulsion answered 10/9, 2012 at 21:15 Comment(0)
