Python resettable instance method memoization decorator
I'm attempting to build a decorator for an instance method of a class that will memoize the result. (This has been done a million times before.) However, I'd like the option of being able to reset the memoized cache at any point (say, if something in the instance state changes, which might change the result of the method even though its args haven't). So I attempted to build the decorator as a class instead of a function, so that I'd have access to the cache as a class member. This led me down the path of learning about descriptors, specifically the __get__ method, which is where I'm actually stuck. My code looks like this:

import time

class memoized(object):

    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args, **kwargs):

        key = (self.func, args, frozenset(kwargs.iteritems()))

        try:
            return self.cache[key]
        except KeyError:
            self.cache[key] = self.func(*args, **kwargs)
            return self.cache[key]
        except TypeError:
            # uncacheable, so just return calculated value without caching
            return self.func(*args, **kwargs)

    # self == instance of memoized
    # obj == instance of my_class
    # objtype == class object of __main__.my_class
    def __get__(self, obj, objtype=None):
        """Support instance methods"""
        if obj is None:
            return self

        # new_func is the bound method my_func of my_class instance
        new_func = self.func.__get__(obj, objtype)

        # instantiates a brand new class...this is not helping us, because it's a 
        # new class each time, which starts with a fresh cache
        return self.__class__(new_func)

    # new method that will allow me to reset the memoized cache
    def reset(self):
        print "IN RESET"
        self.cache = {}

class my_class:
    @memoized
    def my_func(self, val):
        print "in my_func"
        time.sleep(2)
        return val


c = my_class()

print "should take time"
print c.my_func(55)
print

print "should be instant"
print c.my_func(55)
print

c.my_func.reset()

print "should take time"
print c.my_func(55)

Is this clear and/or possible? Each time __get__ is called, I get a brand new instance of the memoized class, which loses the cache that actually had data in it. I've been working hard with __get__, but am not making much progress.
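
To illustrate (this is just a throwaway check, not part of my real code): because __get__ returns self.__class__(new_func) on every attribute access, no two accesses ever share a cache:

c = my_class()
# each access runs memoized.__get__, which builds a fresh memoized instance
print c.my_func is c.my_func                # False
print c.my_func.cache is c.my_func.cache    # False -- two separate empty caches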

Is there a completely separate approach to this problem that I'm missing? Any and all advice/suggestions are welcome and appreciated. Thanks.

Baa answered 13/12, 2010 at 17:38 Comment(5)
I didn't go deeper in your code, but for better performance you should use if cache.has_key(...): return cache[...] instead of catching KeyError.Selfconfessed
@khachik: key in cache is better, since has_key is deprecated.Piperonal
Keep in mind that the point of memoization is that a function call with the same arguments produces the same output. If you really need to reset the cache based on instance state, maybe you should consider keeping the cache in the instance instead of a memoized decorator.Subdebutante
@Selfconfessed @delnan - yeah, I should have posted it fully cleaned up. I was just trying to get the concept down. Sorry. @Subdebutante - I realize this is conceptually awkward, but I was hoping to just be able to add a decorator (1 line), instead of doing all the cache checking (say, 5 lines) per method it's used in. Perhaps I'll just go back and figure out how to do it succinctly in the instance (a rough sketch of that follows these comments). Thanks to all!Baa
@Falmarri: I think I'd have to disagree with that assessment. There are plenty of applications where you want to memoize functions based upon some application context and then invalidate the caches when you've left the context (e.g. database queries, where the application might disconnect from the DB and connect to a new one). Nobody ever said that memoization had to remain valid for the whole application run. It just has to remain valid long enough to give you a significant performance benefit.Wishful
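
A rough sketch of the instance-level approach suggested in the comments above (no decorator; the _cache attribute name and the reset method are purely illustrative):

import time

class my_class:
    def __init__(self):
        self._cache = {}            # per-instance cache

    def my_func(self, val):
        # the few lines of cache bookkeeping that would be repeated per method
        if val not in self._cache:
            print "in my_func"
            time.sleep(2)
            self._cache[val] = val
        return self._cache[val]

    def reset(self):
        # trivial to reset, because the cache lives on the instance
        self._cache.clear()
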
Rather than trying to work out the mechanics of your implementation, I've taken the memoized decorator class from PythonDecoratorLibrary, and have modified it to add reset. Below is the result; the trick I've used is to add a callable reset attribute to the decorated function itself.

    import functools
    import time

    class memoized2(object):
        """Decorator that caches a function's return value each time it is called.
        If called later with the same arguments, the cached value is returned, and
        not re-evaluated.
        """
        def __init__(self, func):
            self.func = func
            self.cache = {}

        def __call__(self, *args):
            try:
                return self.cache[args]
            except KeyError:
                value = self.func(*args)
                self.cache[args] = value
                return value
            except TypeError:
                # uncachable -- for instance, passing a list as an argument.
                # Better to not cache than to blow up entirely.
                return self.func(*args)

        def __repr__(self):
            """Return the function's docstring."""
            return self.func.__doc__

        def __get__(self, obj, objtype):
            """Support instance methods."""
            # bind obj as the implicit first argument and expose reset on the wrapper
            fn = functools.partial(self.__call__, obj)
            fn.reset = self._reset
            return fn

        def _reset(self):
            self.cache = {}

    class my_class:
        @memoized2
        def my_func(self, val):
            print "in my_func"
            time.sleep(2)
            return val


    c = my_class()

    print "should take time"
    print c.my_func(55)
    print

    print "should be instant"
    print c.my_func(55)
    print

    c.my_func.reset()

    print "should take time"
    print c.my_func(55)
Pronunciation answered 13/12, 2010 at 17:59 Comment(11)
Thanks! I had tried the functools approach, but didn't have the experience to realize what it was actually doing. I realize the whole idea is kinda awkward, so thanks for your help!Baa
This may work, but it looks very inefficient to me given what __get__() is doing on every reference to the memoized method -- which is somewhat ironic considering the intended usage...Hymn
@Hymn - In the interest of education, why would you say the __get__ is inefficient? (Genuine curiosity, as a beginner)Baa
@Hoopes: OK sure. I said that because on every access of the memoized method before any caching has a chance to take place, it calls functools.partial() which creates a function that will be called and then, just in case, it also creates and assigns self._reset to an attribute of that newly created function so it'll be there if called. I think this may be an inherent limitation of trying to cache things via the descriptor protocol which doesn't know why or for what purpose you're getting, setting, or deleting its logical value.Hymn
@martineau, please correct me if I'm wrong, but methinks there's an easy fix: after building above partial, put it in a self.__wrapper, and on any subsequent __get__ calls return the cached self.__wrapper. Seems to be working for me. No?Mobcap
@K3---rnc: Good thought. It does work and reduces the overhead of calling a memoized method significantly (a sketch of that variation follows this comment thread). BTW I see no real need for double (or even a single) leading underscore(s) on any new "wrapper" attribute's name.Hymn
@K3---rnc: Also, BTW, I find "methinks" just a really pretentious and annoying way to say "I think."Hymn
@martineau, I agree. I just like to prefix "internal" attributes with an underscore as I hear this is sort of a convention. Also, I'm not a native English speaker, and I find archaic oddities such as 'methinks' a welcome addition to what is otherwise a dull, logical language. :DMobcap
@K3---rnc: I heare what saith ye. Me understandth thou dost desire to use a leading underscore although using such wouldst be inconsistent with the existing code writ here. Moreover using two of them has a particular meaning in Python which does not apply here, and could even be considered misleading.Hymn
How would you access the reset method if @memoized2 was used on a @property, i.e. @property\n@memoized2\ndef myprop(self):?Tommie
Please note, this will reset the cache for all instances of "my_class".Gagger
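
A minimal sketch of the wrapper-caching idea discussed above, with one change: the partial is cached per instance rather than on the decorator itself, so two instances never end up sharing a single bound wrapper. The subclass name and the attribute used to stash the wrapper are purely illustrative:

    import functools

    class memoized2_cached(memoized2):
        def __get__(self, obj, objtype=None):
            """Support instance methods, building the bound wrapper only once per instance."""
            if obj is None:
                return self
            attr = '_memoized_wrapper_' + self.func.__name__   # illustrative stash name
            try:
                return getattr(obj, attr)          # reuse the wrapper built earlier
            except AttributeError:
                fn = functools.partial(self.__call__, obj)
                fn.reset = self._reset
                setattr(obj, attr, fn)
                return fn
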
Building upon the answer given by @aix to the original question, I have created a class that I think improves it. The main feature is that the cached values are stored as an attribute of the instance whose method is being decorated, so they are very easy to reset.

import functools

class memoize(object):
    def __init__(self, func):
        #print "Init"
        self.func = func

    def __call__(self, *args):
        #print "Call"
        if self.func not in self.cache:
            self.cache[self.func] = {}
        try:
            return self.cache[self.func][args]
        except KeyError:
            value = self.func(*args)
            self.cache[self.func][args] = value
            return value
        except TypeError:
            # uncachable -- for instance, passing a list as an argument.
            # Better to not cache than to blow up entirely.
            return self.func(*args)

    def __repr__(self):
        """Return the function's docstring."""
        return self.func.__doc__

    def __get__(self, obj, objtype):
        """Support instance methods."""
        #print "Get", obj, objtype
        fn = functools.partial(self.__call__, obj)
        # keep the cache on the instance itself, so it can be reset by simply
        # rebinding or clearing obj.cache (see MyClass.update below)
        try:
            self.cache = obj.cache
        except AttributeError:
            obj.cache = {}
            self.cache = obj.cache
        #print self.cache
        return fn

As an example of usage:

class MyClass(object):
    def __init__(self,data):
        self.data = data

    def update(self,data):
        self.data = data
        self.cache = {}

    @memoize
    def func1(self,x):
        print "Computing func1"
        return "I am func1 of %s. Data is %s. x is %s\n" % (self, self.data, x)

    @memoize
    def func2(self,x):
        print "Computing func2"
        return "I am func2 of %s. Data is %s. x is %s\n" % (self, self.data, x)

    def func3(self,x):
        print "Computing func3"
        return "I am func3 of %s. Data is %s. x is %s\n" % (self, self.data, x)

mc1 = MyClass("data1")
mc2 = MyClass("data2")
mc3 = MyClass("data3")

print mc1.func1(1) 
print mc1.func1(1) 
print mc1.func2(1) 
print mc1.func2(1) 
print mc1.func3(1) 
print mc1.func3(1) 

print mc2.func1(1) 
print mc2.func1(1) 
print mc2.func2(1) 
print mc2.func2(1) 
print mc2.func3(1) 
print mc2.func3(1) 

print "Update mc1\n"
mc1.update("data1new")

print mc1.func1(1) 
print mc1.func2(1) 
print mc1.func3(1) 
print mc2.func1(1) 
print mc2.func2(1) 
print mc2.func3(1) 

which gives the following output:

Computing func1
I am func1 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

I am func1 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

Computing func2
I am func2 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

I am func2 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100470fd0>. Data is data1. x is 1

Computing func1
I am func1 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

I am func1 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

Computing func2
I am func2 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

I am func2 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

Update mc1

Computing func1
I am func1 of <__main__.MyClass object at 0x100470fd0>. Data is data1new. x is 1

Computing func2
I am func2 of <__main__.MyClass object at 0x100470fd0>. Data is data1new. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100470fd0>. Data is data1new. x is 1

I am func1 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

I am func2 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1

Computing func3
I am func3 of <__main__.MyClass object at 0x100476050>. Data is data2. x is 1
Whitsun answered 25/12, 2011 at 11:44 Comment(0)
Well, I would like to point out two performance issues in your code. This is not an answer to your question, but it won't fit in a comment. Thanks to @delnan for pointing out that has_key is deprecated. Instead of:

    try:
        return self.cache[key]
    except KeyError:
        self.cache[key] = self.func(*args, **kwargs)
        return self.cache[key]
    except TypeError:
        # uncacheable, so just return calculated value without caching
        return self.func(*args, **kwargs)

I would write it this way:

resultDone = False
result = None
try:
    if key in self.cache:
        return self.cache[key]
    else:
        result = self.func(*args, **kwargs)
        resultDone = True
        self.cache[key] = result
except TypeError:  # unhashable key
    pass
if resultDone:
    return result
else:
    return self.func(*args, **kwargs)

This avoids: a) try/except KeyError; b) calling cache[key] on return; c) calling the function once more on unhashable keys.
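
For anyone who wants to measure the difference rather than guess, here is a rough timeit sketch (illustrative only; the absolute numbers depend on your Python version and hardware):

import timeit

setup = "cache = {'hit': 1}"

# plain indexing on a key that is present
print timeit.timeit("cache['hit']", setup=setup)
# membership test first, then indexing
print timeit.timeit("if 'hit' in cache:\n    cache['hit']", setup=setup)
# try/except around a key that is present (exception never raised)
print timeit.timeit("try:\n    cache['hit']\nexcept KeyError:\n    pass", setup=setup)
# try/except around a key that is always missing (exception raised every time)
print timeit.timeit("try:\n    cache['miss']\nexcept KeyError:\n    pass", setup=setup)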

Selfconfessed answered 13/12, 2010 at 18:11 Comment(2)
You're wrong about try/except being a performance issue. It's true that exceptions are expensive, but only if they actually trigger: ideone.com/8lXyo In case of memoization, one would expect cache hits to be far more frequent than misses. In case of a cache miss, the cost of recalculation most likely exceeds the cost of catching an exception.Laxity
@Laxity I just said that try/except KeyError is slower than has_key or key in, whether you call it once or 1000 times. I'm not informed about the data distribution in the OP's domain, sorry :)Selfconfessed
