How to maintain state in Python without classes?
P

7

20

Are there pythonic ways to maintain state (for purposes of optimisation, for example) without going fully object-oriented?

To illustrate my question better, here's an example of a pattern I use frequently in JavaScript:

var someFunc = (function () {
    var foo = some_expensive_initialization_operation();
    return someFunc (bar) {
        // do something with foo and bar
    }
}());

Externally this is just a function like any other, with no need to initialise objects or anything like that, but the closure allows computing values a single time that I then essentially use as constants.

An example of this in Python is when optimising regular expressions - it's useful to use re.compile and store the compiled version for match and search operations.

The only ways I know of to do this in Python are by setting a variable in the module scope:

compiled_regex = compile_my_regex()

def try_match(m): # In reality I wouldn't wrap it as pointlessly as this
    return compiled_regex.match(m)

Or by creating a class:

class MatcherContainer(object):
    def __init__(self):
        self.compiled_regex = compile_my_regex()
    def try_match(self, m):
        return self.compiled_regex.match(m)

my_matcher = MatcherContainer()

The former approach is ad hoc, and it's not very clear that the function and the variable declared above it are associated with each other. It also pollutes the module's namespace a bit, which I'm not too happy with.

The latter approach seems verbose and a bit heavy on the boilerplate.

The only other way I can think of to deal with this is to factor any functions like this out into separate files (modules) and just import the functions, so that everything's clean.

Any advice from more experienced Pythoners on how to deal with this? Or do you just not worry about it and get on with solving the problem?

Panne answered 8/8, 2012 at 14:14 Comment(6)
I personally see no issue with your class creation. Verbosity is good. "Explicit is better than implicit." – Wafture
One way to make at least the use of that class less verbose would be to rename try_match to __call__, which lets you use it (after construction) exactly like a function. But, as in @glglgl's answer, your JavaScript code actually translates directly into Python. – Kursk
Is that call to someFunc correct in your first example? Or should it be a function definition? – Nabala
The call is correct - it executes the outer function, which returns the inner function, and that's what gets assigned to someFunc in the outer scope. – Panne
@Cerales Sorry, I'm still trying to wrap my head around it. Aren't the brackets in return someFunc (bar) {} spurious? Also, if someFunc(bar) is an actual call, won't it cause a stack overflow? – Nabala
Nah. Code inside the outer function - such as the line var foo = some_expensive_initialization_operation(); - is executed when this block is first evaluated. Note that the outer function is called with () immediately after its closing brace. This means that as the interpreter continues, what's been assigned in the outer namespace to someFunc is the inner someFunc function, which here takes a single argument. Since it contains a reference to foo, foo is not garbage collected - this is the purpose of a 'closure'. – Panne
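
The __call__ suggestion from the comments above could look roughly like the sketch below. The class name Matcher and the pattern r'(\d+)-(\d+)' are placeholders chosen for illustration; neither comes from the question.

```python
import re


class Matcher(object):
    """Callable object that compiles its pattern once, at construction."""

    def __init__(self, pattern):
        # The expensive step runs a single time per instance.
        self._compiled = re.compile(pattern)

    def __call__(self, text):
        # Defining __call__ lets an instance be used like a plain function.
        return self._compiled.match(text)


# Hypothetical pattern, for illustration only.
try_match = Matcher(r'(\d+)-(\d+)')
print(try_match('1-2').groups())   # ('1', '2')
print(try_match('no digits'))      # None
```

Callers never see the class after construction; they only use try_match(...), which is close to the JavaScript pattern in the question.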
G
15

You can also accomplish this with default arguments:

import re

def try_match(m, re_match=re.compile(r'sldkjlsdjf').match):
    return re_match(m)

since default arguments are evaluated only once, when the def statement runs (at import time for a module-level function).
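
To see that the default really is computed a single time, here is a small sketch. The helper expensive_compile and the calls list are made up for this demonstration; they simply record how often the setup step runs.

```python
import re

calls = []


def expensive_compile(pattern):
    # Hypothetical stand-in for a costly setup step; records each invocation.
    calls.append(pattern)
    return re.compile(pattern)


# The default value is computed here, when `def` runs, not on each call.
def try_match(m, re_match=expensive_compile(r'\d+').match):
    return re_match(m)


try_match('123')
try_match('456')
try_match('abc')
print(len(calls))  # 1
```

One side effect of this approach is that re_match shows up in the function's signature, so a caller can override it by accident by passing a second argument.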

Or even simpler:

try_match = lambda m, re_match=re.compile(r'sldkjlsdjf').match: re_match(m)

Or simplest yet:

try_match = re.compile(r'sldkjlsdjf').match

This saves not only the re compile time (which is actually cached internally in the re module anyway), but also the lookup of the '.match' method. In a busy function or a tight loop, those '.' resolutions can add up.

Garald answered 8/8, 2012 at 14:27 Comment(1)
Thanks for the answer. I know about the module caching, but even in a script where I use a single regular expression in some tight loops, I got a significant performance increase by using re.compile rather than relying on the built-in caching. – Panne
N
16

You can define a closure in Python in the same way you define a closure in JavaScript.

def get_matcher():
    compiled_regex = compile_my_regex()

    def try_match(m):
        return compiled_regex.match(m)

    return try_match

However, in Python 2.x closed-over variables are read-only: in the example above, you cannot rebind compiled_regex from inside try_match. If the closure variable refers to a mutable data structure (e.g. list, dict, set), you can still modify that object inside the inner function.

def get_matcher():
    compiled_regex = compile_my_regex()
    match_cache = {}

    def try_match(m):
        if m not in match_cache:
            match_cache[m] = compiled_regex.match(m)

        return match_cache[m]

    return try_match

In Python 3.x, you can use the nonlocal keyword to rebind a closure variable from inside the inner function (PEP 3104).


Natica answered 8/8, 2012 at 14:27 Comment(3)
Just to add: Python 3 introduces nonlocal, which can be used to give explicit write access to closed-over variables. (PEP 3104) – Miserable
I'm a bit confused by this one. Why wouldn't the entire body of get_matcher() be evaluated every time this runs? Or is the purpose to then do something like try_match = get_matcher()? – Panne
Yes. The idea is to call get_matcher() only once, and use the function returned by get_matcher() to do the actual work. – Natica
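
The call-once usage described in these comments can be sketched as follows. Here compile_my_regex is a hypothetical stand-in for the question's helper, instrumented with a counter (init_count, also made up) so the single initialisation is visible.

```python
import re

init_count = 0


def compile_my_regex():
    # Hypothetical stand-in for the question's helper; counts its own calls.
    global init_count
    init_count += 1
    return re.compile(r'(.*)-(.*)')


def get_matcher():
    compiled_regex = compile_my_regex()  # runs once per get_matcher() call

    def try_match(m):
        return compiled_regex.match(m)

    return try_match


# Call the factory once, at import time, and keep only the inner function.
try_match = get_matcher()

for text in ('1-2', '3-4', '5-6'):
    try_match(text)

print(init_count)  # 1
```

Only the inner function's body runs on each call; the outer body ran once, when try_match was assigned.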
I
8

What about

import re

def create_matcher(pattern):
    compiled_regex = re.compile(pattern)  # compiled once, when the matcher is created
    def try_match(m):
        return compiled_regex.match(m)
    return try_match

matcher = create_matcher(r'(.*)-(.*)')
print(matcher("1-2"))

?

But classes are better and cleaner in most cases.

Impassion answered 8/8, 2012 at 14:18 Comment(0)
G
6

You can stash an attribute in any function. Since the function name is global, you can retrieve it in other functions. For example:

def memorize(t):
    memorize.value = t

def get():
    return memorize.value

memorize(5)
print(get())

Output:

5

You can use it to store state in a single function:

def memory(t=None):
    # Compare against None so that falsy values such as 0 can be stored too
    if t is not None:
        memory.value = t
    return memory.value

print(memory(5))
print(memory())
print(memory())
print(memory(7))
print(memory())
print(memory())

Output:

5
5
5
7
7
7

Granted, its usefulness is limited. I've only used it on SO in this question.
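
Applied to the regex case from the question, the same trick might look like the sketch below. The attribute name pattern and the regular expression are assumptions made for illustration.

```python
import re


def try_match(m):
    # Look up the compiled pattern stored on the function object itself.
    return try_match.pattern.match(m)


# Attach the state right after the definition; it is computed once.
try_match.pattern = re.compile(r'(\d+)-(\d+)')

print(try_match('1-2').groups())  # ('1', '2')
print(try_match('x'))             # None
```

Note that the body finds its state through the global name try_match, so rebinding that name to something else would break the function.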

Groomsman answered 9/8, 2012 at 2:22 Comment(0)
F
2

A common convention is to prefix private module-level globals with an underscore to indicate that they aren't part of the module's exported API:

# mymodule.py

_MATCHER = compile_my_regex()

def try_match(m):
    return _MATCHER.match(m)

You shouldn't be discouraged from doing this - it's preferable to a hidden variable in a function closure.

Fraxinella answered 8/8, 2012 at 14:43 Comment(0)
A
0

You could use generator.send(); it's probably not appropriate for this particular case, but it is useful for maintaining state without classes. Calling .send(x) resumes the generator and makes the paused yield expression evaluate to x. If next() is called instead, possible_vals will be None. Related Send Question.

import re

def try_match(regex='', target=''):
    cache = {}
    while True:
        if regex not in cache:
            cache[regex] = re.compile(regex).match
        possible_vals = (yield cache[regex](target))
        if possible_vals is not None:
            (regex, target) = possible_vals

m = try_match(r'(.*)-(.*)', '1-2')
print(next(m))
m.send((r'(.*)-(.*)', '3-4'))
print(next(m))

# Note that the generator must be advanced to its first yield (with next)
# before you can send it values; sending to a just-started generator
# raises TypeError.
n = try_match()
next(n)
n.send((r'(.*)-(.*)', '5-6'))
print(next(n))
Acuate answered 8/9, 2020 at 23:48 Comment(0)
G
0

I can think of two ways to write functions with state. Here are toy examples:

  1. Using nonlocal
def func_creator():
    state = 0

    def func():
        nonlocal state
        state += 1
        return state
    
    return func

func_with_state = func_creator()
  2. Taking advantage of list mutability
def func_with_state(state=[0]):
    state[0] += 1
    return state[0]
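
One caveat with the mutable-default variant is that the state is reachable from outside the function. The following sketch (the name counter is only for illustration) shows how the shared list behaves:

```python
def counter(state=[0]):
    # `state` is the same list object on every call that omits the argument.
    state[0] += 1
    return state[0]


print(counter())      # 1
print(counter())      # 2

# Passing an explicit list bypasses the shared default entirely.
print(counter([10]))  # 11
print(counter())      # 3

# The shared list is visible (and mutable) from outside via __defaults__.
print(counter.__defaults__[0])  # [3]
```

The nonlocal version keeps its state out of the signature, so it is the less surprising choice whenever Python 3 is available.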
Guenna answered 27/2, 2024 at 11:24 Comment(0)
