Why are default arguments evaluated at definition time? [duplicate]

I had a very difficult time understanding the root cause of a problem in an algorithm. Then, by simplifying the functions step by step, I found out that the evaluation of default arguments in Python doesn't behave as I expected.

The code is as follows:

class Node(object):
    def __init__(self, children = []):
        self.children = children

The problem is that every instance of the Node class shares the same children list if the argument is not given explicitly:

>>> n0 = Node()
>>> n1 = Node()
>>> id(n1.children)
25000176
>>> id(n0.children)
25000176
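A minimal sketch (Python 3, where a function's defaults live in its `__defaults__` tuple) confirming that both instances hold the very object stored as the default, so a mutation through one instance shows up in the other:

```python
class Node(object):
    def __init__(self, children=[]):
        self.children = children

n0 = Node()
n1 = Node()

# Both instances received the single list created when `def __init__` ran.
print(n0.children is n1.children)                    # True
print(n0.children is Node.__init__.__defaults__[0])  # True

# Mutating through one instance is visible through the other.
n0.children.append("leaf")
print(n1.children)  # ['leaf']
```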

I don't understand the logic of this design decision. Why did the Python designers decide that default arguments are to be evaluated at definition time? This seems very counter-intuitive to me.

Zig answered 30/10, 2009 at 17:16 Comment(1)
My guess would be performance. Imagine re-evaluating the defaults every time a function is called, if it's called 15 million times a day. - Numerical

The alternative would be quite heavyweight -- storing "default argument values" in the function object as "thunks" of code to be executed over and over again every time the function is called without a specified value for that argument -- and would make it much harder to get early binding (binding at def time), which is often what you want. For example, in Python as it exists:

def ack(m, n, _memo={}):
  key = m, n
  if key not in _memo:
    if m==0: v = n + 1
    elif n==0: v = ack(m-1, 1)
    else: v = ack(m-1, ack(m, n-1))
    _memo[key] = v
  return _memo[key]

...writing a memoized function like the above is quite an elementary task. Similarly:
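To see that `_memo` really is one dict shared by every call, it can be inspected through the function's `__defaults__` tuple. This is a sketch in Python 3 spelling; the function is the answer's `ack`, only reindented to four spaces:

```python
def ack(m, n, _memo={}):
    key = m, n
    if key not in _memo:
        if m == 0:
            v = n + 1
        elif n == 0:
            v = ack(m - 1, 1)
        else:
            v = ack(m - 1, ack(m, n - 1))
        _memo[key] = v
    return _memo[key]

print(ack(2, 3))             # 9
cache = ack.__defaults__[0]  # the dict created once, when `def` ran
print((2, 3) in cache)       # True: results survive between calls
print(ack(2, 3))             # 9, now answered straight from the cache
```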

for i in range(len(buttons)):
  buttons[i].onclick(lambda i=i: say('button %s', i))

...the simple i=i, relying on the early-binding (definition time) of default arg values, is a trivially simple way to get early binding. So, the current rule is simple, straightforward, and lets you do all you want in a way that's extremely easy to explain and understand: if you want late binding of an expression's value, evaluate that expression in the function body; if you want early binding, evaluate it as the default value of an arg.
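A small self-contained sketch of the difference. The answer's `buttons` and `say` are replaced here by a plain list of callbacks, purely for illustration:

```python
# Late binding: each lambda looks up `i` when it is *called*,
# after the loop has finished, so all of them see the final value.
late = [lambda: i for i in range(3)]

# Early binding: `i=i` evaluates `i` when each lambda is *defined*
# and stores that value as the lambda's default argument.
early = [lambda i=i: i for i in range(3)]

print([f() for f in late])   # [2, 2, 2]
print([f() for f in early])  # [0, 1, 2]
```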

The alternative, forcing late binding for both situations, would not offer this flexibility, and would force you to jump through hoops (such as wrapping your function in a closure factory) every time you needed early binding, as in the above examples: yet more heavyweight boilerplate forced on the programmer by this hypothetical design decision (beyond the "invisible" costs of generating and repeatedly evaluating thunks all over the place).
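For comparison, here is a sketch of the closure-factory route. This is roughly what early binding would cost if defaults were late-bound; `make_callback` is a hypothetical name:

```python
def make_callback(i):
    # Each call creates a new scope: `i` is this call's own parameter,
    # so the returned closure keeps the value it was created with.
    def callback():
        return i
    return callback

callbacks = [make_callback(i) for i in range(3)]
print([cb() for cb in callbacks])  # [0, 1, 2]
```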

In other words, "There should be one, and preferably only one, obvious way to do it [1]": when you want late binding, there's already a perfectly obvious way to achieve it (since all of the function's code is only executed at call time, obviously everything evaluated there is late-bound); having default-arg evaluation produce early binding gives you an obvious way to achieve early binding as well (a plus!-) rather than giving TWO obvious ways to get late binding and no obvious way to get early binding (a minus!-).

[1]: "Although that way may not be obvious at first unless you're Dutch."

Manual answered 30/10, 2009 at 17:37 Comment(3)
excellent answer, +1 from me. A very minor typo: it should be return _memo[key] with a leading underscore. - Mallis
@Francesco, tx for pointing out the typo (and I imagine tx @novelocrat for so promptly fixing it!-). - Manual
Would the overhead still be prohibitive in the case of a deepcopy instead of delayed evaluation? - Cracknel

The issue is this.

It's too expensive to evaluate a function as an initializer every time the function is called.

  • 0 is a simple literal. Evaluate it once, use it forever.

  • int is a function (like list) that would have to be evaluated each time it's required as an initializer.

The construct [] is a literal, like 0, that means "this exact object".

The problem is that some people hope that it means list, as in "evaluate this function for me, please, to get the object that is the initializer".

It would be a crushing burden to add the necessary if statement to do this evaluation all the time. It's better to take all arguments as literals and not do any additional function evaluation as part of trying to do a function evaluation.

Also, more fundamentally, it's technically impossible to implement argument defaults as function evaluations.

Consider, for a moment, the recursive horror of this kind of circularity. Let's say that instead of default values being literals, we allow them to be functions that are evaluated each time a parameter's default value is required.

[This would parallel the way collections.defaultdict works.]

def aFunc( a=another_func ):
    return a*2

def another_func( b=aFunc ):
    return b*3

What is the value of another_func()? To get the default for b, it must evaluate aFunc, which requires an eval of another_func. Oops.
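In actual Python, the thought experiment stops even earlier. Defaults are evaluated when `def` runs, so the first definition already needs `another_func` to exist. A small sketch of that behaviour (the `try` only serves to capture the error):

```python
message = None
try:
    def aFunc(a=another_func):  # the default is evaluated right here...
        return a * 2
except NameError as exc:
    # ...so the def statement itself fails: another_func does not exist yet.
    message = str(exc)

print(message)
```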

Scathing answered 30/10, 2009 at 17:36 Comment(1)
I get the "it would be expensive" part, but the "it's impossible" part I don't get. It can't be impossible when other interpreted dynamic languages do it. - Coterie

Of course, in your situation it is difficult to understand. But you must see that evaluating default args every time would place a heavy runtime burden on the system.

Also, you should know that this problem may occur with container types, but you can circumvent it by making the default explicit:

def __init__(self, children=None):
    if children is None:
        children = []
    self.children = children
Mesmerize answered 30/10, 2009 at 17:26 Comment(3)
You can also shorten it to self.children = children or [] instead of having the if statement. - Jijib
What if I call it with children=None? It will then incorrectly create children = []. To fix this, one would need to use a sentinel value. - Cracknel
In this case I silently assumed that None is an appropriate sentinel value. Of course, if None could be a valid value (unlikely for children, which is very likely a list of things), a different sentinel value must be used. If no standard value exists, use a specially created object for this. - Mesmerize
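A minimal sketch of the "specially created object" idea. It assumes `None` must remain a legal value for `children`; `_MISSING` is a hypothetical name:

```python
_MISSING = object()  # unique sentinel: no caller can pass this by accident

class Node(object):
    def __init__(self, children=_MISSING):
        if children is _MISSING:
            children = []          # a fresh list for every call
        self.children = children   # an explicit None is kept as-is

a, b = Node(), Node()
print(a.children is b.children)      # False
print(Node(children=None).children)  # None
```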

I thought this was counterintuitive too, until I learned how Python implements default arguments.

A function is an object. When the def statement is executed, Python creates the function object, evaluates the defaults in the def statement, puts them into a tuple, and stores that tuple as an attribute of the function named func_defaults (called __defaults__ in Python 3). Then, when the function is called without a value for that parameter, Python grabs the default value out of that tuple.

For instance:

>>> class C():
...     pass
...
>>> def f(x=C()):
...     pass
...
>>> f.func_defaults
(<__main__.C instance at 0x0298D4B8>,)

So all calls to f that don't provide an argument will use the same instance of C, because that's the default value.
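The session above is Python 2. In Python 3, the same tuple is exposed as `__defaults__`. Here is a minimal sketch showing that the stored default is the very object each call receives:

```python
class C:
    pass

def f(x=C()):
    return x

print(f.__defaults__)            # a 1-tuple holding the single C instance
print(f() is f())                # True: one instance, shared by all calls
print(f() is f.__defaults__[0])  # True
```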

As far as why Python does it this way: well, that tuple could contain functions that would get called every time a default argument value was needed. Apart from the immediately obvious problem of performance, you start getting into a universe of special cases, like storing literal values instead of functions for non-mutable types to avoid unnecessary function calls. And of course there are performance implications galore.

The actual behavior is really simple. And there's a trivial workaround, in the case where you want a default value to be produced by a function call at runtime:

def f(x=None):
    if x is None:
        x = g()  # g() stands for whatever call should produce the default
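To make the difference concrete, here is a sketch that uses `itertools.count` as a stand-in for `g()`. That choice is an assumption; any call with a visible side effect would do:

```python
import itertools

counter = itertools.count()

def evaluated_once(x=next(counter)):
    # next(counter) ran a single time, when this def statement executed.
    return x

def evaluated_per_call(x=None):
    if x is None:
        x = next(counter)  # runs on every call that omits x
    return x

print(evaluated_once(), evaluated_once())          # 0 0
print(evaluated_per_call(), evaluated_per_call())  # 1 2
```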
Flight answered 30/10, 2009 at 23:10 Comment(0)

The workaround for this, discussed here (and very solid), is:

class Node(object):
    def __init__(self, children = None):
        self.children = [] if children is None else children

As for why, look for an answer from von Löwis; but it's likely because the function definition makes a code object due to the architecture of Python, and there might not be a facility for working with reference types like this in default arguments.

Partly answered 30/10, 2009 at 17:22 Comment(6)
Hi Jed, there might be some (rare) problem when inputs other than [] that evaluate to False can occur. Then a legitimate input might be transformed to []. Of course, this cannot happen as long as children must be a list. - Mesmerize
... oh, I forgot: more general would be "if children is None ...". - Mesmerize
The "if children is None: children = []" (followed by "self.children = children" here) is equivalent (almost; degenerate values would be different) and much more readable. - Hydrophilic
@Juergen: I've edited the answer. @R. Pate: Readability is relative, and I think my answer is quite readable. - Partly
@R. Pate: As Jed put it, readability is relative. Of course, when one is not used to it, anything will feel unreadable. - Mesmerize
It could also be written as self.children = children or [], assuming you only want lists as 'children' so that False is not a valid value. - Jijib

This comes from Python's emphasis on syntax and execution simplicity. A def statement occurs at a certain point during execution. When the Python interpreter reaches that point, it evaluates the code in that line, and then creates a code object from the body of the function, which will be run later, when you call the function.

It's a simple split between function declaration and function body. The declaration is executed when it is reached in the code. The body is executed at call time. Note that the declaration is executed every time it is reached, so you can create multiple functions by looping.

funcs = []
for x in range(5):
    def foo(x=x, lst=[]):
        lst.append(x)
        return lst
    funcs.append(foo)
for func in funcs:
    print("1: ", func())
    print("2: ", func())

Five separate functions have been created, with a separate list created each time the function declaration was executed. On each pass through funcs, the same function is executed twice, using the same list both times. This gives the results:

1:  [0]
2:  [0, 0]
1:  [1]
2:  [1, 1]
1:  [2]
2:  [2, 2]
1:  [3]
2:  [3, 3]
1:  [4]
2:  [4, 4]

Others have given you the workaround of using param=None and assigning a list in the body if the value is None, which is fully idiomatic Python. It's a little ugly, but the simplicity is powerful, and the workaround is not too painful.

Edited to add: For more discussion on this, see effbot's article here: http://effbot.org/zone/default-values.htm, and the language reference, here: http://docs.python.org/reference/compound_stmts.html#function

Elisabethelisabethville answered 30/10, 2009 at 17:35 Comment(1)
The effbot article is now only available (afaik) at web.archive.org/web/20201112004749/https://www.effbot.org/zone/… - Jules

I'll provide a dissenting opinion by addressing the main arguments in the other posts.

"Evaluating default arguments when the function is executed would be bad for performance."

I find this hard to believe. If default argument assignments like foo='some_string' really add an unacceptable amount of overhead, I'm sure it would be possible to identify assignments to immutable literals and precompute them.

"If you want a default assignment with a mutable object like foo = [], just use foo = None, followed by foo = foo or [] in the function body."

While this may be unproblematic in individual instances, as a design pattern it's not very elegant. It adds boilerplate code and obscures default argument values. Patterns like foo = foo or ... don't work if foo can be an object like a numpy array with undefined truth value. And in situations where None is a meaningful argument value that may be passed intentionally, it can't be used as a sentinel and this workaround becomes really ugly.
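Here is a sketch of why `foo = foo or []` is not a drop-in replacement for an `is None` check. `Ambiguous` is a hypothetical class whose `__bool__` raises, loosely mimicking how a multi-element NumPy array refuses to be used in a truth test:

```python
class Ambiguous:
    """Hypothetical stand-in for an object with no defined truth value."""
    def __bool__(self):
        raise ValueError("truth value is ambiguous")

def with_or(foo=None):
    foo = foo or []  # any falsy argument is silently replaced
    return foo

def with_is_none(foo=None):
    if foo is None:
        foo = []     # only a missing/None argument is replaced
    return foo

mine = []
print(with_or(mine) is mine)       # False: the caller's empty list is discarded
print(with_is_none(mine) is mine)  # True

try:
    with_or(Ambiguous())
    or_error = None
except ValueError as exc:
    or_error = str(exc)            # `foo or []` had to call bool(foo)
print("with_or raised:", or_error)
print(type(with_is_none(Ambiguous())).__name__)  # Ambiguous
```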

"The current behaviour is useful for mutable default objects that should be shared across function calls."

I would be happy to see evidence to the contrary, but in my experience this use case is much less frequent than mutable objects that should be created anew every time the function is called. To me it also seems like a more advanced use case, whereas accidental default assignments with empty containers are a common gotcha for new Python programmers. Therefore, the principle of least astonishment suggests default argument values should be evaluated when the function is executed.

In addition, it seems to me that there exists an easy workaround for mutable objects that should be shared across function calls: initialise them outside the function.

So I would argue that this was a bad design decision. My guess is that it was chosen because its implementation is actually simpler and because it has a valid (albeit limited) use case. Unfortunately, I don't think this will ever change, since the core Python developers want to avoid a repeat of the amount of backwards incompatibility that Python 3 introduced.

Ploch answered 18/7, 2019 at 10:22 Comment(0)

Because if they had, then someone would post a question asking why it wasn't the other way around :-p

Suppose now that they had. How would you implement the current behaviour if needed? It's easy to create new objects inside a function, but you cannot "uncreate" them (you can delete them, but it's not the same).

Dg answered 30/10, 2009 at 23:48 Comment(0)

Python function definitions are just code, like all the other code; they're not "magical" in the way that some languages are. For example, in Java you could refer "now" to something defined "later":

public static void foo() { bar(); }
public static void main(String[] args) { foo(); }
public static void bar() {}

but in Python

def foo(): bar()
foo()   # boom! "bar" has no binding yet
def bar(): pass
foo()   # ok

So, the default argument is evaluated at the moment that that line of code is evaluated!
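Here is a short sketch of that point. The default is whatever the expression evaluated to when the `def` line ran, regardless of any later rebinding of the same name:

```python
x = 1

def f(a=x):   # `x` is looked up now, while x == 1
    return a

x = 2
print(f())    # 1: the default was fixed at definition time

def g():
    return x  # the body runs at call time, so it sees the current x

print(g())    # 2
```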

Teri answered 30/10, 2009 at 17:21 Comment(1)
Bad analogy. The Pythonic equivalent of your Java sample is adding if __name__ == '__main__': main() at the end of the file. - Gottuard
