After `x = x.y()`, why did `x` become `None` instead of being modified (possibly causing "AttributeError: 'NoneType' object has no attribute")?

If your question was closed as a duplicate of this, it is because you have some code of the general form

x = X()
# later...
x = x.y()
# or:
x.y().z()

where X is some type that provides y and z methods intended to mutate (modify) the object (instance of the X type). This can apply to:

  • mutable built-in types, such as list, dict, set and bytearray
  • classes provided by the standard library (especially Tkinter widgets) or by a third-party library.

Code of this form is commonly, but not always, wrong. The telltale signs of a problem are:

  • With x.y().z(), an exception is raised like AttributeError: 'NoneType' object has no attribute 'z'.

  • With x = x.y(), x becomes None, instead of being the modified object. This might be discovered by later wrong results, or by an exception like the above (when x.z() is tried later).
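Here is a concrete illustration. It uses a minimal, hypothetical class X whose y and z methods follow the usual Python convention: they update the instance and return None. The class and its items attribute are invented for this sketch. It reproduces both symptoms described above.

```python
class X:
    """Hypothetical class whose methods update the instance in place."""

    def __init__(self):
        self.items = []

    def y(self):
        # Mutates the instance; there is no return statement, so the call evaluates to None.
        self.items.append("y")

    def z(self):
        self.items.append("z")


x = X()
x = x.y()            # the object was updated, but then x is rebound to None
print(x)             # None

x = X()
try:
    x.y().z()        # x.y() evaluates to None, and None has no 'z' attribute
except AttributeError as error:
    message = str(error)
    print(message)   # 'NoneType' object has no attribute 'z'
print(x.items)       # ['y']: the y call did take effect before the failure
```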

There are a huge number of existing questions on Stack Overflow about this issue, all of which are really the same question. There are even multiple previous attempts at canonicals covering the same question in a specific context. However, the context is not needed to understand the problem, so here is an attempt to answer generally:

What is wrong with the code? Why do the methods behave this way, and how can we work around that?


Also note that analogous problems occur when trying to use a lambda (or a list comprehension) for side effects.

The same apparent problem can be caused by methods that return None for other reasons - for example, BeautifulSoup uses None return values to indicate that a tag was not found in the HTML. However, once the current problem - of expecting a method to update an object and also return the same object - has been identified, it is the same problem in all contexts.

Please do not use this question to close other questions that are about using .append in a loop to append to a list repeatedly. Simply understanding what went wrong with the .append usage will not be very helpful in these cases, and people asking those questions should also see other techniques for building lists. Please use How can I collect the results of a repeated calculation in a list, dictionary etc. (or make a copy of a list with each element modified)? instead.

More specific versions of the Q&A:

Lonesome answered 27/3, 2023 at 0:18 Comment(6)
Context is needed to understand the problem. One design philosophy prohibits methods that modify an object from also returning the object. Another encourages method chaining. Yet another returns None on error or "not found". All are supported by python and you need to know something about the modules in use. - Lead
@Lead "One design philosophy prohibits methods that modify an object from also returning the object." - yes, and that is the standard for Python, respected by the standard library and most major third-party libraries, and documented as expressly How It's Supposed to Work, per "word of God" (quotes from GvR himself). Of course it is possible to write code that doesn't use this idiom; that's the point of the boldface "commonly, but not always" in the question text. Context is not needed to understand the problem because in other contexts, there is not a problem. - Lonesome
@Lead Ah - I assume you are primarily concerned with cases where x.y() returns None to signal "not found" (like in the other canonical I wrote just yesterday), which breaks x.y().z() chaining (reassigning to x will still generally be nonsensical). That isn't intended to be covered here, but I see the issue that this could become a false positive in search engines (it's primarily intended for closing duplicates). - Lonesome
Which is to say: context is needed to identify the problem, but not to understand it. - Lonesome
This doesn't really seem to add much beyond "slightly greater generalization" over the existing question, already made as a canonical dupe, Why do these list methods (append, sort, extend, remove, clear, reverse) return None rather than the resulting list?. I'd vote to close this question as a dupe of that old one, but I've got dupehammer powers and don't feel comfortable being the sole vote for it. - Warble
@Warble the goal is specifically to avoid sending questions about rarer, non-list-related cases there; and to have one, common, canonical explanation when the same issue comes up with Pandas, Tkinter etc. - Lonesome

Summary

The method in question returns the special value None, which is the unique instance of the NoneType type. It updates the object as a side effect, and does not return that object. Since x.y() returns None, x = x.y() causes x to become None, and x.y().z() fails because None does not have the specified z method (in fact, it only has helper methods that should not be called directly).

This happens in many places throughout Python, and is a deliberate design decision. It allows the reader of code like x.y().z() to assume correctly that the code does not have side effects; and it makes a clear visual distinction between code that updates a mutable object and code that replaces an immutable object.
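The visual distinction mentioned above shows up clearly with the built-in str (immutable) and list (mutable) types. This short sketch uses only built-ins.

```python
# Immutable str: methods return a new object, so the result is reassigned.
greeting = "hello"
greeting = greeting.upper()
print(greeting)      # HELLO

# Mutable list: sort() updates the list and returns None,
# so the call is written as a statement on its own.
numbers = [3, 1, 2]
numbers.sort()
print(numbers)       # [1, 2, 3]
```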

For simple cases, instead of using x = x.y(), just write x.y(). Instead of trying to chain calls like x.y().z(), make each separately: x.y() and then x.z(). However, it is often necessary (or a better idea) to make a modified copy of x, rather than updating it in place. The correct approach to this will be context-specific and requires a more careful understanding.
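Here are both simple fixes applied to a list, along with the alternative of building a modified copy. This is a sketch with arbitrary sample values.

```python
numbers = [3, 1, 2]

# Instead of `numbers = numbers.sort()` (which would make numbers None),
# just call the method:
numbers.sort()

# Instead of chaining `numbers.append(4).append(5)`, make separate calls:
numbers.append(4)
numbers.append(5)
print(numbers)           # [1, 2, 3, 4, 5]

# Alternatively, build a modified copy and leave the original untouched:
original = [3, 1, 2]
modified = sorted(original) + [4, 5]
print(original)          # [3, 1, 2]
print(modified)          # [1, 2, 3, 4, 5]
```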

Code like x = x.y() could work in cases where the class of x violates Python convention and does something like return self after updating the object. However, it will generally still be better to just write x.y().

Code like x.y().z() can be correct, in cases where the y method works by computing a result instead of by updating x. In these cases, z is called on that computed result, not x, so the result needs to support that method.
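Here are two examples of chaining that is correct, because each call computes and returns a new object rather than updating the one it was called on. The sample values are arbitrary.

```python
text = "  Hello, World  "
# strip() returns a new string, and lower() is called on that new string;
# text itself is unchanged (str is immutable).
cleaned = text.strip().lower()
print(repr(cleaned))    # 'hello, world'

# sorted() computes a new list, so list methods can be chained on its result.
scores = [30, 10, 20]
position = sorted(scores).index(20)
print(position)         # 1
print(scores)           # [30, 10, 20]
```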

Especially if the goal is to make multiple changes to a list, it will often be better to use a list comprehension or similar tool in order to create a separate list with all the changes. However, that is beyond the scope of this Q&A.

Understanding the problem

First, some more concrete examples.

  • Using mutating methods on a list:

    >>> mylist = []
    >>> mylist.append(1).append(2) # try to append two values
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'append'
    
  • Using "external" algorithms on a list, for example shuffling it:

    >>> import random
    >>> mylist = [1, 2, 3]
    >>> print(random.shuffle(mylist))
    None
    >>> print(mylist) # 6 different possible results, of course
    [3, 1, 2]
    
  • Using methods like grid, pack or place on Tkinter widgets (example copied from the other question and annotated):

    from tkinter import *
    
    root = Tk()
    
    def grabText(event):
        print(entryBox.get())    
    
    entryBox = Entry(root, width=60).grid(row=2, column=1, sticky=W)
    
    # This button is supposed to print the contents of the Entry when clicked,
    # but instead `entryBox` will be `None` when the button is clicked,
    # so an exception is raised.
    grabBtn = Button(root, text="Grab")
    grabBtn.grid(row=8, column=1)
    grabBtn.bind('<Button-1>', grabText)
    
    root.mainloop()
    
  • With the Pandas third-party library, using inplace=True on certain methods:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'name': ['Alice', 'Bob'], 'employee_id': [2, 1]})
    >>> df
        name  employee_id
    0  Alice            2
    1    Bob            1
    >>> print(df.set_index('employee_id', inplace=True))
    None
    >>> df
                  name
    employee_id       
    2            Alice
    1              Bob
    

In all of these cases, the problem is as described in the summary: the methods being called - append on a list, grid on a Tkinter Entry, and set_index on a Pandas DataFrame with inplace=True specified - work by updating the list, Entry and DataFrame (respectively) in place, not by computing and returning a result. (random.shuffle is a function rather than a method, but it works the same way on the list it is given.) They then return the special value None, not the original instance, and not a new instance of the same class. In Python, methods that modify the object like this are expected to return None.
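This sketch shows corrected versions of the first two examples: call the mutating method or function as a statement, and keep using the original name. The Tkinter example is fixed the same way. Create the widget in one statement (entryBox = Entry(root, width=60)) and call entryBox.grid(row=2, column=1, sticky=W) separately. With Pandas, either call df.set_index('employee_id', inplace=True) without assigning the result, or drop inplace=True and write df = df.set_index('employee_id'). Those two are only described here, because one needs a display and the other needs a third-party library.

```python
import random

mylist = []
mylist.append(1)       # each call updates mylist and returns None
mylist.append(2)
print(mylist)          # [1, 2]

deck = [1, 2, 3]
random.shuffle(deck)   # shuffles deck in place; the None it returns is ignored
print(sorted(deck))    # [1, 2, 3]: same elements, in whatever order shuffle chose
```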

On the flip side, methods which return a new instance (in some cases, possibly either a new instance or the original instance) are expected not to modify the internal state of the instance on which they are called. Considering the Tkinter example again, the get method of the Entry returns the text contained within that text input box in the GUI, as a string; it does not modify the Entry while doing so (at least, not in a way that can be observed from outside).

Design justification

This design choice is called Command-query separation, and it is considered an important idiom in Python that is respected by the standard library and by popular third-party libraries.

The idea is simple: a method should either return a result from a calculation (a "query" about what the object "contains" or "knows"), or update the internal state of the object (a "command" for the object to "do something") - but not both. A returned value is conceptually an answer to a question that may depend on the object's state. If the same question is asked repeatedly, logically the answer should stay the same; but if we allow the object's state to be updated, that might not happen.
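A minimal sketch of command-query separation. It uses a hypothetical TallyCounter class, invented for illustration and unrelated to collections.Counter, with one "command" method and one "query" method.

```python
class TallyCounter:
    """Hypothetical counter illustrating command-query separation."""

    def __init__(self):
        self._count = 0

    def increment(self):
        """Command: updates state and returns None."""
        self._count += 1

    def current(self):
        """Query: reports state without changing it."""
        return self._count


counter = TallyCounter()
print(counter.increment())   # None
print(counter.current())     # 1
print(counter.current())     # 1 (asking again gives the same answer)
```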

Some languages, like C, C++, C# and Java, make a syntactic distinction between "queries" and "commands": methods and functions can have a void return type, in which case they don't actually return a value (and calls to those methods and functions cannot be used in a larger expression). However, Python does not work that way; the best we can do is to return the special value None, and expect the caller to handle it appropriately.
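In Python, any function or method that finishes without an explicit return statement evaluates to None. The closest thing to a "void" function is therefore one that returns None, as this sketch shows. The log_message function is hypothetical.

```python
def log_message(message, log):
    # No return statement: calling this function evaluates to None.
    log.append(message)


log = []
result = log_message("started", log)
print(result)               # None
print(log)                  # ['started']
print(log.append("done"))   # None: list.append behaves the same way
```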

On the other hand, Python does make a syntactic distinction between assignments and expressions; in Python, assignments are statements instead, so they cannot be used in a larger expression:

>>> a = b = 1 # support for this is built into the assignment syntax.
>>> a = (b = 1) # `b = 1` isn't an expression, so the result can't be assigned to `a`.
  File "<stdin>", line 1
    a = (b = 1)
           ^
SyntaxError: invalid syntax

Command-query separation with methods is analogous to this "assignment-expression separation". (In 3.8, a new "walrus" operator was added to allow expressions that perform an assignment as a side effect. This was very controversial at the time, because it is a loophole in something that was deliberately designed. However, there are good use cases for it, and the syntax is limited and explicit.)
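For comparison, here is the walrus operator (Python 3.8+) performing an assignment as a side effect inside a larger expression. This is a small sketch with arbitrary data.

```python
data = [4, 8, 15, 16, 23, 42]

# `:=` binds n as a side effect of evaluating the condition.
if (n := len(data)) > 3:
    print(f"{n} items")   # 6 items

# A plain `=` is still a statement, so `if (n = len(data)) > 3:` is a SyntaxError.
```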

In some other languages, such as JavaScript, the preference is not to respect this principle, in order to create "fluent" interfaces where many method calls are chained on the same object, and that object's state may be updated multiple times. This may be seen, for example, as an elegant way to construct an object in multiple steps. Methods implement this strategy by returning the current object after doing some work.

However, while it is okay in itself to return self from a method in Python, Python code should generally not do this after updating the object state. The convention is to return None instead (or simply just not return explicitly), so that client code is aware that this method is a "command" and not a "query".
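This sketch puts the two styles side by side, using two hypothetical wrapper classes invented for the example. The fluent version returns self so calls can be chained. The conventional version returns None, so each change is its own statement.

```python
class ConventionalList:
    """Hypothetical wrapper following Python convention: commands return None."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)   # implicit `return None`


class FluentList:
    """Hypothetical wrapper in the 'fluent' style: commands return self."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return self               # enables chaining, but hides that this is a command


conventional = ConventionalList()
conventional.add(1)
conventional.add(2)
print(conventional.items)   # [1, 2]

fluent = FluentList().add(1).add(2)
print(fluent.items)         # [1, 2]
```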

Special case: "pop" methods

The .pop methods of the built-in list, set, dict and bytearray types are special (as is the .popitem method of dicts). These methods do not follow command-query separation - they implement both a command (remove an element from the container) and a query (indicate what was removed). This is done because the concept of "popping" is well established in computer science, and so a standard, pre-existing design is implemented.

Keep in mind that the return value is still not the original object, so these methods still do not allow chaining on the original object. Methods (and other expressions) can still be chained, but the results may not be as expected:

>>> number_names = {1: 'one', 2: 'two', 3: 'three'}
>>> number_names.pop(2)[1]
'w'

Here, the result is not 'one' (the value for the 1 key in the dictionary that remains after popping the 2 key), but 'w' (the element at index [1] of the string 'two' that was popped).
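If the intent was to look up the 1 key after popping, keep the popped value and query the dict itself in a separate step, as in this sketch.

```python
number_names = {1: 'one', 2: 'two', 3: 'three'}

# Command and query in one call: removes key 2 and returns its value.
popped = number_names.pop(2)
print(popped)             # two

# To look up key 1 in the remaining dict, index the dict, not the pop result.
print(number_names[1])    # one
print(number_names)       # {1: 'one', 3: 'three'}
```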

Workarounds

First, decide whether to make a (modified) copy of the object. The above examples all modify an existing object without creating a new one. Usually a simpler way is to create a new object that is similar to the original, but with a specific change made. Often it won't matter, but making a copy can range from necessary to unacceptable, depending on the overall task. For example, separate objects might be necessary when building a list of lists; but other designs might require sharing an object deliberately.
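The list-of-lists case is a classic illustration of when separate objects are necessary. This sketch uses only built-in list behaviour.

```python
# [[]] * 3 repeats a reference to ONE inner list, so every row is the same object.
shared_rows = [[]] * 3
shared_rows[0].append('x')
print(shared_rows)        # [['x'], ['x'], ['x']]

# A comprehension creates a separate inner list for each row.
separate_rows = [[] for _ in range(3)]
separate_rows[0].append('x')
print(separate_rows)      # [['x'], [], []]
```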

When the code would be correct either way, modifying an existing object is usually faster; but in many cases the difference will not be noticeable.

Syntactic workarounds when no copy is needed

Of course, with code like x = x.y(), the simplest fix is not to assign at all: just write x.y() instead. This already causes x to change (that's the purpose of the y method).

To fix the problem with chained method calls, like x.y().z(), the simplest approach is to break the chain and make each call separately:

x.y() # x still means the same object, but it has been modified
x.z() # so now the z method can be called, and both changes apply

There is a workaround that allows for chaining. First, the workaround by itself:

>>> x = []
>>> y = x.append(1) or x
>>> y
[1]

The idea is simple, but tricky. The None returned from the modifying method (here, list.append) is Falsy, so the or will evaluate to the right-hand side. That right-hand side is the same object, so now y names the same object that x does, and "sees" the change made by x.append(1). Thus, to make chained calls, simply apply the method to the expression created with or. This requires parentheses, of course:

>>> x = []
>>> (x.append(1) or x).append(2)
>>> x
[1, 2]

This approach quickly gets unwieldy, however:

>>> x = []
>>> (((x.append(3) or x).extend([1, 'bad', 2]) or x).remove('bad') or x).sort()
>>> x
[1, 2, 3]

Compare to the straightforward approach, not trying to chain:

>>> x = []
>>> x.append(3)
>>> x.extend([1, 'bad', 2])
>>> x.remove('bad')
>>> x.sort()
>>> x
[1, 2, 3]

Explicit copying first

When a separate copy is acceptable (or necessary), first check the documentation to see if there is a preferred way to make copies. In many cases, the desired functionality is already available from a separate method that creates a modified copy (see the next section). In other cases, the class implements its own method for copying, for technical reasons. Note that most ways to copy an object will give a shallow copy, not a deep copy. Beware of this in cases where the difference matters.

There are many ways to copy a list - unavoidably, because lists are so flexible. The built-in .copy method is in principle the "right way" since Python 3.3 (when it was added) to make a shallow copy of a list: it explicitly says what the code is doing, and gets updated to use the fastest-known techniques for the copy.
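This sketch shows a few common ways to make a shallow copy of a list, and confirms that changing the copy leaves the original alone.

```python
original = [1, 2, 3]

# Common ways to make a shallow copy of a list:
copy_a = original.copy()   # explicit; available since Python 3.3
copy_b = original[:]       # slicing the whole list
copy_c = list(original)    # constructing a new list from an iterable

copy_a.append(4)
print(original)            # [1, 2, 3]: the original list is unaffected
print(copy_a)              # [1, 2, 3, 4]
print(copy_b == copy_c == original)   # True
```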

If all else fails, try using the standard library copy module to clone objects with copy.copy (shallow copies) or copy.deepcopy (deep copies). However, even this is not fully general.
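The difference between copy.copy and copy.deepcopy shows up as soon as the object contains other mutable objects, as in this small sketch with a nested list.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)     # new outer list, same inner lists
deep = copy.deepcopy(nested)    # new outer list AND new inner lists

nested[0].append(99)
print(shallow[0])   # [1, 2, 99]: the inner list is shared with nested
print(deep[0])      # [1, 2]: the deep copy is fully independent
```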

Context-specific ways to make a modified copy

Here is a table listing replacement code for various in-place methods on built-in objects, to get a modified copy instead. In each case, the replacement code is an expression evaluating to the modified copy; no assignment occurs.

For methods provided by a library, again, please check the documentation first. With Pandas, for example, getting a modified copy is often as simple as just not using inplace=True.

  Example                               Replacement

List methods (x and y are lists; i is an index and v is a value)
  x.clear()                             []
  x.append(1)                           x + [1]
  x.extend(y)                           x + y
  x.remove(1)                           x[:x.index(1)] + x[x.index(1) + 1:]
    (remove first match only)
  while 1 in x: x.remove(1)             [item for item in x if item != 1]
    (remove all matches)
  x.insert(i, v)                        x[:i] + [v] + x[i:]
    (insert at an arbitrary position)   (but copying x and then calling .insert on the copy is faster)
  x.sort()                              sorted(x)
  x.reverse()                           list(reversed(x)) or x[::-1]
    (need to use the object later)
  x.reverse() and then for item in x:   for item in reversed(x):
    (only need it to set up a loop)     (saves memory compared to the above)
  random.shuffle(x)                     random.sample(x, len(x)) or
                                        sorted(x, key=lambda _: random.random())

Set methods (x and y are sets)
  x.clear()                             set() (not {}, which makes an empty dict)
  x.update(y)                           x.union(y) or x | y
    (implements |=)
  x.add(1)                              x.union({1}) or x | {1}
  x.difference_update(y)                x.difference(y) or x - y
    (implements -=)
  x.discard(1)                          x.difference({1}) or x - {1}
  x.remove(1)                           as with discard, but explicitly raise an exception first if the set does not contain the element
  x.intersection_update(y)              x.intersection(y) or x & y
    (implements &=)
  x.symmetric_difference_update(y)      x.symmetric_difference(y) or x ^ y
    (implements ^=)

Dict methods (x and y are dicts)
  x.clear()                             {}
  x.update(y)                           many alternatives, depending on Python version, e.g. {**x, **y} (3.5+) or x | y (3.9+); see https://stackoverflow.com/questions/38987
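This sketch checks several of the list, set and dict replacements from the table on small, arbitrary inputs. In every case the original objects are left unchanged.

```python
import random

x = [3, 1, 2, 1]

# x.remove(1) -> drop only the first match, in a copy
first_removed = x[:x.index(1)] + x[x.index(1) + 1:]
print(first_removed)        # [3, 2, 1]

# while 1 in x: x.remove(1) -> drop every match, in a copy
all_removed = [item for item in x if item != 1]
print(all_removed)          # [3, 2]

# x.insert(1, 9) -> insert at an arbitrary position, in a copy
inserted = x[:1] + [9] + x[1:]
print(inserted)             # [3, 9, 1, 2, 1]

# random.shuffle(x) -> a shuffled copy with the same elements
shuffled = random.sample(x, len(x))
print(sorted(shuffled) == sorted(x))   # True

print(x)                    # [3, 1, 2, 1]: the original is untouched

s = {1, 2}
print(s | {3}, s - {1})     # {1, 2, 3} {2}

d1 = {'a': 1}
d2 = {'b': 2}
merged = {**d1, **d2}       # one of several dict-merge spellings (3.5+)
print(merged)               # {'a': 1, 'b': 2}
```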

Note that although bytearray provides many methods that might sound like "commands" rather than "queries", the ones that are also present in bytes generally are in fact "queries" that will return a new bytearray instead. For the ones that actually do modify the original bytearray, try the approaches shown for lists above.
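For example, replace exists on both bytes and bytearray and behaves as a "query", while extend exists only on bytearray and behaves as a "command". This is a small sketch.

```python
data = bytearray(b"hello")

# replace() also exists on bytes, and is a "query": it returns a new bytearray.
replaced = data.replace(b"l", b"L")
print(replaced)            # bytearray(b'heLLo')
print(data)                # bytearray(b'hello'): unchanged

# extend() exists only on the mutable type, and is a "command": it returns None.
print(data.extend(b"!"))   # None
print(data)                # bytearray(b'hello!')
```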

Lonesome answered 27/3, 2023 at 0:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.