Summary
The method in question returns the special value None, which is the unique instance of the NoneType type. It updates the object as a side effect, and does not return that object. Since x.y() returns None, x = x.y() causes x to become None, and x.y().z() fails because None does not have the specified z method (in fact, it only has helper methods that should not be called directly).
This happens in many places throughout Python, and is a deliberate design decision. It allows the reader of code like x.y().z() to assume correctly that the code does not have side effects; and it makes a clear visual distinction between code that updates a mutable object and code that replaces an immutable object.
For simple cases, instead of using x = x.y(), just write x.y(). Instead of trying to chain calls like x.y().z(), make each separately: x.y() and then x.z(). However, it is often necessary (or a better idea) to make a modified copy of x, rather than updating it in place. The correct approach to this will be context-specific and requires a more careful understanding.
Code like x = x.y() could work in cases where the class of x violates Python convention and does something like return self after updating the object. However, it will generally still be better to just write x.y().
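As a rough illustration of the difference, here is a sketch with two hypothetical classes (ConventionalBasket and FluentBasket are made up for this example and do not come from any library). The first follows the usual convention; the second breaks it by returning self.

```python
class ConventionalBasket:
    """Hypothetical class whose mutating method implicitly returns None."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)  # no return statement, so the call evaluates to None


class FluentBasket:
    """Hypothetical class that breaks the convention by returning self."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return self  # this is what makes `b = b.add(...)` and chaining work


conventional = ConventionalBasket()
result = conventional.add("apple")      # result is None; conventional was still updated

fluent = FluentBasket()
same = fluent.add("apple").add("pear")  # chaining works, and same is fluent itself
fluent.add("plum")                      # the plain call is still all that is needed
```

Even with FluentBasket, the assignment adds nothing: the name already refers to the updated object.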
Code like x.y().z() can be correct, in cases where the y method works by computing a result instead of by updating x. In these cases, z is called on that computed result, not x, so the result needs to support that method.
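For instance, string methods compute new strings, so chaining them is fine. A minimal sketch (the variable names are only for illustration), contrasted with list.sort:

```python
text = "  Hello, World  "

# str.strip computes and returns a new string; it does not change text.
# str.lower is then called on that new string, not on text.
cleaned = text.strip().lower()

# By contrast, list.sort updates the list and returns None,
# so there is nothing useful to call a further method on.
numbers = [3, 1, 2]
sort_result = numbers.sort()
```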
Especially if the goal is to make multiple changes to a list, it will often be better to use a list comprehension or similar tool in order to create a separate list with all the changes. However, that is beyond the scope of this Q&A.
Understanding the problem
First, some more concrete examples.
Using mutating methods on a list:
>>> mylist = []
>>> mylist.append(1).append(2) # try to append two values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'append'
Using "external" algorithms on a list, for example shuffling it:
>>> import random
>>> mylist = [1, 2, 3]
>>> print(random.shuffle(mylist))
None
>>> print(mylist) # 6 different possible results, of course
[3, 1, 2]
Using methods like grid, pack or place on Tkinter widgets (example copied from the other question and annotated):
from tkinter import *
root = Tk()
def grabText(event):
    print(entryBox.get())
entryBox = Entry(root, width=60).grid(row=2, column=1, sticky=W)
# This button is supposed to print the contents of the Entry when clicked,
# but instead `entryBox` will be `None` when the button is clicked,
# so an exception is raised.
grabBtn = Button(root, text="Grab")
grabBtn.grid(row=8, column=1)
grabBtn.bind('<Button-1>', grabText)
root.mainloop()
With the Pandas third-party library, using inplace=True on certain methods:
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Alice', 'Bob'], 'employee_id': [2, 1]})
>>> df
    name  employee_id
0  Alice            2
1    Bob            1
>>> print(df.set_index('employee_id', inplace=True))
None
>>> df
              name
employee_id
2            Alice
1              Bob
In all of these cases, the problem is as described in the summary: the methods being called - append on a list, grid on a Tkinter Entry, and set_index on a Pandas DataFrame with inplace=True specified - work by updating the list, Entry and DataFrame (respectively) in-place, not by computing and returning a result. They then return the special value None, not the original instance, and not a new instance of the same class. In Python, methods that modify the object like this are expected to return None this way.
On the flip side, methods which return a new instance (in some cases, possibly either a new instance or the original instance) are expected not to modify the internal state of the instance on which they are called. Considering the Tkinter example again, the get method of the Entry returns the text contained within that text input box in the GUI, as a string; it does not modify the Entry while doing so (at least, not in a way that can be observed from outside).
Design justification
This design choice is called Command-query separation, and it is considered an important idiom in Python that is respected by the standard library and by popular third-party libraries.
The idea is simple: a method should either return a result from a calculation (a "query" about what the object "contains" or "knows"), or update the internal state of the object (a "command" for the object to "do something") - but not both. A returned value is conceptually an answer to a question that may depend on the object's state. If the same question is asked repeatedly, logically the answer should stay the same; but if we allow the object's state to be updated, that might not happen.
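To make the distinction concrete, here is a minimal sketch using a hypothetical Counter class (invented for this example, and unrelated to collections.Counter): increment is a "command", value is a "query".

```python
class Counter:
    """Hypothetical class, used only to illustrate the idea."""

    def __init__(self):
        self._count = 0

    def increment(self):
        """Command: updates the state and (implicitly) returns None."""
        self._count += 1

    def value(self):
        """Query: reports the state without changing it."""
        return self._count


counter = Counter()
command_result = counter.increment()  # None: the command gives no answer
first_answer = counter.value()
second_answer = counter.value()       # asking again gives the same answer
```

If value also incremented the count, the two answers would differ, which is exactly the surprise that the separation avoids.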
Some languages, like C, C++, C# and Java, make a syntactic distinction between "queries" and "commands": methods and functions can have a void return type, in which case they don't actually return a value (and calls to those methods and functions cannot be used in a larger expression). However, Python does not work that way; the best we can do is to return the special value None, and expect the caller to handle it appropriately.
On the other hand, Python does make a syntactic distinction between assignments and expressions; in Python, assignments are statements instead, so they cannot be used in a larger expression:
>>> a = b = 1 # support for this is built in to the assignment syntax.
>>> a = (b = 1) # `b = 1` isn't an expression, so the result can't be assigned to `a`.
  File "<stdin>", line 1
    a = (b = 1)
           ^
SyntaxError: invalid syntax
Command-query separation with methods is analogous to this "assignment-expression separation". (In Python 3.8, a new "walrus" operator, :=, was added to allow expressions that perform an assignment as a side effect. This was very controversial at the time, because it is a loophole in something that was deliberately designed. However, there are good use cases for it, and the syntax is limited and explicit.)
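For comparison, a short sketch of an ordinary assignment statement next to an assignment expression (requires Python 3.8 or later; the names are arbitrary):

```python
data = [1, 2, 3, 4]

# Assignment statement: binds n, but cannot appear inside a larger expression.
n = len(data)

# Assignment expression: binds m and also evaluates to the assigned value,
# so it can be used as part of the condition.
if (m := len(data)) > 3:
    message = f"{m} items"
else:
    message = "short"
```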
In some other languages, such as JavaScript, the preference is not to respect this principle, in order to create "fluent" interfaces where many method calls are chained on the same object, and that object's state may be updated multiple times. This may be seen, for example, as an elegant way to construct an object in multiple steps. Methods implement this strategy by returning the current object after doing some work.
However, while it is okay in itself to return self from a method in Python, Python code should generally not do this after updating the object state. The convention is to return None instead (or simply just not return explicitly), so that client code is aware that this method is a "command" and not a "query".
Special case: "pop" methods
The .pop methods of the built-in list, set, dict and bytearray types are special (as is the .popitem method of dicts). These methods do not follow command-query separation - they implement both a command (remove an element from the container) and a query (indicate what was removed). This is done because the concept of "popping" is well established in computer science, and so a standard, pre-existing design is implemented.
Keep in mind that the return value is still not the original object, so these methods still do not allow chaining on the original object. Methods (and other expressions) can still be chained, but the results may not be as expected:
>>> number_names = {1: 'one', 2: 'two', 3: 'three'}
>>> number_names.pop(2)[1]
'w'
Here, the result is not 'one' (the value for the 1 key in the dictionary that remains after popping the 2 key), but 'w' (the element at index 1 of the string 'two' that was popped).
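The same applies to the other "pop" methods. A small sketch with list.pop and dict.popitem (the latter removes the most recently inserted pair on Python 3.7 and later):

```python
stack = [1, 2, 3]

top = stack.pop()  # command and query at once: removes 3 and returns it

# The return value is the popped element, not the list,
# so a chained call acts on that element instead.
bits = stack.pop().bit_length()  # pops 2, then calls int.bit_length on it

number_names = {1: 'one', 2: 'two', 3: 'three'}
pair = number_names.popitem()    # removes and returns the last-inserted (key, value) pair
```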
Workarounds
First, decide whether to make a (modified) copy of the object. The above examples all modify an existing object without creating a new one. Usually a simpler way is to create a new object that is similar to the original, but with a specific change made. Often it won't matter, but making a copy can range from necessary to unacceptable, depending on the overall task. For example, separate objects might be necessary when building a list of lists; but other designs might require sharing an object deliberately.
When the code would be correct either way, modifying an existing object is usually faster; but in many cases the difference will not be noticeable.
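As a sketch of why the choice matters when building a list of lists (the names here are only illustrative): reusing one inner list shares it between all the "rows", whereas copying gives each row its own object.

```python
# Sharing: the same inner list object is appended each time.
row = []
shared = []
for _ in range(3):
    shared.append(row)
shared[0].append("x")    # visible through every "row", because they are one object

# Separate objects: copy the template so that each row is independent.
template = []
separate = [template.copy() for _ in range(3)]
separate[0].append("x")  # only the first row changes
```

Whether the shared or the separate behaviour is the correct one depends entirely on what the program is meant to do.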
Syntactic workarounds when no copy is needed
Of course, with code like x = x.y(), the simplest fix is not to assign at all; just write x.y() instead. This already causes x to change (that's the purpose of the y method).
To fix the problem with chained method calls, like x.y().z(), the simplest approach is to break the chain and make each call separately:
x.y() # x still means the same object, but it has been modified
x.z() # so now the z method can be called, and both changes apply
There is a workaround that allows for chaining. First, the workaround by itself:
>>> x = []
>>> y = x.append(1) or x
>>> y
[1]
The idea is simple, but tricky. The None returned from the modifying method (here, list.append) is falsy, so the or will evaluate to the right-hand side. That right-hand side is the same object, so now y names the same object that x does, and "sees" the change made by x.append(1).
Thus, to make chained calls, simply apply the method to the expression created with or. This requires parentheses, of course:
>>> x = []
>>> (x.append(1) or x).append(2)
>>> x
[1, 2]
This approach quickly gets unwieldy, however:
>>> x = []
>>> (((x.append(3) or x).extend([1, 'bad', 2]) or x).remove('bad') or x).sort()
>>> x
[1, 2, 3]
Compare to the straightforward approach, not trying to chain:
>>> x = []
>>> x.append(3)
>>> x.extend([1, 'bad', 2])
>>> x.remove('bad')
>>> x.sort()
>>> x
[1, 2, 3]
Explicit copying first
When a separate copy is acceptable (or necessary), first check the documentation to see if there is a preferred way to make copies. In many cases, the desired functionality is already available from a separate method that creates a modified copy (see the next section). In other cases, the class implements its own method for copying, for technical reasons. Note that most ways to copy an object will give a shallow copy, not a deep copy. Beware of this in cases where the difference matters.
There are many ways to copy a list - unavoidably, because lists are so flexible. The built-in .copy method is in principle the "right way" since Python 3.3 (when it was added) to make a shallow copy of a list: it explicitly says what the code is doing, and gets updated to use the fastest-known techniques for the copy.
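A few of the common ways, side by side (all three make shallow copies; the variable names are just for illustration):

```python
original = [3, 1, 2]

via_method = original.copy()      # explicit about the intent
via_constructor = list(original)  # also works for any iterable
via_slice = original[:]           # older idiom

via_method.sort()                 # modifies only the copy
```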
If all else fails, try using the standard library copy module to clone objects with copy.copy (shallow copies) or copy.deepcopy (deep copies). However, even this is not fully general.
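A minimal sketch of the difference between the two, using a nested list (chosen here only because it makes the sharing easy to see):

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)   # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)          # change an inner list through the original
```

After the append, shallow[0] also contains 99 because it is the same inner list as nested[0], while deep[0] is unaffected.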
Context-specific ways to make a modified copy
Here is a table listing replacement code for various in-place methods on built-in objects, to get a modified copy instead. In each case, the replacement code is an expression evaluating to the modified copy; no assignment occurs.
For methods provided by a library, again, please check the documentation first. With Pandas, for example, getting a modified copy is often as simple as just not using inplace=True.
Example |
Replacement |
List methods (x and y are lists) |
|
x.clear() |
[] |
x.append(1) |
x + [1] |
x.extend(y) |
x + y |
x.remove(1) (remove first match only) |
x[:x.index(1)] + x[x.index(1)+1:] |
while 1 in x: x.remove(1) (remove all matches) |
[y for y in x if y != 1] |
x.insert(y, z) (insert at an arbitrary position) |
x[:y] + [z] + x[y:] (but copying and modifying is faster) |
x.sort() |
sorted(x) |
x.reverse() (need to use the object later) |
list(reversed(x)) or x[::-1] |
x.reverse() and then for y in x: (only need the object to set up a loop) |
for y in reversed(x): (saves memory compared to the above) |
random.shuffle(x) |
random.sample(x, len(x)) or
sorted(x, key=lambda _: random.random()) |
Set methods (x and y are sets) |
|
x.clear() |
set() (not {} , which makes a dict) |
x.update(y) (implements |= ) |
x.union(y) or x | y |
x.add(1) |
x.union({1}) or x | {1} |
x.difference_update(y) (implements -= ) |
x - y or x.difference(y) |
x.discard(1) |
x - {1} or x.difference({1}) |
x.remove(1) |
as with discard , but explicitly raise an exception first if the set does not contain the element |
x.intersection_update(y) (implements &= ) |
x.intersection(y) or x & y |
x.symmetric_difference_update(y) (implements ^= ) |
x.symmetric_difference(y) or x ^ y |
Dict methods (x and y are dicts) |
|
x.clear() |
{} |
x.update(y) |
Many alternatives, depending on Python version; see https://stackoverflow.com/questions/38987| |
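A quick check of a few of these replacements, as a sketch (the sample values are arbitrary, and the dict merge shown is just one of the available options, which needs Python 3.5 or later). In every case the original object is left unchanged:

```python
x = [3, 1, 2]
y = [10, 20]

appended = x + [1]               # instead of x.append(1)
extended = x + y                 # instead of x.extend(y)
inserted = x[:1] + [99] + x[1:]  # instead of x.insert(1, 99)
in_order = sorted(x)             # instead of x.sort()
backwards = x[::-1]              # instead of x.reverse()

s = {1, 2}
with_three = s | {3}             # instead of s.add(3)
without_one = s - {1}            # instead of s.discard(1)

d = {"a": 1}
merged = {**d, "b": 2}           # one option instead of d.update({"b": 2})
```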
Note that although bytearray provides many methods that might sound like "commands" rather than "queries", the ones that are also present in bytes generally are in fact "queries" that will return a new bytearray instead. For the ones that actually do modify the original bytearray, try the approaches shown for lists above.
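For example (a small sketch; upper is shared with bytes, while reverse exists only on bytearray):

```python
data = bytearray(b"abc")

shouted = data.upper()           # "query": returns a new bytearray, data is unchanged

reverse_result = data.reverse()  # "command": reverses data in place and returns None
flipped_copy = data[::-1]        # list-style replacement that leaves data alone
```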