Is extending a Python list (e.g. l += [1]) guaranteed to be thread-safe?

If I have an integer i, it is not safe to do i += 1 on multiple threads:

>>> import threading
>>> i = 0
>>> def increment_i():
...     global i
...     for j in range(1000): i += 1
...
>>> threads = [threading.Thread(target=increment_i) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> i
4858  # Not 10000

However, if I have a list l, it does seem safe to do l += [1] on multiple threads:

>>> l = []
>>> def extend_l():
...     global l
...     for j in range(1000): l += [1]
...
>>> threads = [threading.Thread(target=extend_l) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> len(l)
10000

Is l += [1] guaranteed to be thread-safe? If so, does this apply to all Python implementations or just CPython?

Edit: It seems that l += [1] is thread-safe but l = l + [1] is not...

>>> l = []
>>> def extend_l():
...     global l
...     for j in range(1000): l = l + [1]
...
>>> threads = [threading.Thread(target=extend_l) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> len(l)
3305  # Not 10000
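One way to see the difference is to disassemble both forms with the dis module (a minimal sketch; the opcode names vary by CPython version, e.g. INPLACE_ADD/BINARY_ADD before 3.11, BINARY_OP afterwards). In the augmented form the concatenation mutates the existing list inside a single bytecode's C implementation, so the following store just rebinds the name to the same object; in the plain form the addition builds a new list from whatever snapshot of l was loaded, so another thread's updates can be lost before the store:

import dis

def augmented():
    global l
    l += [1]      # concatenation mutates the existing list in place

def rebinding():
    global l
    l = l + [1]   # builds a brand-new list, then stores it back

dis.dis(augmented)   # shows the in-place add opcode
dis.dis(rebinding)   # shows a plain add followed by a separate store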
Lethbridge answered 8/7, 2016 at 12:4 Comment(5)
This is really surprising to me; I wouldn't have expected that to happen. I hope someone provides a clear explanation of it.Alon
Although I upvoted this, I think the statement "Which operations in Python are guaranteed to be thread-safe and which are not?" makes the question likely to be closed as too broad. Could you rephrase it?Nuzzi
While waiting for constraints to be added to the question, I found the effbot article again: What kinds of global value mutation are thread-safe? An interesting read. I suggest rewording the title into "What kinds of global value mutation are thread-safe?" to make it a nice Jeopardy-style answer ;-) With respect to the list sample: the list is thread-safe in its operations, but the data inside it is not "secured" by the container. So any access that changes an element's content will suffer the same way as the integer += 1.Heliotropin
@MartijnPieters - I assume you have closed this question because of the "Which operations in Python..." statement. I have now removed this generalization - would you be willing to reopen the question?Lethbridge
@user200783: done; please try to keep your questions specific.Turnbow

There isn't a happy ;-) answer to this. There's nothing guaranteed about any of it, which you can confirm simply by noting that the Python reference manual makes no guarantees about atomicity.

In CPython it's a matter of pragmatics. As a snipped part of effbot's article says,

In theory, this means an exact accounting requires an exact understanding of the PVM [Python Virtual Machine] bytecode implementation.

And that's the truth. A CPython expert knows L += [x] is atomic because they know all of the following:

  • += compiles to an INPLACE_ADD bytecode.
  • The implementation of INPLACE_ADD for list objects is written entirely in C (no Python code is on the execution path, so the GIL can't be released between bytecodes).
  • In listobject.c, the implementation of INPLACE_ADD is function list_inplace_concat(), and nothing during its execution needs to execute any user Python code either (if it did, the GIL may again be released).

That may all sound incredibly difficult to keep straight, but for someone with effbot's knowledge of CPython's internals (at the time he wrote that article), it really isn't. In fact, given that depth of knowledge, it's all kind of obvious ;-)
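To make the last two bullets concrete, here is an illustrative sketch (the class is made up) of an augmented assignment that is not atomic in this sense: when the left-hand side is an instance of a user-defined class, the in-place add dispatches to its __iadd__ method, which is Python code, so the interpreter can switch threads in the middle of the operation:

import threading

class Counter:
    def __init__(self):
        self.value = 0
    def __iadd__(self, other):
        # Python-level code: the interpreter may switch threads between
        # reading self.value and writing the sum back.
        self.value = self.value + other
        return self

c = Counter()

def bump():
    global c
    for _ in range(100000):
        c += 1

threads = [threading.Thread(target=bump) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(c.value)   # typically less than 1000000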

So as a matter of pragmatics, CPython experts have always freely relied on that "operations that 'look atomic' should really be atomic", and that also guided some language decisions. For example, an operation missing from effbot's list (added to the language after he wrote that article):

x = D.pop(y) # or ...
x = D.pop(y, default)

One argument (at the time) in favor of adding dict.pop() was precisely that the obvious C implementation would be atomic, whereas the in-use (at the time) alternative:

x = D[y]
del D[y]

was not atomic (the retrieval and the deletion are done via distinct bytecodes, so threads can switch between them).
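For illustration (the dict and key here are made up), this is the difference between the two patterns when several threads try to claim the same key:

import threading

D = {"job": "payload"}

def claim_two_step():
    # Non-atomic: another thread can run between the lookup and the del,
    # so both threads can fetch the value and the loser's del raises KeyError.
    x = D["job"]
    del D["job"]
    return x

def claim_atomic():
    # A single call; in CPython its C implementation runs without releasing
    # the GIL, so only one thread can win the pop.
    return D.pop("job", None)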

But the docs never said .pop() was atomic, and never will. This is a "consenting adults" kind of thing: if you're expert enough to exploit this knowingly, you don't need hand-holding. If you're not expert enough, then the last sentence of effbot's article applies:

When in doubt, use a mutex!

As a matter of pragmatic necessity, core developers will never break the atomicity of effbot's examples (or of D.pop() or D.setdefault()) in CPython. Other implementations are under no obligation at all to mimic these pragmatic choices, though. Indeed, since atomicity in these cases relies on CPython's specific form of bytecode combined with CPython's use of a global interpreter lock that can only be released between bytecodes, it could be a real pain for other implementations to mimic them.

And you never know: some future version of CPython may remove the GIL too! I doubt it, but it's theoretically possible. But if that happens, I bet a parallel version retaining the GIL will be maintained as well, because a whole lot of code (especially extension modules written in C) relies on the GIL for thread safety.

Worth repeating:

When in doubt, use a mutex!
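For the question's integer counter, a minimal sketch with threading.Lock that is correct on any Python implementation:

import threading

i = 0
lock = threading.Lock()

def increment_i():
    global i
    for _ in range(1000):
        with lock:        # serialize the whole read-modify-write
            i += 1

threads = [threading.Thread(target=increment_i) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(i)   # 10000 every time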

Jacquenette answered 12/7, 2016 at 5:42 Comment(7)
Thank you for this comprehensive answer. The summary seems to be "either use a mutex, or rely on undocumented implementation details of CPython". Presumably there is a lot of Python code which does the latter, based on the "operations that 'look atomic' should really be atomic" guideline. As you say, this means that CPython can never break the atomicity of certain operations. Other implementations could (and may find it easier to) do so, at the expense of breaking this code.Lethbridge
However, it seems that alternative Python implementations are trying to match the atomicity guidelines of CPython: I have reproduced the results in the question on PyPy, IronPython and Jython. I have a lot of respect for the developers of these runtimes: being highly compatible with CPython in order to run as much Python code as possible, even code which relies on undocumented implementation details, must be a huge amount of work.Lethbridge
Alas, there's no way to know without becoming an expert on each implementation: testing alone can never prove the absence of race-related bad behaviors. For example, there have been subtle threading bugs in CPython's implementation that went undiscovered for years, until some unlikely combination of HW, OS and workload just happened to provoke races that were always possible but were just never seen before. Uncommon, yes, but "know" is a strong word ;-)Jacquenette
Just a note... you can append lists in Python just using +Lusatia
@Matt, yes, but as the OP noted near the end of their question, L1 = L1 + L2 is not "thread safe"; L1 += L2 is.Jacquenette
Thanks again for this answer - I have now accepted it and awarded you the bounty on this question. Just a quick follow-up: you say that L += [x] is atomic because INPLACE_ADD for lists does not execute any Python code, either directly or via a callback into user code. Are there any similar operations (that "look atomic") which might execute Python code? In particular, which might execute user Python code?Lethbridge
Sorry for the delay! This just got too deep in my stack. The question may be "quick", but any real answer needs to be excruciatingly detailed, because there is no "general principle" at work: everything depends on exactly how operations are implemented in the specific version of Python you're using. In CPython, many operations involving an instance of a user-defined class can execute arbitrary Python code, including dict access (__hash__) and comparisons (__cmp__, __eq__, ...). There simply are no general answers to be had here. "When in doubt, use a mutex" :-)Jacquenette

From https://docs.python.org/3/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe:

A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.

The following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints):

L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()

These aren’t:

i = i+1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1

The above is purely CPython-specific and can vary across different Python implementations such as PyPy.

By the way, there is an open issue for documenting atomic Python operations: https://bugs.python.org/issue15339

Nestornestorian answered 8/7, 2016 at 12:13 Comment(3)
The OP also asked about other Python implementations and I'd be interested to know that too... Does the GIL work identically for all these operations on all implementations (that actually have a GIL)?Ponce
Thank you for these links. Issue 15339 is 4 years old - it does not seem that anyone is in a hurry to document anything about thread-safety in Python. This fundamental property of the language remains an implementation detail of its reference implementation.Lethbridge
Hi, I am from the future. 11 years later, we are still waiting for this documentation: github.com/python/cpython/issues/59544Tichonn
