Is Python (cpython) behavior with respect to memory barriers and atomicity etc. guaranteed?
I was wondering about the equivalent of Java's "volatile", and found this answer.

An equivalent to Java volatile in Python

Which (basically) says that everything is effectively volatile in python, at least in cpython, because of the GIL. Which makes sense, everything is locked by the GIL, no memory barriers to worry about, etc. But I would be happier if this were documented and guaranteed by specification, rather than have it be a result of the way that cpython happens to currently be implemented.

Because, say I want one thread to post data and others to read it, so I can choose something like this:

class XFaster:
    def __init__(self):
        self._x = 0

    def set_x(self, x):
        self._x = x

    def get_x(self):
        return self._x


class XSafer:
    def __init__(self):
        self._x = 0
        self._lock = threading.Lock()

    def set_x(self, x):
        with self._lock:
            self._x = x

    def get_x(self):
        with self._lock:
            return self._x

I'd rather go with XFaster or even not use a getter and setter at all. But I also want to do things reliably and "correctly". Is there some official documentation that says this is OK? What about say putting a value in a dict or appending to a list?

In other words, is there a systematic, documented way of determining what I can do without a threading.Lock (without digging through dis or anything like that)? And also preferably in a way that won't break with a future python release.
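For the post-and-read pattern described above, one structure that the standard library does document as thread-safe is `queue.Queue` (the `queue` module docs describe it as a multi-producer, multi-consumer queue). A minimal sketch, with illustrative `producer`/`consumer` names, might look like this:

```python
import queue
import threading

# queue.Queue is documented as thread-safe, so no explicit lock is
# needed around put() and get().
q = queue.Queue()

def producer():
    for i in range(3):
        q.put(i)      # thread-safe enqueue
    q.put(None)       # sentinel: signal "no more data"

def consumer(results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        results.append(item)

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 1, 2]
```

This sidesteps the question of which bare operations are atomic, at the cost of an extra object; it does not answer whether `XFaster` itself is safe.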

On edit: I appreciate the informed discussion in comments. But what I would really want is some specification that guarantees the following:

If I execute something like this:

# in the beginning
x.a == foo
# then two threads start

# thread 1:
x.a = bar

# thread 2
do_something_with(x.a)

I want to be sure that:

  • when thread 2 reads x.a it reads either foo or bar
  • if the read in thread 2 occurs physically later than the assignment in thread 1, then it actually reads bar

Here are some things I want not to happen:

  • the threads get scheduled on different processors, and the assignment x.a=bar from thread 1 isn't visible to thread 2
  • x.__dict__ is in the middle of being re-hashed and so thread 2 reads garbage
  • etc
Elizabetelizabeth answered 24/7, 2020 at 17:42 Comment(27)
For objects, python documentation should describe which objects are and aren't thread safe. I don't believe lists are, for example, but queues are. Generally, accessing is thread safe, but setting is not. If you need to set something, grab the lock before setting then release. This is what actually makes setting safe.Hence
I doubt you'll find any Python operation being atomic; everything is an object, not a fixed-width integer, so it's not possible for the only access to shared state to be a single pointer-width (or smaller) store. Even if it is just storing a pointer to an object, it would have to possibly destroy the previous object being overwritten, I think. (I don't really know Python). Also, more importantly, the Python interpreter very likely just used normal C operations, not atomic_store() from stdatomic.h, so concurrent write + read of the same object could trigger data-race UB in the interpreter.Sliding
@PeterCordes - quite the opposite is true. CPython's byte code interpreter is single threaded, controlled by the Global Interpreter Lock. Any given byte code operation is run to completion while holding the lock and is thus atomic with respect to any other byte code operation. The exception is when the operation results in a call to an extension function (C most likely) that specifically releases the lock.Bim
@Everyone these are good points, but this discussion illustrates why I want a documented specification that I can rely on, as opposed to deductions based on how cpython is implemented.Elizabetelizabeth
@MZ - list methods themselves are thread safe because the list implementation never releases the GIL. You can append to a list without worrying. Augmented operations like "*=" are different because they use multiple byte codes to get the list, extend, and rebind the list to the variable. Python releases the GIL periodically to allow pseudo-multithreading and if that happens during this operation, and that other code reassigns the same variable, then this code will still rebind the old list object when it runs again.Bim
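The multi-bytecode problem with augmented operations mentioned here can be sketched with a shared counter: `counter += 1` compiles to separate load/add/store bytecodes, so the whole read-modify-write has to sit under one lock. Whether an unlocked version actually loses updates depends on scheduling, so only the locked variant is shown as reliable (a sketch, not an official recommendation):

```python
import threading

counter = 0
lock = threading.Lock()

def locked_increment(n):
    global counter
    for _ in range(n):
        # counter += 1 is load + add + store; the lock makes the
        # whole read-modify-write appear atomic to other threads.
        with lock:
            counter += 1

threads = [threading.Thread(target=locked_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: no increments are lost while the lock is held
```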
The closest I found was [Name resolution of free variables occurs at runtime, not at compile time]( docs.python.org/3/reference/…).Bim
@Bim that is my understanding. But I don't want to go digging through byte code every time something like this comes up. Also, the fact that there's a GIL doesn't necessarily guarantee no memory issues. The OS could still schedule threads on different processors, and having a lock doesn't automatically imply that all data in processor B's cache makes it to main memory and is readable by processor A when it picks up the lock. I mean, OK, I'm sure the implementation of cpython does in fact guarantee this, but in other languages this stuff is specified in detail.Elizabetelizabeth
@Bim yes, the actual methods might be safe, but the object itself is definitely not completely thread-safe. Indexing/Slicing will not be safe, on top of what you've stated.Hence
@tdelaney: Ah right, atomicity via locking, of course. OP's discussion of Java volatile and memory-barriers had me thinking about lock-free atomics, which would probably require a huge overhaul of the internals of any current Python implementation, or a new one from the ground up. Ok, so software-visible threading.Lock() in Python only exists for higher-level things that take multiple Python statements to do. If the CPython interpreter is single-threaded, there's no concurrency? Or does Python threading exist for other Python implementations like Cython or PyPy to use?Sliding
@tdelaney: sorry that's a bit off topic, I guess I should just google Python threading if I want to know what the point of it is, but I guess I was wondering if the question even made sense. I'm only seeing this because of the [memory-barriers] tag :PSliding
@PeterCordes - cpython is concurrent, just not parallel (within the confines of code honoring the GIL). lock-free atomics are interesting but I don't see how it could fit in existing python. Incrementing an integer is really creating a new integer object and binding it to the variable. List append may realloc the list. The GIL is the third rail of python.Bim
@tdelaney: After a bit of googling, it seems that the point is to do stuff while a thread is waiting for I/O. There's no actual concurrency of two Python things happening at once, just keeping Python execution going while one or more I/O operations are outstanding. If that (no two Python statements can execute simultaneously) is documented in any Python language manual or standard, that would answer the OP's question I think: any single statement is atomic wrt. other threads.Sliding
@tdelaney: BTW, in asm, pure load and pure store are normally atomic for free; languages like C mostly only need to make atomic load / store special because of memory ordering greater than "relaxed", and the as-if optimization rule. Only RMW atomicity (like incrementing an integer) is "hard" and needs special instructions. Lock-free atomicity without ability to do RMWs would be pretty limiting, though, and if anything needs to take a lock then all other operations on that object need to respect the lock.Sliding
The closest statement I've found is in the C extension documentation: docs.python.org/3/c-api/…Bim
@PeterCordes It doesn't quite answer the question. First, it depends on what the definition of "a single statement" is (e.g. is a,b=c,d a single statement?). But also, just because they don't happen at once doesn't automatically imply that the memory that thread A sees will reflect everything that happened in thread B, say if the os schedules the threads on different processors. It should, and almost certainly does work that way, but it would be nice if there were some documentation on python.org that said this explicitly.Elizabetelizabeth
@mrip: A sane definition of "not happening at once" would include sufficient memory-ordering (e.g. via the GIL using standard acquire and release semantics) to make all visible effects of one critical section visible to the next critical section. For example, in ISO C and ISO C++, that's what "happens before" means, a synchronization relationship that can be established between threads via a mutex, or via atomics with release / acquire or stronger memory ordering. If a store can become visible after some load already read that location, the store didn't happen before it.Sliding
But fair point about what's a single statement and what isn't. Doing a two-variable assignment as possibly a single "transaction" is a good example.Sliding
Does this answer your question? Does new implementation of GIL in Python handled race condition issue?Nasty
Can you please clarify exactly what you are asking? x.a=bar is not one well-defined operation – it is syntactic sugar for a method call that may run arbitrary code, including arbitrarily many bytecode instructions or thread/process/task synchronisation operations. See property and object.__setattr__ for the most obvious causes of non-atomicity.Nasty
@PeterCordes agree completely about a "sane definition". And yes, with C, C++, also Java this is all nicely documented. I guess with python, since there's a GIL and everything is fundamentally non-parallel, then "it just works." But even assignment isn't so simple in python. What x.a=5 does actually is it looks up the key "a" in the hash table x.__dict__ and places the value 5 there. And so while x.a=5 seems like it should be atomic, updating a hash table is not the sort of thing one generally assumes is automatically going to be atomic and thread-safe.Elizabetelizabeth
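The claim in this comment, that even a plain attribute assignment spans several bytecode instructions, can be checked with the standard `dis` module. The exact opcode sequence varies between CPython versions, but the store into the attribute (`STORE_ATTR`) is always preceded by loads of the value and the object:

```python
import dis

# Compile a bare attribute assignment and list its bytecode opnames.
code = compile("x.a = 5", "<demo>", "exec")
ops = [ins.opname for ins in dis.get_instructions(code)]
print(ops)

# At minimum: a load of the constant, a load of x, and STORE_ATTR.
# Exact opcode names differ between CPython versions.
assert "STORE_ATTR" in ops
assert len(ops) >= 3
```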
What do you mean by “physically later”? Normally we don’t appeal to things like globally-accessible clocks when talking about multithreaded algorithms.Ossify
@MisterMiyagi I am asking where I can find official documentation that tells me when I need to use an explicit lock and when I don't. As an example, in the classes I defined above, can I call set_x from one thread and get_x from other threads using XFaster or do I need to use XSafer to make sure that I don't run into any problems. And also I would like to read about this on python.org, not on some blog post.Elizabetelizabeth
@DavisHerring By 'physically later' I mean that the sequence of bytecode instructions for the assignment finishes executing at an earlier time than the first bytecode instruction of the line in the other thread that reads the value. True, in general this wouldn't make sense, but since Python has the GIL, then only one bytecode instruction executes at a time, so it does make sense to talk about "earlier" and "later" with respect to a global clock.Elizabetelizabeth
@mrip: Well, if you define your “later” in terms of the GIL, obviously everything is made consistent by construction (it wouldn’t be much of a lock otherwise). However, you started out by saying you didn’t want to appeal to implementation details, so I don’t see how that’s a useful definition.Ossify
@DavisHerring The GIL is part of the official documentation, so it's more than just an implementation detail. But still, all the GIL does is guarantee that bytecode ops are atomic. That alone doesn't guarantee anything about memory visibility. It also doesn't guarantee anything about how python code maps to bytecode, or which kinds of objects can be left in inconsistent states at the end of bytecode ops, etc. In other words, without the GIL, the XFaster class above would definitely not be safe. But the GIL alone doesn't guarantee that it is safe.Elizabetelizabeth
@mrip: I meant “CPython-specific implementation aspect”. It’s a reasonable concern; I was just interested in refining the “timing” aspect.Ossify
@DavisHerring Yes, fair point. Ideally there would be an answer that doesn't refer to either the GIL or CPython, but at this point that seems like a pipe dream. But failing that, at least a situation like with SQL where at least each database has its own specification of the SQL dialect it uses. It's a bit strange that even for CPython, despite being one of the most widely used languages in the world, there isn't a formal spec that covers situations like this.Elizabetelizabeth

TLDR: CPython guarantees that its own data structures are thread-safe against corruption. This does not mean that any custom data structures or code are race-free.


The intention of the GIL is to protect CPython's data structures against corruption. One can rely on the internal state being thread-safe.

global interpreter lock (Python documentation – Glossary)

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. [...]

This also implies correct visibility of changes across threads.

However, this does not mean that any isolated statement or expression is atomic: almost any statement or expression can invoke more than one bytecode instruction. As such, the GIL explicitly does not provide atomicity for these cases.

Specifically, a statement such as x.a=bar may execute arbitrarily many bytecode instructions by invoking a setter via object.__setattr__ or the descriptor protocol. It executes at least three bytecode instructions: the bar lookup, the x lookup, and the a assignment.
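A tiny illustration of why `x.a = bar` can run arbitrarily much Python: a `property` setter (or any `__setattr__` override) executes ordinary code during the assignment. The class name `X` here is purely illustrative:

```python
class X:
    def __init__(self):
        self._a = 0
        self._writes = 0

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        # Arbitrary Python runs inside "x.a = value": here, two separate
        # mutations that a concurrent reader could observe in between,
        # unless callers synchronise externally.
        self._writes += 1
        self._a = value

x = X()
x.a = 42
print(x.a, x._writes)  # 42 1
```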

As such, Python guarantees visibility/consistency, but provides no guarantees against race conditions. If an object is mutated concurrently, this must be synchronised for correctness.
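To make this concrete: each individual list method below is safe under the GIL in the sense that the list will not be corrupted, but the pop-then-append pair is a compound operation, so a lock is still needed for the pair to look atomic to other threads. This is a sketch of the synchronisation the answer calls for, not an official recipe:

```python
import threading

lock = threading.Lock()
pending, done = [1, 2, 3], []

def complete_one():
    # pop() and append() are each safe on their own, but without the
    # lock a concurrent reader could see the item in neither list
    # between the two calls.
    with lock:
        if pending:
            done.append(pending.pop())

for _ in range(3):
    complete_one()
print(pending, done)  # [] [3, 2, 1]
```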

Nasty answered 24/7, 2020 at 19:57 Comment(2)
Thanks. Accepting because it's probably as good as it gets. Would be nice if they just listed what the "critical built-in types" were (what about lists, sets, integers over 64 bits, multiple assignment, etc.). And yes, it does seem to imply correct memory visibility, although it wouldn't hurt for them to say so explicitly. I know it seems like I'm being a pain here, but really all I want is a specification so I can write code that I can be confident will work properly.Elizabetelizabeth
@Elizabetelizabeth I understand the concern, as well as the frustration about the lack of an explicit specification. However, for all intents and purposes both Python and CPython make no guarantees about atomicity. Even the builtin types are just guaranteed not to get corrupted – there is no guarantee that their operations are atomic. Any set of concurrent operations on an object must be synchronised in order to be correct.Nasty
