Python multi-thread multi-interpreter C API

Asked 26/9, 2014 at 13:54 Answered 3/6 at 23:8

Solved python multithreading python-c-api python-subinterpreters

I'm playing around with the C API for Python, but some corner cases are quite difficult to understand. I could test them, but that seems bug-prone and time-consuming, so I've come here to see if somebody has already done this.

The question is: what is the correct way to manage multiple threads with sub-interpreters, when there is no direct relation between threads and sub-interpreters?

Py_Initialize();
PyEval_InitThreads(); /* <-- needed? */
_main = PyEval_SaveThread(); /* <-- acquire lock? does it matter? */
/* maybe do I not need it? */
i1 = Py_NewInterpreter();
i2 = Py_NewInterpreter();

Do I use a mutex? Is it required to use locks? The threaded function should be something like the following (the threads are non-Python, probably POSIX threads):

Thread1

_save = PyThreadState_Swap(i1);
  // python work 
PyThreadState_Restore(_save);

Thread2 (almost identical)

_save = PyThreadState_Swap(i1);
  // python work 
PyThreadState_Restore(_save);

Thread3 (almost identical, but with the sub-interpreter i2)

_save = PyThreadState_Swap(i2);
  // python work 
PyThreadState_Restore(_save);

Is this correct? Is this the general pattern for what I want to achieve? Are there race conditions?

Thanks!

UPDATE: Since Python 3.12, creating new subinterpreters has extra options, such as the possibility to give them their own GIL. See the What's New or PEP 684 directly for more details. I haven't looked into the details, and these changes may not affect this question at all, but it's something to keep in mind if you are a future reader who plans to take advantage of sub-interpreters for certain use cases.

Lurie answered 26/9, 2014 at 13:54 Comment(0)

Sub interpreters in Python are not well documented or even well supported. The following is to the best of my understanding. It seems to work well in practice.

There are two important concepts to understand when dealing with threads and sub interpreters in Python. First, the Python interpreter is not really multi-threaded. It has a Global Interpreter Lock (GIL) that needs to be acquired to perform almost any Python operation (there are a few rare exceptions to this rule).

Second, every combination of thread and sub interpreter has to have its own thread state. The interpreter creates a thread state for every thread managed by it, but if you want to use Python from a thread not created by that interpreter, you need to create a new thread state.

First you need to create the sub interpreters:

Initialize Python

Py_Initialize();

Initialize Python thread support

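Required if you plan to call Python from multiple threads. This call also acquires the GIL. (Since Python 3.7, Py_Initialize() already does this, and PyEval_InitThreads() has been deprecated since 3.9, so on newer versions the call can be dropped.)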

PyEval_InitThreads();

Save the current thread state

I could have used PyEval_SaveThread(), but one of its side effects is releasing the GIL, which then needs to be reacquired.

PyThreadState* _main = PyThreadState_Get();

Create the sub interpreters

PyThreadState* ts1 = Py_NewInterpreter();
PyThreadState* ts2 = Py_NewInterpreter();

Restore the main interpreter thread state

PyThreadState_Swap(_main);

We now have two thread states for the sub interpreters. These thread states are only valid in the thread where they were created. Every thread that wants to use one of the sub interpreters needs to create a thread state for that combination of thread and interpreter.

Using a sub interpreter from a new thread

Here is example code for using a sub interpreter in a new thread that is not created by the sub interpreter. The new thread must acquire the GIL, create a new thread state for the thread and interpreter combination, and make it the current thread state. At the end, the reverse must be done to clean up.

void do_stuff_in_thread(PyInterpreterState* interp)
{
    // acquire the GIL
    PyEval_AcquireLock(); 

    // create a new thread state for the sub interpreter interp
    PyThreadState* ts = PyThreadState_New(interp);

    // make ts the current thread state
    PyThreadState_Swap(ts);

    // at this point:
    // 1. You have the GIL
    // 2. You have the right thread state - a new thread state (this thread was not created by python) in the context of interp

    // PYTHON WORK HERE

    // release ts
    PyThreadState_Swap(NULL);

    // clear and delete ts
    PyThreadState_Clear(ts);
    PyThreadState_Delete(ts);

    // release the GIL
    PyEval_ReleaseLock(); 
}

Using a sub interpreter from a new thread (post Python 3.3)

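The previous do_stuff_in_thread() works on every Python version that still provides PyEval_AcquireLock()/PyEval_ReleaseLock(). However, those functions have been deprecated since early Python 3 (the documentation dates the deprecation to 3.2, and they were removed in 3.13), which resulted in a bit of a conundrum.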

The only documented way to release the GIL is by calling PyEval_ReleaseThread() or PyEval_SaveThread(), both of which require a thread state, while cleaning and deleting the current thread state requires the GIL to be held. That means that one can either release the GIL or clean up the thread state, but not both.

Fortunately, there is a solution - PyThreadState_DeleteCurrent() deletes the current thread state and then releases the GIL. [This API has only been documented since 3.9, but it existed since Python 2.7 at least]

This modified do_stuff_in_thread() also works with all current Python versions.

void do_stuff_in_thread(PyInterpreterState* interp)
{
    // create a new thread state for the sub interpreter interp
    PyThreadState* ts = PyThreadState_New(interp);

    // make it the current thread state and acquire the GIL
    PyEval_RestoreThread(ts);

    // at this point:
    // 1. You have the GIL
    // 2. You have the right thread state - a new thread state (this thread was not created by python) in the context of interp

    // PYTHON WORK HERE

    // clear ts
    PyThreadState_Clear(ts);

    // delete the current thread state and release the GIL
    PyThreadState_DeleteCurrent();
}

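Now each thread can do the following. Note that the main thread must not hold the GIL while the other threads run (for example, release it with PyEval_SaveThread() before joining them and reacquire it with PyEval_RestoreThread() afterwards):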

Thread1

do_stuff_in_thread(ts1->interp);

Thread2

do_stuff_in_thread(ts1->interp);

Thread3

do_stuff_in_thread(ts2->interp);

Calling Py_Finalize() destroys all sub interpreters. Alternatively they can be destroyed manually. This needs to be done in the main thread, using the thread states created when creating the sub interpreters. At the end make the main interpreter thread state the current state.

// make ts1 the current thread state
PyThreadState_Swap(ts1);
// destroy the interpreter
Py_EndInterpreter(ts1);

// make ts2 the current thread state
PyThreadState_Swap(ts2);
// destroy the interpreter
Py_EndInterpreter(ts2);

// restore the main interpreter thread state
PyThreadState_Swap(_main);

I hope this makes things a bit clearer.

I have a small complete example written in C++ on github, and another also on github (post Python 3.3 variant).

Selfheal answered 26/10, 2014 at 7:19 Comment(13)

Thanks! That's exactly what I was looking for, a "detailed quickstart" --more detailed than quick :). If it weren't for you, I would have definitely missed the "a thread state for that combination of thread and interpreter". – Lurie 27/10, 2014 at 13:54

So to sum up: there is NO way you could use multiple python interpreter instances that are really parallel (hardware threads on multiple cores)? – Accident 8/7, 2015 at 13:50

Correct. Python uses a global interpreter lock that allows only a single thread to run actual Python code at a time. However, C code that performs long operations often releases the lock until it returns to Python, so that another thread can execute. This means that the actual utilization will depend on your code. – Selfheal 10/7, 2015 at 1:30

It's best to avoid the use of PyEval_AcquireLock and PyEval_ReleaseLock in favor of PyEval_RestoreThread and PyEval_SaveThread, as the former two are deprecated. – Oddity 21/4, 2016 at 9:9

As written, void do_stuff_in_thread(PyInterpreterState* interp) does not use interp anywhere, but instead uses what appears to be a global ts1 variable. This makes the code in the post incorrect and hard to understand. I think PyThreadState* ts = PyThreadState_New(ts1->interp); is supposed to be PyThreadState* ts = PyThreadState_New(interp);, but confirmation would be appreciated. – Baranowski 28/6, 2018 at 21:45

Thanks for pointing this out. I am pretty sure you're correct (it agrees with my example at gist.github.com/sterin/61561c3139dd49da1f43 ) - but I'll need some time to test it. I'll update the answer soon. – Selfheal 28/6, 2018 at 21:57

This is a fascinating and helpful answer. Just to clarify, from what I'm reading in your text and the PEP 554: these sub-interpreters still share the GIL, so they still cannot run in parallel yet, correct? Thanks. – Afrikaans 21/9, 2018 at 15:17

@Afrikaans That's correct. Python has a single GIL, shared by all the threads and sub-interpreters. Python code cannot run in parallel. – Selfheal 21/9, 2018 at 17:35

@Selfheal You might be able to answer this question since it looks to be based off your answer here? – Sellars 10/1, 2020 at 13:21

@Selfheal Can you please explain: in the post-3.3 variant, is the acquired GIL global for the whole program, or is it local to the C/C++ thread? In other words, is it possible to have a separate, fully standalone Python interpreter in every C/C++ thread, so that the GIL is local to that interpreter? Why I need that: I want to use all CPU cores in Python from a single C/C++ process (program). Every interpreter (in each thread) would run its own code and acquire its own GIL, so all C/C++ threads would run on separate CPU cores. Is that possible with the current Python C API? – Cryptograph 14/1, 2021 at 12:22

@Cryptograph Unfortunately, Python sub interpreters all share the same GIL. They are there for convenience rather than performance. – Selfheal 14/1, 2021 at 23:3

@Selfheal I've heard that Python 3.10 or 3.11 started having fully capable sub-interpreters. They DON'T share any GIL or other state. Fully isolated. But within single Process, so these are thread-based interpreters. So now you can run just single Process and distribute sub-interpreters in threads on many cores of CPU. So it is replacement for multiprocessing approach which needed several Processes. Do you know about this new feature of 3.10/3.11? If so, maybe you can share some knowledge, maybe some link to a tutorial on how to use them? – Cryptograph 10/9, 2022 at 19:5

@Cryptograph As far as I can tell these are just proposals (PEP 554 and PEP 684), but admittedly I haven't been following it very closely. – Selfheal 10/9, 2022 at 22:56

I just need to point out a problem in @sterin's answer, in the part "Using a sub interpreter from a new thread (post Python 3.3)":

PyThreadState_New must be called when the GIL is held.
PyEval_RestoreThread will acquire the GIL, so it must not be called with the GIL held; otherwise a deadlock will show up.

As a result, you need to use PyThreadState_Swap in this case instead of PyEval_RestoreThread

Also, you can verify which interpreter is being used by

int64_t interp_id = PyInterpreterState_GetID(interp);

Uppermost answered 20/7, 2021 at 22:16 Comment(1)

The GIL doesn't need to be held when calling PyThreadState_New - see docs – Tlingit 17/2, 2022 at 23:3

I tried to use an arbitrary number of threads and sub-interpreters in the sample from https://github.com/sterin/python-sub-interpreters-multiple-threads-example

// runs in a new thread
void f(PyInterpreterState* interp, int i)
{
    std::string code = R"PY(...)PY";

    sub_interpreter::thread_scope scope(interp);
    PyRun_SimpleString(code.c_str());
}

int main()
{
    initialize init;

    constexpr int THREADS_NUM = 3;

    // Create interpreters
    std::vector<sub_interpreter*> interpreters;
    for(int i = 0; i < THREADS_NUM; ++i)
        interpreters.emplace_back(new sub_interpreter());

    // Create working threads
    std::vector<std::thread> thread_group;
    for(int i = 0; i < THREADS_NUM; ++i)
        thread_group.emplace_back(std::thread(f, interpreters[i]->interp(), i));

    enable_threads_scope t;

    for (auto& thread: thread_group)
        thread.join();

    for(auto interp: interpreters)
        delete interp;

    return 0;
}

and got error:

Fatal Python error: drop_gil: drop_gil: GIL is not locked
Python runtime state: initialized

Current thread 0x000089f0 (most recent call first):
  <no Python frame>
Fatal Python error: 
drop_gil: GIL is not locked

UPDATE: Fixed by moving these lines to a nested block:

{
    enable_threads_scope t;
    for (auto& thread: thread_group)
        thread.join();
}

Freeboot answered 3/6 at 23:8 Comment(0)
