GIL behavior in python 3.7 multithreading

Asked 3/4, 2019 at 13:36 Answered 3/4, 2019 at 20:40

Solved python multithreading python-multithreading gil

I am researching and trying to understand the python GIL and best practices to use multithreading in python. I found this presentation and this video

I tried to reproduce the strange and crazy problems mentioned in the first 4 slides of the presentation. This problem also mentioned by the lecturer in the video(first 4 minutes). I wrote this simple code to reproduce the problem

from threading import Thread
from time import time

BIG_NUMBER = 100000
count = BIG_NUMBER


def countdown(n):
    global count
    for i in range(n):
        count -= 1


start = time()
countdown(count)
end = time()
print('Without Threading: Final count = {final_n}, Execution Time = {exec_time}'.format(final_n=count, exec_time=end - start))

count = BIG_NUMBER
a = Thread(target=countdown, args=(BIG_NUMBER//2,))
b = Thread(target=countdown, args=(BIG_NUMBER//2,))
start = time()
a.start()
b.start()
a.join()
b.join()
end = time()
print('With Threading: Final count = {final_n}, Execution Time = {exec_time}'.format(final_n=count, exec_time=end - start))

but the results are completely different from the paper and video! executing time with threading and without threading are almost the same. sometimes one of both case is a bit faster than the other.

here is a result I got using CPython 3.7.3 under Windows 10 using a multicore architecture processor.

Without Threading: Final count = 0, Execution Time = 0.02498459815979004
With Threading: Final count = 21, Execution Time = 0.023985862731933594

also what I understand according to the video and the paper is GIL prevent real parallel execution of two thread at the same time in two core. so if this is true, Why the final count variable (in multithreading case) is not zero as expected and will be a different number at the end of each execution probably because of manipulation of threads at the same time? does anything changes happen to GIL in newer pythons than the video and the paper(which use python 3.2) cause these different? thanks in advance

Agrippina answered 3/4, 2019 at 13:36 Comment(3)

I don't have expert knowledge of Python and the GIL, but my understanding is that scheduling of Python threads still is preemptive in spite of the GIL. count -= 1 is not an atomic operation, and the GIL does not prevent the system from switching from one thread to another half-way through performing it. I thought that the main impact of the GIL was that it never allows threads to truly run in parallel. – Picot 3/4, 2019 at 16:21

I just watched the video! Really great talk and awesome to see that you're trying to reproduce the results. Since the talk took place at PyCon 2010, I imagine David Beazly's results were done using Python version <= 2.6, judging by the information on the version dates in this link: python.org/doc/versions. I believe the GIL has gone through major improvements in Python 3.2 and probably subsequent versions as well. – Lunna 23/7, 2020 at 18:58

have you tried increasing the BIG NUMBER ? – Spiny 14/11, 2022 at 16:32

Python is not executed directly. It is first compiled into so called Python bytecode. This bytecode is similar in its idea to raw assembly. The bytecode is executed.

What GIL does it doesn't allow two bytecode instructions to run in parallel. Although some opeartions (e.g. io) do release the GIL internally to allow real concurrency when it can be proved that it cannot break anything.

Now all you have to know is that count -= 1 does not compile into a single bytecode instruction. It actually compiles into 4 instructions

LOAD_GLOBAL              1 (count)
LOAD_CONST               1 (1)
INPLACE_SUBTRACT
STORE_GLOBAL             1 (count)

which roughly means

load global variable into local variable
load 1 into local variable
subtract 1 from local variable
set global to the current local variable

Each of these instruction is atomic. But the order can be mixed by threads and that's why you see what you see.

So what GIL does it makes the execution flow serial. Meaning instructions happen one after another, nothing is parallel. So when you run multiple threads in theory they will perform the same as single thread minus some time spent on (so called) context switch. My tests in Python3.6 confirm that the execution time is similar.

However in Python2.7 my tests showed significant performance degradation with threads, about 1.5x. I don't know the reason for this. Something other then GIL has to happen in the background.

Early answered 3/4, 2019 at 20:40 Comment(4)

you and @aaron. thank you for the clarification. I got the point. but I still waiting for the answer to the first part of my question. why are the execution times of the two cases are the same? but in the presentation and the video got totally different execution times(with threading it run much more slower than the other case in the video). he ran it on python 3.2 and Mac and I run the code in python 3.7 and Windows. is anything changes from Python 3.2 to 3.7 or is it because of OS different schedulers or what? – Agrippina 3/4, 2019 at 20:53

@HamidRezaArzaghi I don't know about presentation. But since GIL locks all threads then they literally can step forward only one at time. Therefore the execution is serial, despite threads. So sequential code should run pretty much the same as threaded code. Threads will run slightly slower due to context switch though. But I don't think the difference is 2x, that looks like a serious exagerration. In case of two threads it probably wouldn't be detectable. But I'll look into it later. – Early 3/4, 2019 at 21:12

you are right and it should be slower in the world of Python and the GIL. but as I figured it out in the experimental code threaded case sometimes run even a bit faster! – Agrippina 3/4, 2019 at 21:21

@HamidRezaArzaghi modern machines are very complex beasts. The final result depends on so many things that happen not only in your code but on your entire machine as well. The execution speed will vary and if two algorithms theoreticall perform the same then they will. With one sometimes exceeding the other. That being said there seem to be some optimization done at least from Python 2.7 to 3.6. Threaded execution in 2.7 seems to be 1.5x slower. But I don't see the difference in 3.6. – Early 3/4, 2019 at 21:57

Regarding Solomon's comment, the reason the code you wrote gives inconsistent results is that python does not have atomic inplace operators. The GIL does protect python's internals from getting mixed up, but your user code does still have to protect itself. If we take a look at your countdown function using the dis module, we can see where the failure can occur.

>>> print(dis(countdown))
  3           0 SETUP_LOOP              24 (to 26)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_FAST                0 (n)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                12 (to 24)
             12 STORE_FAST               1 (i)

  4          14 LOAD_GLOBAL              1 (count)
             16 LOAD_CONST               1 (1)
             18 INPLACE_SUBTRACT
             20 STORE_GLOBAL             1 (count)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
None

The subtraction operation inside the loop actually takes 4 instructions to complete. If the thread is interrupted after 14 LOAD_GLOBAL 1 (count) but before the line 20 STORE_GLOBAL 1 (count), some other thread could come in and modify count. Then when execution is passed back to the first thread, the previous value of count is used for the subtraction and the result writes over whatever modification the other thread made. Like Solomon I'm not an expert on python's low level internals, but I do believe the GIL assures bytecode instructions are atomic, but not any further.

Digitalism answered 3/4, 2019 at 20:34 Comment(0)

Recommended topics

Hot tags