Green threads and threads in Python
As Wikipedia states:

Green threads emulate multi-threaded environments without relying on any native OS capabilities, and they are managed in user space instead of kernel space, enabling them to work in environments that do not have native thread support.

Python's threads are implemented as pthreads (kernel threads), and because of the global interpreter lock (GIL), only one thread executes Python bytecode at a time within a process.

But in the case of green threads (so-called greenlets or tasklets):

  1. Does the GIL affect them? Can there be more than one greenlet running at a time?
  2. What are the pitfalls of using greenlets or tasklets?
  3. If I use greenlets, how many of them can a process handle? (I am wondering because in a single process you can only open threads up to the ulimit (-s, -v) limits set on your *nix system.)

I need a little insight, and it would help if someone could share their experience, or guide me to the right path.

Diagram answered 6/10, 2012 at 10:28 Comment(2)
The answer to all three is "it depends on the greenlet implementation". – Krystinakrystle
Stackless Python gets into a lot of these concepts. I recommend getting a copy and doing the tutorial on the official site. It has a lot of explanation about the sorts of questions you are asking. – Latium
You can think of greenlets more like cooperative threads. What this means is that there is no scheduler pre-emptively switching between your threads at any given moment - instead your greenlets voluntarily/explicitly give up control to one another at specified points in your code.

Does the GIL affect them? Can there be more than one greenlet running at a time?

Only one code path is running at a time - the advantage is you have ultimate control over which one that is.
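This switching model can be sketched without any third-party library, using plain Python generators as stand-ins for greenlets (the real `greenlet` package uses an explicit `switch()` method instead; this is just an illustrative analogue):

```python
from collections import deque

# A minimal sketch of cooperative switching. Each task runs until it
# explicitly yields; exactly one task is ever running at a time.
def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    """Run tasks to completion, switching only at their yield points."""
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # resume the task until its next yield
            queue.append(t)  # requeue it for another turn
        except StopIteration:
            pass             # task finished; drop it

log = []
run_round_robin([task("a", 2, log), task("b", 2, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1'] - interleaved, never concurrent
```

The interleaving is fully deterministic: no scheduler ever interrupts a task between its yield points, which is exactly the control the answer describes.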

What are the pitfalls of using greenlets or tasklets?

You need to be more careful: a badly written greenlet will not yield control to other greenlets. On the other hand, since you know exactly when a greenlet will context-switch, you may be able to get away with not creating locks for shared data structures.
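The lock-free point can be illustrated with the same generator-based analogue: an update that contains no yield point can never be interrupted by another cooperative task, so it needs no lock (whereas with preemptive OS threads, an unguarded `counter += 1` is a classic race):

```python
# Sketch: under cooperative scheduling, a read-modify-write with no
# yield point inside it is effectively atomic with respect to the
# other tasks, so no lock is required.
counter = 0

def incrementer(n):
    global counter
    for _ in range(n):
        counter += 1  # no yield inside: cannot be interrupted mid-update
        yield         # switching can only happen here

def run_all(tasks):
    live = list(tasks)
    while live:
        for t in list(live):
            try:
                next(t)
            except StopIteration:
                live.remove(t)

run_all([incrementer(1000), incrementer(1000)])
print(counter)  # 2000, with no locking at all
```

The flip side is the pitfall above: if an update spans a yield point, another greenlet can observe (or clobber) the half-finished state, and you are back to needing synchronization.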

If I use greenlets, how many of them can a process handle? (I am wondering because in a single process you can open threads only up to the ulimit set on your *nix system.)

With regular threads, the more you have, the more scheduler overhead you incur. Regular threads also have a relatively high context-switch cost. Greenlets do not carry this overhead. From the Bottle documentation:

Most servers limit the size of their worker pools to a relatively low number of concurrent threads, due to the high overhead involved in switching between and creating new threads. While threads are cheap compared to processes (forks), they are still expensive to create for each new connection.

The gevent module adds greenlets to the mix. Greenlets behave similar to traditional threads, but are very cheap to create. A gevent-based server can spawn thousands of greenlets (one for each connection) with almost no overhead. Blocking individual greenlets has no impact on the server's ability to accept new requests. The number of concurrent connections is virtually unlimited.
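gevent itself is a third-party package, but the "thousands of cheap tasks" claim can be demonstrated with the standard library's asyncio, whose coroutines are similarly lightweight (this is an analogue of `gevent.spawn`, not gevent's own API):

```python
import asyncio

# Spawning 10,000 cooperative tasks is cheap and fast; creating
# 10,000 OS threads would exhaust resources on many systems.
async def handle(i, results):
    await asyncio.sleep(0)  # a yield point, e.g. waiting on I/O
    results.append(i)

async def main():
    results = []
    tasks = [asyncio.create_task(handle(i, results)) for i in range(10_000)]
    await asyncio.gather(*tasks)
    return len(results)

print(asyncio.run(main()))  # 10000
```

Each "connection handler" here is just a coroutine object on the heap, which is why the per-task cost is so far below that of a thread with its own kernel stack.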

There's also some further reading here if you're interested: http://sdiehl.github.io/gevent-tutorial/

Winton answered 20/5, 2013 at 1:48 Comment(5)
web.archive.org/web/20160304020253/www.devmusings.com/blog/2013/… – Winton
Thanks for putting all the info together; I think with the information provided here one can move ahead quickly. Thanks @Martin – Diagram
@MartinKonecny this might not be the best place to ask, but is the statement "Only one code path is running at a time" valid for all user threads (these are the same as greenlets, right?) or is it just valid for Python? – Intrusive
Upvoted! So let's say I had 10 green threads performing 10 different joins between the same 2 SQLite tables A and B; you are saying they will run sequentially? – Volkslied
They will run sequentially, yes, as long as you yield each greenlet after each query. – Winton
I assume you're talking about eventlet/gevent greenlets.

1) Only one greenlet can be running at a time.

2) It's cooperative multitasking: if a greenlet gets stuck in an infinite loop, your entire program is stuck. Greenlets are typically scheduled either explicitly or during I/O.

3) A lot more than threads; it depends on the amount of RAM available.
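Point 3 can be made concrete: a suspended cooperative task is just a small heap object (real greenlets also keep a small stack, a few KB in the `greenlet` library), whereas each OS thread typically reserves a full stack (often around 8 MB of virtual memory on Linux, per `ulimit -s`). A rough sketch using generators as stand-ins:

```python
import sys

# Creating 100,000 suspended cooperative tasks is instant and uses
# little memory; 100,000 OS threads would need ~800 GB of virtual
# stack space at an 8 MB default stack size.
def tiny_task():
    yield

tasks = [tiny_task() for _ in range(100_000)]
print(len(tasks))                       # 100000
print(sys.getsizeof(tasks[0]) < 1024)   # True: each is well under 1 KB
```

This is why the practical limit for greenlets is RAM, not the kernel's thread or stack limits.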

Hog answered 6/10, 2012 at 12:27 Comment(8)
So you are saying that the only advantage of using greenlets is that you can have more "threads" than real threads? – Diagram
I'm not sure, but I think it's faster to switch between greenlets than between OS threads because they are lighter, but don't quote me on that. – Hog
Green threads have about the same switching cost as calling a function, while multithreading needs a context switch (saving the whole thread state, loading the context of the new thread, and so on). These two methods don't belong to the same scale of overhead (and processes are even worse). – Abseil
@Abseil I believe you are conflating a process context switch with a thread context switch. Threads do not write themselves to memory or need to be loaded when switching between threads of the same process -- they exist inside, and share, the same process memory space. – Landel
@Abseil all that gets switched for threads is the program counter, the processor registers, and the stack pointer. That's much smaller than the full memory address space. That's one of the primary reasons the concept of threads exists instead of everything just being processes. – Landel
@Landel you're perfectly right; I did not intend to address this level of detail. Actually, multiple factors play against each other: though threads have something green threads don't (they can all run at the same time thanks to multi-core, hyper-threading, etc.), we could hope that green threads with a high enough hand-over frequency get slightly better performance than threads sharing the same physical resource... – Abseil
@Landel ...because the green threads belong to the same execution thread and, consequently, prediction-, prefetch-, and cache-based optimizations apply across them all, while thread optimizations are handled on a more independent basis. Isn't this the reason why event loops are said to be faster than thread pools? – Abseil
Thread pools that execute tasks are less prone to suffer from context switching and can make use of all the CPU cores. See the .NET framework's async/await features. Locks have to be taken on shared data, but can mostly be avoided. – Ultramicrometer

© 2022 - 2024 — McMap. All rights reserved.