Benefits of user-level threads
I was looking at the differences between user-level threads and kernel-level threads, which I basically understood.
What's not clear to me is the point of implementing user-level threads at all.

If the kernel is unaware of the existence of multiple threads within a single process, then what benefits could I experience?
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).

This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?

An answer to a previously asked question (similar to mine) was something like:

No modern operating system actually maps n user-level threads to 1 kernel-level thread.

But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.

Could you help me understand this, please?

Implode answered 2/1, 2016 at 18:38 Comment(14)
Possible duplicate of Difference between user-level and kernel-supported threads?Valente
You can have multiple OS threads backing an even larger number of user-mode threads. That way you can use the full CPU and still do your own scheduling.Glazed
Despite your "premise", you seem to mix up user threads, which are created by the user but can be implemented by the OS/kernel, and user-level threads, which are implemented entirely in user space. The accepted answer to the related question also clarifies this difference.Valente
@Tsyvarev, That was one of the questions I looked at, but it didn't clarify my doubts.Implode
@Valente And by the way, I'm not mixing anything. Maybe some people around the web do, since I'm interested not in user threads but in user-level threads. All of the things I talk about are the result of my looking for user-level threads.Implode
That was one of the questions I looked at, but it didn't clarify my doubts - so you want someone else you trust to post the same answer? Or what? The answer you cite in your question refers to that question, doesn't it? If so, that answer isn't accepted and has only 1 upvote; this doesn't mean that it is the right answer.Valente
Exactly, and it just so happens that that question is not the one you're considering a duplicate of mine. Anyway, since you are skeptical about that answer too, can you point me to the right direction or provide me a clarification? Or are you just willing that someone closes this thread and marks it as a duplicate?Implode
OK, most questions answered except the last two. Under construction. Hope it's helpful... put quite some effort into it 'cause don't have anything better to do. ^^Zaria
Why did you put "parallel" into double quotes in "[...] what's the difference between a sequential execution of all the threads and a 'parallel' execution of them, considering they cannot take advantage of multiple processors and independent scheduling?"Zaria
@cad Thanks a lot for your concern, I will read your answer carefully and let you know EDIT: "parallel" because they will never be able to run simultaneously, since the scheduler sees only processes, not threadsImplode
If possible, could you add hyperlinks to text where you claim to have something from an off-site resource, please? Like "many people on the Internet state" or "I have read a couple of articles." It would greatly improve your post due to increased credibility.Zaria
Read your edit. Then what do you mean with "sequential" exactly?Zaria
I will surely add a link for each of those. By "sequential" I mean an actual sequential computation of the threads. If user-level threads cannot take advantage of multiple processors, why not simply run thread1, then thread2, ... then threadN? This way we could even avoid the extra work of programming thread switching and scheduling algorithms.Implode
@anfri: A question doesn't stop being a duplicate just because the words "that answer didn't clarify my doubts" are added. If you don't trust that answer, you may request references to sources that you do trust: a standard, specification, definition. With such a request your question would no longer be a duplicate. But that request should be explicitly stated in your question, instead of "could you help me understand."Valente

I strongly recommend Modern Operating Systems 4th Edition by Andrew S. Tanenbaum (starring in shows such as the debate about Linux; also participating: Linus Torvalds). Costs a whole lot of bucks but it's definitely worth it if you really want to know stuff. For eager students and desperate enthusiasts it's great.

Your questions answered

[...] what's not clear to me is the point of implementing user-level threads at all.

Read my post. It is comprehensive, I daresay.

If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?

Read the section "Disadvantages" below.

I have read a couple of articles that stated that user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).

Read the subsection "No coordination with system calls" in "Disadvantages."


All citations are from the book I recommended in the top of this answer, Chapter 2.2.4, "Implementing Threads in User Space."

Advantages

Enables threads on systems without threads

The first advantage is that user-level threads are a way to work with threads on a system without threads.

The first, and most obvious, advantage is that a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to fall into this category, and even now some still do.

No kernel interaction required

A further benefit is the light overhead when switching threads, as opposed to switching to the kernel mode, doing stuff, switching back, etc. The lighter thread switching is described like this in the book:

When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks to see if the thread must be put into blocked state. If so, it stores the thread’s registers (i.e., its own) [...] and reloads the machine registers with the new thread’s saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. If the machine happens to have an instruction to store all the registers and another one to load them all, the entire thread switch can be done in just a handful of instructions. Doing thread switching like this is at least an order of magnitude—maybe more—faster than trapping to the kernel and is a strong argument in favor of user-level threads packages.

This efficiency is also nice because it spares us from incredibly heavy context switches and all that stuff.
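The switching described in the quote can be sketched in miniature. The following is a hypothetical illustration, not a real threads package: each "thread" is a Python generator, and suspending/resuming a generator plays the role of saving and reloading the registers. The whole switch happens in user space, with no kernel involvement at all (the names `run_threads` and `worker` are my own).

```python
# A minimal sketch of cooperative user-level thread switching, assuming
# each "thread" is modeled as a Python generator. Resuming a generator
# is the analogue of reloading its saved registers; yielding is the
# analogue of saving them. No kernel call is involved in a switch.

from collections import deque

def run_threads(threads):
    """A tiny user-space run-time system: resume each thread until it
    yields, then requeue it; drop it when it finishes."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        try:
            trace.append(next(thread))  # run the thread until it yields
            ready.append(thread)        # still runnable: back of the queue
        except StopIteration:
            pass                        # thread finished
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"             # voluntarily give up the CPU

print(run_threads([worker("A", 2), worker("B", 2)]))
# ['A:0', 'B:0', 'A:1', 'B:1']
```

Note that the interleaving is entirely decided inside the process; the kernel only ever sees one thread of execution.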

Individually adjusted scheduling algorithms

Also, since there is no central scheduling algorithm, every process can have its own scheduling algorithm and is far more flexible in its choices. In addition, the "private" scheduling algorithm is far more flexible concerning the information it gets from the threads. The amount of information can be adjusted manually and per process, so it's very fine-grained. This is possible because, again, there is no central scheduling algorithm that needs to fit the needs of every process; a central one has to be very general and must deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."

They [user-level threads] allow each process to have its own customized scheduling algorithm. For some applications, for example, those with a garbage-collector thread, not having to worry about a thread being stopped at an inconvenient moment is a plus. They also scale better, since kernel threads invariably require some table space and stack space in the kernel, which can be a problem if there are a very large number of threads.
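To make the "customized scheduling algorithm" point concrete, here is a hypothetical sketch (names `run_prioritized` and `job` are my own, and the generator-as-thread model is an analogy): because the scheduler lives inside the process, it can use any policy it likes, such as strict priorities, which a general-purpose kernel scheduler could not tailor per process this way.

```python
# Hypothetical sketch of a per-process, user-space priority scheduler.
# Each generator-based "thread" carries a priority; the scheduler always
# resumes the highest-priority runnable thread (lower number = higher
# priority). The policy is private to the process.

import heapq

def run_prioritized(threads):
    """threads: list of (priority, generator) pairs."""
    heap = [(prio, i, t) for i, (prio, t) in enumerate(threads)]
    heapq.heapify(heap)
    trace = []
    while heap:
        prio, i, t = heapq.heappop(heap)
        try:
            trace.append(next(t))                # run until it yields
            heapq.heappush(heap, (prio, i, t))   # still runnable
        except StopIteration:
            pass                                 # thread finished
    return trace

def job(name, steps):
    for k in range(steps):
        yield f"{name}:{k}"

print(run_prioritized([(2, job("low", 2)), (1, job("high", 2))]))
# ['high:0', 'high:1', 'low:0', 'low:1']
```

Swapping in round-robin, shortest-job-first, or anything else is a purely local change, which is exactly the flexibility the quote describes.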


Disadvantages

No coordination with system calls

The user-level scheduling algorithm has no idea if some thread has called a blocking read system call. OTOH, a kernel-level scheduling algorithm would've known because it can be notified by the system call; both belong to the kernel code base.

Suppose that a thread reads from the keyboard before any keys have been hit. Letting the thread actually make the system call is unacceptable, since this will stop all the threads. One of the main goals of having threads in the first place was to allow each one to use blocking calls, but to prevent one blocked thread from affecting the others. With blocking system calls, it is hard to see how this goal can be achieved readily.

He goes on to say that system calls could be made non-blocking, but that this would be very inconvenient and compatibility with existing OSes would be drastically hurt.
Mr Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict when a system call blocks using select, but he notes that this is inelegant.
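The select-based workaround can be sketched like this. This is an illustrative fragment assuming POSIX (`try_read` is a hypothetical wrapper name): before issuing a read that would block the whole process, the wrapper polls select with a zero timeout; if no data is ready, the run-time system could switch to another user-level thread instead of blocking everyone.

```python
# Sketch of the select-based wrapper Tanenbaum describes (POSIX only):
# poll the file descriptor with a zero timeout before reading, so the
# user-level run-time system never issues a read that would block the
# entire process.

import os
import select

def try_read(fd, n):
    """Return bytes if fd is readable right now, else None so the
    caller can yield to another user-level thread."""
    readable, _, _ = select.select([fd], [], [], 0)  # timeout 0: poll only
    if readable:
        return os.read(fd, n)
    return None  # read would block: let the scheduler run another thread

r, w = os.pipe()
print(try_read(r, 16))   # None: nothing written yet, a real read would block
os.write(w, b"hello")
print(try_read(r, 16))   # b'hello': now it is safe to read without blocking
```

The inelegance Tanenbaum objects to is visible even here: every potentially blocking call needs such a wrapper, doubling the number of system calls in the common case.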

Building on that, he says that threads block often, that avoiding blocking requires many extra system calls, and that making many system calls is expensive. Yet without blocking, threads become less useful:

For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all? No one would seriously propose computing the first n prime numbers or playing chess using threads because there is nothing to be gained by doing it that way.

Page faults block per-process if unaware of threads

The OS has no notion of threads. Therefore, if a page fault occurs, the whole process will be blocked, effectively blocking all user-level threads.

Somewhat analogous to the problem of blocking system calls is the problem of page faults. [...] If the program calls or jumps to an instruction that is not in memory, a page fault occurs and the operating system will go and get the missing instruction (and its neighbors) from disk. [...] The process is blocked while the necessary instruction is being located and read in. If a thread causes a page fault, the kernel, unaware of even the existence of threads, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.

I think this can be generalized to all interrupts.

No automatic switching to the scheduler

Since there is no per-process clock interrupt, a thread keeps the CPU forever unless some OS-dependent mechanism (such as a context switch) occurs or it voluntarily releases the CPU.
This prevents the usual scheduling algorithms from working, including round-robin.

[...] if a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process, there are no clock interrupts, making it impossible to schedule processes round-robin fashion (taking turns). Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.

He says that a possible solution would be

[...] to have the run-time system request a clock signal (interrupt) once a second to give it control, but this, too, is crude and messy to program.

I would go even further and say that such a "request" would require a system call, whose drawback is already explained in "No coordination with system calls." Without a system call, the program would need free access to the timer, which is a security hole and unacceptable in modern OSes.
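The "request a clock signal once a second" idea Tanenbaum mentions can be sketched on POSIX with setitimer (itself a system call, as noted above). This is a minimal illustration, not a full preemption mechanism: the handler only sets a flag, and a real run-time system would check that flag at safe points to decide when to switch threads.

```python
# Sketch of the periodic-clock-signal workaround (POSIX only): the
# run-time system asks the OS for a SIGALRM via setitimer, and the
# handler sets a flag that the user-space scheduler would check to
# decide when to preempt the running user-level thread.

import signal
import time

preempt_requested = False

def on_alarm(signum, frame):
    global preempt_requested
    preempt_requested = True  # a real scheduler would switch threads
                              # at the next safe point after seeing this

signal.signal(signal.SIGALRM, on_alarm)
signal.setitimer(signal.ITIMER_REAL, 0.05)  # one-shot alarm in 50 ms

time.sleep(0.2)  # stands in for the currently running user-level thread
print(preempt_requested)  # True
```

Even this toy version shows why the approach is "crude and messy": signal handlers can fire at arbitrary points, so the flag must be checked constantly and asynchronous-signal safety becomes the programmer's problem.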

Zaria answered 2/1, 2016 at 20:38 Comment(8)
Again, thanks a lot. You were extremely kind to post such a detailed answer. I have a couple of questions for you: 1) Below the section "No kernel interaction required" you mention a machine. Do you mean the physical machine? If yes, does it mean that the run-time system in user space needs to perform some sort of system call to change the values of machine registers?Implode
2) If Mr. Tanenbaum himself states that re-implementing system calls as non-blocking or writing a wrapper that makes use of UNIX's select is not the most elegant solution, then how do we practically solve the issue of blocking system calls in user-level threads?Implode
@Implode The machine refers to the whole computer, so it's the physical machine. The machine registers are just the CPU registers. It just means that a kernel-level thread implementation needs to jump to kernel mode to switch threads. There the scheduler is called and the registers are saved and stored in some kernel structure. In contrast, a user-level implementation does not require such kernel interaction. It can save and restore the registers on its own and calls its own scheduler. It's not about changing register values but about preserving their values when switching to another thread.Zaria
@Implode Concerning your second question: he also states that there isn't much choice left, so I think you should go with either of these methods. However, I'll search a bit, maybe I can find a user-level thread implementation and find out how it handles that issue.Zaria
Who's the subject in "it stores the thread’s registers..."? And further, what is "this" referring to in the sentence "doing thread switching like this..."? You just said that a kernel-level implementation jumps to kernel mode to switch between threads, storing registers into some kernel structure, so I thought that "kernel" could be the subject in that sentence. But if that was the case, then it'd be still not clear to me what "doing thread switching like this..." refers to.Implode
@Implode I edited the quotation somewhat, so it becomes clear what "it" means. "like this" refers to the thread switching without kernel interaction. I added a sentence before the citation to emphasize that.Zaria
You should clarify somewhere around this quote "For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all?" that it only applies to single-core systems, or to user-space threading. Also: Often blocking requires many system calls.: I think you mean avoiding blocking when wrapping system calls that can potentially block. Did you mention POSIX O_NONBLOCK? Open files with that, then you can just make read(2) calls and yield if it returns EWOULDBLOCK. (I'm not really arguing for user-space threads, though!)Mccue
Regarding multiprocessing, quoting from 'Operating Systems' by William Stallings " In a pure ULT strategy, a multithreaded application cannot take advantage of multiprocessing. A kernel assigns one process to only one processor at a time. Therefore, only a single thread within a process can execute at a time. In effect, we have application-level multiprogramming within a single process. While this multiprogramming can result in a significant speedup of the application, there are applications that would benefit from the ability to execute portions of code simultaneously."Sheepdip

What's not clear to me is the point of implementing user-level threads at all.

User-level threads largely came into the mainstream due to Ada and its requirement for threads (tasks in Ada terminology). At the time, there were few multiprocessor systems and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.

If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?

If you have kernel threads, multiple threads within a single process can run simultaneously. With user threads, the threads always execute interleaved.

Using threads can simplify some types of programming.

I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).

That is true on Unix, though maybe not on all Unix implementations. User threads on many operating systems function perfectly fine with blocking I/O.

This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?

With user threads, there is never parallel execution. With kernel threads, there can be parallel execution IF there are multiple processors. On a single-processor system, there is not much advantage to using kernel threads over user threads (contra: note the blocking I/O issue on Unix and user threads).

But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.

In user threads, the process manages its own "threads" by interleaving execution within itself. The process can only have a thread run in the processor that the process is running in.

If the operating system provides system services to schedule code to run on a different processor, user threads could run on multiple processors.

I conclude by saying that for practical purposes there are no advantages to user threads over kernel threads. There are those who will assert that there are performance advantages, but any such advantage would be system dependent.

Hersch answered 3/1, 2016 at 3:21 Comment(0)
