Are system calls on Windows inherently slower than Linux?
My understanding of system calls is that on Linux the system call mechanism (int 0x80, sysenter, syscall, etc.) is documented and guaranteed to remain stable across kernel versions. Because of this, system calls can be implemented directly in the CRT library, so that when I call e.g. printf("a"); only a single function call into the CRT is involved, where the system call is set up and issued. In theory this can be improved further by statically linking the CRT (not common on Linux, but a possibility) so that even that single function call may be inlined.

On the other hand, Windows does not document or even guarantee consistency of the system call mechanism. The only way to make a system call on Windows is to call into ntdll.dll (or maybe some other *.dll) which is done from the CRT, so there are two function calls involved. If the CRT is used statically and the function gets inlined (slightly more common on Windows than Linux) we still have the single function call into ntdll.dll that we can't get rid of.

So it seems to me that theoretically system calls on Windows will be inherently slower since they always have to do one function call more than their Linux equivalents. Is this understanding (and my explanation above) true?

Note: I am asking this purely theoretically. I understand that when doing a system call (which I think always involves 2 context switches - one in each direction) the cost of an extra function call is probably completely negligible.

Manysided answered 27/2, 2014 at 12:50 Comment(9)
You could refer to this link. – Portable
CRT? What CRT? I don't use C for Windows apps. My Delphi apps import the OS DLLs and call them. Besides, printf() on Windows? How many Windows apps use printf calls? 99.99% have a GUI, not some Jurassic 'terminal console' interface from the '60s. – Monasticism
@MartinJames CRT if you are using C (and probably also most interpreted languages). In your case it would be the Delphi runtime, whatever that is. And printf was just an example. Any call to open a file, allocate heap memory, write to the screen, or a bunch of other things makes system calls. – Manysided
@Martin James: +1. Good comment. Baruch: Why do you mix up the CRT with the kernel? What do you mean? The question is unclear. Voted to close. – Monafo
@Monafo The question is perfectly clear to me at least; perhaps it's your understanding that is clouded? Where, do you think, does this question mix up calls into the CRT and calls into kernel space? As far as I can tell, it carefully distinguishes the two all the way through. – Integumentary
I would correct "theoretically system calls on Windows will be inherently slower" to "theoretically runtime library calls on Windows will be inherently slower". With that change your reasoning is sound, though as you say the difference is marginal. Note that the distinction I'm making is important, as many Windows programmers do not use the runtime library much, tending to prefer direct calls to the Win32 API. – Agnola
@HarryJohnston Your wording would still not be accurate, since many runtime library calls don't make any system calls (e.g. math library, conversions) and so are unaffected by this question. – Manysided
@HarryJohnston Also, many Linux programmers who care about the extra speed will invoke the system call directly, bypassing the runtime library, so the reasoning still holds. – Manysided
It's very difficult to understand what this question is asking, and why it has a bounty. It's an apples and pears comparison. – Matless

On IA-32 there are two ways to make a system call:

  • using int/iret instructions
  • using sysenter/sysexit instructions

A pure int/iret based system call takes 211 CPU cycles (and even much more on modern processors). sysenter/sysexit takes 46 CPU cycles. As you can see, merely executing the pair of instructions used to enter and leave the kernel introduces significant overhead. But any system call implementation also involves work on the kernel side (setting up the kernel context, dispatching the call and its arguments, etc.). A more or less realistic, highly optimized system call will take ~250 and ~100 CPU cycles for int/iret and sysenter/sysexit based system calls respectively. In Linux and Windows it takes ~500 cycles.

At the same time, a function call (based on call/ret) costs 2-4 cycles plus 1 cycle per argument.

As you can see, the overhead introduced by a function call is negligible in comparison to the cost of a system call.

On the other hand, if you embed raw system calls in your application, you make it highly hardware dependent. For example, what if your application, with an embedded sysenter/sysexit based raw system call, is run on an old PC that does not support these instructions? In addition, your application becomes sensitive to the system call calling convention used by the OS.

Libraries like ntdll.dll and glibc are commonly used because they provide a well-known, hardware-independent interface to the system services and hide the details of communicating with the kernel behind the scenes.

Linux and Windows have approximately the same system call cost if they use the same way of crossing the user/kernel space border (the difference will be negligible). Both try to use the fastest way possible on each particular machine. All modern Windows versions, starting at least from Windows XP, are prepared for sysenter/sysexit. Some old and/or specific versions of Linux may still use int/iret based calls. x64 versions of both OSes rely on the syscall/sysret instructions, which work like sysenter/sysexit and are available as part of the AMD64 instruction set.

Teirtza answered 12/3, 2014 at 23:44 Comment(3)
+1. Unrelated: I've learned that "imbedded" is a word. – Ania
Corrected to the more classic spelling. Thanks for the useful comment! – Teirtza
Exactly. The difference is not only negligible, but completely swamped by the enormous differences between the way the two operating systems actually implement the various operations. – Agnola

I'm not sure about Linux, but here's the full breakdown of a syscall on Windows 10 (64-bit). As you can see there's a lot happening there; just to name a few things:

  • Security mitigations against Meltdown and Speculative Execution Side Channel attacks.
  • Setup of the Indirect Branch Prediction Barrier. (Especially a huge cost of doing IBPB mitigation in software - see the example in the link I gave.)
  • Setup for the retpoline.
  • SMAP, Supervisor-Mode Access Prevention setup.
  • All the things above but during the exit from the syscall. So x2.
  • Saving and restoring of the SSE registers, and MXCSR register.
  • Copying of the user-mode stack into the kernel.
  • Conversion of the system service number for the SYSCALL into the service function pointer in the kernel.

And this is for a single trip to the kernel for a single system function.

It'd be interesting to read the breakdown for Linux though, but something tells me that it is probably very similar.

Keithakeithley answered 25/8, 2023 at 8:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.