What is the relationship between NVIDIA MPS (Multi-Process Server) and CUDA Streams?
From a glance at the official NVIDIA Multi-Process Server (MPS) docs, it is unclear to me how MPS interacts with CUDA streams.

Here's an example:

App 0: issues kernels to logical stream 0;

App 1: issues kernels to (its own) logical stream 0.

In this case,

1) Does MPS "hijack" these CUDA calls, and if so, how? Does it have full knowledge, for each application, of which streams are used and which kernels are in which streams?

2) Does MPS create its own 2 streams, and place the respective kernels into the right streams? Or does MPS potentially enable kernel concurrency via mechanisms other than streams?

If it helps, I'm interested in how MPS works on Volta, but information about older architectures is appreciated as well.
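For concreteness, here is a minimal sketch of what each app in the example above might look like (the kernel, its name, and the sizes are hypothetical; only the stream usage matters):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for each app's work.
__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // Each app creates its own "logical stream 0" -- a stream handle
    // that is private to this process.
    cudaStream_t s;
    cudaStreamCreate(&s);

    // Issue kernels to the process-local stream.
    work<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    work<<<(n + 255) / 256, 256, 0, s>>>(d, n);  // serialized after the first

    cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
    cudaFree(d);
    printf("done\n");
    return 0;
}
```

The question is then what happens when two instances of a program like this run simultaneously under MPS.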

Horrific answered 7/3, 2018 at 23:35 Comment(4)
From a programmer's perspective, there is no relationship. They are orthogonal. You're unlikely to get a precise description of how MPS works under the hood, as that information is not published anywhere, and is subject to change. These don't really strike me as programming questions, anyway. – Inhalant

Thanks for your response. Is it fair to say that NVIDIA has not published information about either (1) or (2)? – Horrific

Yes, that is my view. I can provide an answer if you want, but it will contain a lot of "this isn't published or specified". – Inhalant

Anything will be helpful, Robert. As we speak, I found that slides 21 and onward in this NVIDIA deck do hint at the use of multiple streams. – Horrific

A way to think about MPS is that it acts as a funnel for CUDA activity, emanating from multiple processes, to take place on the GPU as if it emanated from a single process. One of the specific benefits of MPS is that kernel concurrency is theoretically possible even when the kernels emanate from separate processes. The "ordinary" CUDA multi-process execution model would serialize such kernel executions.
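To make that multi-process setup concrete, a rough sketch of running two clients through the MPS daemon might look like this (paths and the client binary name are placeholders; check the MPS docs for your driver version):

```sh
# Start the MPS control daemon (one per GPU node, typically as the GPU owner).
nvidia-cuda-mps-control -d

# Launch two client processes; both attach to the daemon transparently
# and their kernels are funneled onto the GPU together.
./my_cuda_app &   # App 0, issues to its own stream 0
./my_cuda_app &   # App 1, issues to its own stream 0
wait

# Shut the daemon down when done.
echo quit | nvidia-cuda-mps-control
```

Without the daemon running, the same two launches would time-slice the GPU rather than potentially overlap.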

Since kernel concurrency in a single process implies that the kernels in question are issued to separate streams, it stands to reason that conceptually, MPS treats the streams from the various client processes as completely separate. Naturally, then, if you profile such an MPS setup, the streams will show up as being separate from each other, whether they are separate streams associated with a single client process, or streams across several client processes.

In the pre-Volta case, MPS did not guarantee process isolation between kernel activity from separate processes. In this respect, it was very much like a funnel, taking activity from several processes and issuing it to the GPU as if it were issued from a single process.

In the Volta case, activity from separate processes behaves from an execution standpoint (e.g. concurrency, etc.) as if it were from a single process, but activity from separate processes still carries process isolation (e.g. independent address spaces).

1) Does MPS "hijack" these CUDA calls, and if so, how? Does it have full knowledge, for each application, of which streams are used and which kernels are in which streams?

Yes, CUDA MPS understands separate streams from a given process, as well as the activity issued to each, and maintains such stream semantics when issuing work to the GPU. The exact details of how CUDA calls are handled by MPS are unpublished, to my knowledge.

2) Does MPS create its own 2 streams, and place the respective kernels into the right streams? Or does MPS potentially enable kernel concurrency via mechanisms other than streams?

MPS maintains all stream activity, as well as CUDA stream semantics, across all clients. Activity issued into a particular CUDA stream will be serialized. Activity issued to independent streams may possibly run concurrently. This is true regardless of the origin of the streams in question, be they from one process or several.
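Those stream semantics are the same ones you see within one process. A minimal single-process illustration (the spin kernel and cycle count are hypothetical, chosen just to make kernels long enough to overlap):

```cuda
#include <cuda_runtime.h>

// Busy-wait kernel so launches are long enough to observe ordering
// in a profiler.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Same stream: the second launch waits for the first (serialized).
    spin<<<1, 1, 0, s1>>>(1000000);
    spin<<<1, 1, 0, s1>>>(1000000);

    // Independent streams: these launches are eligible to overlap --
    // under MPS, kernels from two separate client processes behave
    // the same way as these two streams do.
    spin<<<1, 1, 0, s1>>>(1000000);
    spin<<<1, 1, 0, s2>>>(1000000);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```

Note that concurrency across independent streams is only an eligibility, not a guarantee; resource availability on the GPU decides whether overlap actually occurs.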

Inhalant answered 8/3, 2018 at 3:54 Comment(1)
Thanks for your answer! – Horrific
