I thought I had a grasp of this, but apparently I do not :) I need to perform parallel H.264 stream encoding with NVENC from frames that are not in any of the input formats the encoder accepts, so I have the following pipeline:
- A callback is invoked to signal that a new frame has arrived.
- I copy the frame to CUDA memory and perform the needed color space conversions (only the first `cuMemcpy` is synchronous, so I can return from the callback; all pending operations are pushed onto a dedicated stream).
- I record an event on the stream and have another thread waiting on it; as soon as the event is signaled, I take the CUDA device pointer holding the frame in the correct color space and feed it to the encoder.
For some reason I had assumed that I need a dedicated context for each thread if I run this pipeline in parallel threads. The code was slow, and after some reading I understood that context switching is actually expensive. Then I realized it makes no sense anyway, since a context owns the whole GPU, so I would be locking out any parallel processing from the other transcoder threads.
Question 1: In this scenario, am I fine with a single context and an explicit stream created in that context for each thread that runs the pipeline above?
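To make Question 1 concrete, this is the setup I have in mind (a sketch, assuming the driver API: one context created once, then each thread binds it and owns its own non-blocking stream):

```cpp
#include <cuda.h>

CUcontext g_ctx;  // single context shared by all transcoder threads

void init_cuda() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&g_ctx, 0, dev);  // created once, e.g. on the main thread
}

// Each transcoder thread binds the shared context and creates its own
// stream, so work submitted from different threads can overlap on the GPU.
void transcoder_thread() {
    cuCtxSetCurrent(g_ctx);  // cheap: only sets thread-local state
    CUstream stream;
    cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);
    // ... enqueue copies/kernels on `stream`, record and wait on events ...
    cuStreamDestroy(stream);
}
```

`CU_STREAM_NON_BLOCKING` keeps the per-thread streams from synchronizing with the legacy default stream, which I assume is what I want here.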
Question 2: Can someone enlighten me on the purpose of the CUDA device context? I assume it makes sense in a multi-GPU scenario, but are there any cases where I would want to create multiple contexts for a single GPU?