How to stream H.264 video over UDP using the NVidia NVEnc hardware encoder?
This is going to be a self-answered question, because it has driven me nuts over the course of a full week and I wish to spare fellow programmers the frustration I went through.

The situation is this: you wish to use NVidia's NVEnc hardware encoder (available on Kepler and Maxwell cards, i.e. the GTX 7xx and GTX 9xx series, respectively) to stream the output of your graphics application via UDP. This is not a trivial path to take, but it can be very efficient, because it avoids copying raw frames from video memory to system memory: NVEnc can access video memory directly, so only the much smaller compressed output needs to be downloaded after the encoding stage.

I had already gotten this working to the point of generating a raw .h264 file by simply writing NVEnc's output buffers to it, frame after frame. VLC had no trouble playing such a file, except that the timing was off (I didn't try to fix this, as I only needed the file for debugging purposes).

The problem came when I tried to stream the encoded frames via UDP: neither VLC nor MPlayer were able to render the video. It turned out there were two reasons for that, which I'll explain in my answer.

Rhett answered 17/10, 2015 at 11:13 Comment(0)

Like I said in the question, there were two (well, actually three) reasons MPlayer couldn't play my UDP stream.

The first reason has to do with packetizing. NVEnc fills its output buffers with data blocks called NALUs, which it separates with "start codes" mainly intended for bitstream synchronization. (Go to szatmary's excellent SO answer if you wish to learn more about Annex B - and its competitor AVCC).

The problem now is that NVEnc sometimes delivers more than one such NALU in a single output buffer. Although most NALUs contain encoded video frames, it is sometimes necessary (and mandatory at the beginning of a stream) to send some metadata as well, such as the resolution, framerate, etc. NVEnc helps with that by generating those special NALUs too (more on that further down).

As it turns out, however, player software does not support receiving more than one NALU in a single UDP packet. This means that you have to write a simple loop that looks for start codes (two or three zero bytes followed by a 0x01 byte) to chop up the output buffer and send each NALU in its own UDP packet. (Note, however, that each UDP packet must still include its start code.)
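To make that splitting loop concrete, here is a minimal sketch of the start-code scan described above. This is not NVEnc API code; `splitNalUnits` is a hypothetical helper, and `buf` stands for one NVEnc output buffer. Each returned NALU keeps its leading start code, as the player expects:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an Annex B buffer into individual NAL units, each kept WITH its
// leading start code. Matches both the 3-byte (00 00 01) and the
// 4-byte (00 00 00 01) start-code form.
std::vector<std::vector<uint8_t>> splitNalUnits(const uint8_t* buf, size_t len)
{
    std::vector<size_t> starts;
    for (size_t i = 0; i + 2 < len; ++i) {
        if (buf[i] != 0 || buf[i + 1] != 0)
            continue;
        if (buf[i + 2] == 1) {                 // 3-byte start code
            starts.push_back(i);
            i += 2;                            // skip past it
        } else if (i + 3 < len && buf[i + 2] == 0 && buf[i + 3] == 1) {
            starts.push_back(i);               // 4-byte start code
            i += 3;
        }
    }
    // Each NALU runs from its start code to the next one (or end of buffer).
    std::vector<std::vector<uint8_t>> units;
    for (size_t k = 0; k < starts.size(); ++k) {
        size_t end = (k + 1 < starts.size()) ? starts[k + 1] : len;
        units.emplace_back(buf + starts[k], buf + end);
    }
    return units;
}
```

Each element of the returned vector would then be handed to one sendto() call (or equivalent), one UDP packet per NALU.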

Another problem with packetization is that IP packets generally cannot exceed a certain size. Again, an SO answer provides valuable insight into what those limits are in various contexts. The important thing here is that while you do not have to fragment the data yourself, you do have to tell NVEnc to "slice" its output, by setting the following parameters when creating the encoder object:

m_stEncodeConfig.encodeCodecConfig.h264Config.sliceMode = 1;
m_stEncodeConfig.encodeCodecConfig.h264Config.sliceModeData = 1500 - 28;

(with m_stEncodeConfig being the parameter struct that will be passed to NvEncInitializeEncoder(), 1500 being the MTU of an Ethernet frame, and 28 being the combined size of an IPv4 header (20 bytes) and a UDP header (8 bytes)).
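For clarity, the arithmetic behind that magic number can be spelled out. This is a hypothetical helper, not part of the NVEnc API; the constants are the standard IPv4 and UDP header sizes:

```cpp
#include <cstddef>

// An Ethernet frame carries at most "MTU" bytes of IP payload (1500 for
// standard Ethernet). The IPv4 header takes 20 of those bytes and the UDP
// header another 8, leaving the rest for the actual H.264 slice data.
constexpr size_t kIpv4HeaderSize = 20;
constexpr size_t kUdpHeaderSize  = 8;

constexpr size_t maxUdpPayload(size_t mtu)
{
    return mtu - kIpv4HeaderSize - kUdpHeaderSize;
}
// maxUdpPayload(1500) == 1472, i.e. the 1500 - 28 passed as sliceModeData.
```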

The second reason why MPlayer couldn't play my stream has to do with the nature of streaming video as opposed to storing it in a file. When player software starts playing an H.264 file, it will find the required metadata NALUs containing the resolution, framerate, etc. at the start, store that info, and never need it again. When asked to play a stream, however, it will have missed the beginning of that stream and cannot begin to play until the sender re-sends the metadata.

And here's the problem: unless told otherwise, NVEnc will only ever generate the metadata NALUs at the very beginning of an encoding session. Here is the encoder configuration parameter that needs to be set:

m_stEncodeConfig.encodeCodecConfig.h264Config.repeatSPSPPS = 1;

This tells NVEnc to re-generate SPS/PPS NALUs from time to time (I think that by default, this means with every IDR frame).
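If you want to verify on the receiving side that SPS/PPS units really are recurring, you can inspect the first byte after each start code: its low five bits are the NAL unit type. A sketch (`nalUnitType` is a hypothetical helper; the type values are from the H.264 spec):

```cpp
#include <cstdint>

// H.264 NAL unit types relevant here. In the byte following the start code,
// the top 3 bits are forbidden_zero_bit + nal_ref_idc; the low 5 bits are
// the type.
enum NalType { NAL_IDR = 5, NAL_SPS = 7, NAL_PPS = 8 };

inline int nalUnitType(uint8_t headerByte)
{
    return headerByte & 0x1F;
}
// Typical header bytes: 0x67 -> SPS (7), 0x68 -> PPS (8), 0x65 -> IDR (5).
```

With repeatSPSPPS set, you should see types 7 and 8 show up again just before each IDR frame rather than only once at the start of the session.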

And voilà! With these hurdles cleared, you will be able to appreciate the power of generating compressed video streams while hardly taxing the CPU at all.

EDIT: I realize that this kind of ultra-simple UDP streaming is discouraged, as it does not really conform to any standard. MPlayer will play such a stream, but VLC, which is otherwise capable of playing almost anything, will not. The foremost reason is that there is nothing in the data stream that even indicates the type of the medium being sent (in this case, video). I am currently doing research to find the simplest approach that will satisfy accepted standards.

Rhett answered 17/10, 2015 at 12:27 Comment(9)
If you're going the IETF standards way, you're looking at rfc3550 (RTP/UDP) + rfc6184 (H.264 payload format) You'll probably want to implement packetization-mode=0 (single NAL unit mode) since you've already configured the encoder according to the network MTU, although packetization-mode=1(non-interleaved) would be required if you wanted to aggregate/fragment NAL units across multiple RTP packets.Patrica
Thanks @Ralf. RFC6184 remains an option, but I've decided that for the time being UDP streaming is fine, as the stream will be generated and consumed by custom software.Rhett
@JpNotADragon - can you ping (email) me your contact info <mySOusername>@gmail.com, I wanted to bounce some questions off you about this subject (thanks!)Azotize
Any advice on how to chop up the UDP packets? I'm running into the exact same issues as you were. I don't think setting the slice mode in the coder actually helps, as I'm simply having nvenc write to a vector that I then read out as pure bytes.Schlieren
@HugoZink My own code no longer chops up the NALUs produced by NVENC, relying instead on the FFmpeg library. It takes the output of NVENC as it is (which can be one NALU or more), puts it into an AVPacket (which involves assigning a timestamp), and sends that via av_write_frame(). FFmpeg takes care of everything else. Also, I found out since writing this answer that simple UDP may not be your best choice after all - packet loss was way worse than I ever expected. Just use TCP, unless you really want to implement your own error correction/management.Rhett
Yeah, thanks for the advice. I ended up using TCP which works a lot better. The H264 stream is still a bit choppy and artifact-y when played in VLC, but it will have to do. I don't think UDP would be suitable for a raw H264 stream anyway, if the packets arrive out of order then everything goes wrong.Schlieren
So is repeatSPSPPS needed only for streamed UDP case where some data may be lost and thus decoder would wait for IDR and thus need SPSPPS repeat to "init mid stream" or new viewer joining the stream post-start and thus needed init info? Or is there any reason to set repeatSPSPPS for like archive file (i.e. regular video file) that is later downloaded by browser for playback? (e.g., would it be needed to make content easily seekable, or it doesn't affect seekability of video file?)Enthusiasm
@LB2: good question, for which I unfortunately don't have a reliable answer. I don't think it would affect seekability, as SPS/PPS apparently have nothing to do with an index.Rhett
@JPNotADragon, I found the answer serendipitously when I seeked within content in ffmpeg and got a stream of "non-existing PPS 1 referenced" messages in the output that continued for as long as the content played. And of course SO had the answer: that's due to the SPS/PPS being missing. I confirmed that archived content encoded with repeatSPSPPS=1 gets fewer of these, and when it does, it's only a few, until it apparently reaches an SPS/PPS, at which point it is pacified.Enthusiasm
