Getting green screen in ffplay: Streaming desktop (DirectX surface) as H264 video over RTP stream using Live555

I'm trying to stream the desktop (a DirectX surface in NV12 format) as H.264 video over RTP, using Live555 and the Windows Media Foundation hardware encoder on Windows 10, and I expect it to be rendered by ffplay (ffmpeg 4.2). However, I only get a green screen, as shown below:

[Four screenshots; images not preserved]

I referred to the MFWebCamToRTP Media Foundation sample and to "Encoding DirectX surface using hardware MFT" to implement Live555's FramedSource, changing the input source from the webcam to a DirectX surface.

Here is an excerpt of my implementation of Live555's doGetNextFrame callback, which feeds input samples from the DirectX surface:

virtual void doGetNextFrame()
{
    if (!_isInitialised)
    {
        if (!initialise()) {
            printf("Video device initialisation failed, stopping.");
            return;
        }
        else {
            _isInitialised = true;
        }
    }

    //if (!isCurrentlyAwaitingData()) return;

    DWORD processOutputStatus = 0;
    HRESULT mftProcessOutput = S_OK;
    IMFMediaBuffer *pBuffer = NULL;
    IMFSample *mftOutSample = NULL;
    DWORD mftOutFlags;
    bool frameSent = false;
    bool bTimeout = false;

    // Create sample
    CComPtr<IMFSample> videoSample = NULL;

    // Create buffer
    CComPtr<IMFMediaBuffer> inputBuffer;
    // Get next event
    CComPtr<IMFMediaEvent> event;
    HRESULT hr = eventGen->GetEvent(0, &event);
    CHECK_HR(hr, "Failed to get next event");

    MediaEventType eventType;
    hr = event->GetType(&eventType);
    CHECK_HR(hr, "Failed to get event type");

    switch (eventType)
    {
    case METransformNeedInput:
        {
            // Wrap the D3D11 texture in a media buffer and hand it to the encoder
            hr = MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), surface, 0, FALSE, &inputBuffer);
            CHECK_HR(hr, "Failed to create IMFMediaBuffer");

            hr = MFCreateSample(&videoSample);
            CHECK_HR(hr, "Failed to create IMFSample");
            hr = videoSample->AddBuffer(inputBuffer);
            CHECK_HR(hr, "Failed to add buffer to IMFSample");

            if (videoSample)
            {
                CHECK_HR(videoSample->SetSampleTime(mTimeStamp), "Error setting the video sample time.\n");
                CHECK_HR(videoSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error setting video sample duration.\n");

                // Pass the video sample to the H.264 transform.

                hr = _pTransform->ProcessInput(inputStreamID, videoSample, 0);
                CHECK_HR(hr, "The resampler H264 ProcessInput call failed.\n");

                mTimeStamp += VIDEO_FRAME_DURATION;
            }
        }
        break;

    case METransformHaveOutput:
        {
            CHECK_HR(_pTransform->GetOutputStatus(&mftOutFlags), "H264 MFT GetOutputStatus failed.\n");

            if (mftOutFlags == MFT_OUTPUT_STATUS_SAMPLE_READY)
            {
                MFT_OUTPUT_DATA_BUFFER _outputDataBuffer;
                memset(&_outputDataBuffer, 0, sizeof _outputDataBuffer);
                _outputDataBuffer.dwStreamID = outputStreamID;
                _outputDataBuffer.dwStatus = 0;
                _outputDataBuffer.pEvents = NULL;
                _outputDataBuffer.pSample = nullptr;

                mftProcessOutput = _pTransform->ProcessOutput(0, 1, &_outputDataBuffer, &processOutputStatus);

                if (mftProcessOutput != MF_E_TRANSFORM_NEED_MORE_INPUT)
                {
                    if (_outputDataBuffer.pSample) {

                        //CHECK_HR(_outputDataBuffer.pSample->SetSampleTime(mTimeStamp), "Error setting MFT sample time.\n");
                        //CHECK_HR(_outputDataBuffer.pSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error setting MFT sample duration.\n");

                        IMFMediaBuffer *buf = NULL;
                        DWORD bufLength;
                        CHECK_HR(_outputDataBuffer.pSample->ConvertToContiguousBuffer(&buf), "ConvertToContiguousBuffer failed.\n");
                        CHECK_HR(buf->GetCurrentLength(&bufLength), "Get buffer length failed.\n");
                        BYTE * rawBuffer = NULL;

                        fFrameSize = bufLength;
                        fDurationInMicroseconds = 0;
                        gettimeofday(&fPresentationTime, NULL);

                        // Copy the encoded frame into Live555's output buffer
                        buf->Lock(&rawBuffer, NULL, NULL);
                        memmove(fTo, rawBuffer, fFrameSize);
                        buf->Unlock();
                        buf->Release();

                        // Tell Live555 that a frame has been delivered
                        FramedSource::afterGetting(this);

                        frameSent = true;
                        _lastSendAt = GetTickCount();

                        _outputDataBuffer.pSample->Release();
                    }

                    if (_outputDataBuffer.pEvents)
                        _outputDataBuffer.pEvents->Release();
                }
            }
        }
        break;
    }

    // No encoded frame yet: schedule another call so the next event gets processed
    if (!frameSent)
    {
        envir().taskScheduler().triggerEvent(eventTriggerId, this);
    }

    return;

done:

    printf("MediaFoundationH264LiveSource doGetNextFrame failed.\n");
    envir().taskScheduler().triggerEvent(eventTriggerId, this);
}

Initialise method:

bool initialise()
{
    HRESULT hr;
    D3D11_TEXTURE2D_DESC desc = { 0 };

    HDESK CurrentDesktop = nullptr;
    CurrentDesktop = OpenInputDesktop(0, FALSE, GENERIC_ALL);
    if (!CurrentDesktop)
    {
        // We do not have access to the desktop so request a retry
        return false;
    }

    // Attach desktop to this thread
    bool DesktopAttached = SetThreadDesktop(CurrentDesktop) != 0;
    CloseDesktop(CurrentDesktop);
    CurrentDesktop = nullptr;
    if (!DesktopAttached)
    {
        printf("SetThreadDesktop failed\n");
    }

    UINT32 activateCount = 0;

    // h264 output
    MFT_REGISTER_TYPE_INFO info = { MFMediaType_Video, MFVideoFormat_H264 };

    // The flag list is missing from the post; a hardware-encoder search is assumed
    UINT32 flags =
        MFT_ENUM_FLAG_HARDWARE |
        MFT_ENUM_FLAG_SORTANDFILTER;

    // ------------------------------------------------------------------------
    // Initialize D3D11
    // ------------------------------------------------------------------------

    // Driver types supported (list missing from the post; the usual sample values are assumed)
    D3D_DRIVER_TYPE DriverTypes[] =
    {
        D3D_DRIVER_TYPE_HARDWARE,
        D3D_DRIVER_TYPE_WARP,
        D3D_DRIVER_TYPE_REFERENCE,
    };
    UINT NumDriverTypes = ARRAYSIZE(DriverTypes);

    // Feature levels supported (list missing from the post; the usual sample values are assumed)
    D3D_FEATURE_LEVEL FeatureLevels[] =
    {
        D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1,
        D3D_FEATURE_LEVEL_10_0,
        D3D_FEATURE_LEVEL_9_1
    };
    UINT NumFeatureLevels = ARRAYSIZE(FeatureLevels);

    D3D_FEATURE_LEVEL FeatureLevel;

    // Create device
    for (UINT DriverTypeIndex = 0; DriverTypeIndex < NumDriverTypes; ++DriverTypeIndex)
    {
        hr = D3D11CreateDevice(nullptr, DriverTypes[DriverTypeIndex], nullptr,
            D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            FeatureLevels, NumFeatureLevels, D3D11_SDK_VERSION, &device, &FeatureLevel, &context);
        if (SUCCEEDED(hr))
        {
            // Device creation success, no need to loop anymore
            break;
        }
    }

    CHECK_HR(hr, "Failed to create device");

    // Create device manager
    UINT resetToken;
    hr = MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
    CHECK_HR(hr, "Failed to create DXGIDeviceManager");

    hr = deviceManager->ResetDevice(device, resetToken);
    CHECK_HR(hr, "Failed to assign D3D device to device manager");

    // ------------------------------------------------------------------------
    // Create surface
    // ------------------------------------------------------------------------
    desc.Format = DXGI_FORMAT_NV12;
    desc.Width = surfaceWidth;
    desc.Height = surfaceHeight;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.SampleDesc.Count = 1;

    hr = device->CreateTexture2D(&desc, NULL, &surface);
    CHECK_HR(hr, "Could not create surface");

    // Argument list is missing from the post; reconstructed from the variables declared above
    hr = MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, flags, nullptr, &info, &activateRaw, &activateCount);
    CHECK_HR(hr, "Failed to enumerate MFTs");

    CHECK(activateCount, "No MFTs found");

    // Choose the first available encoder
    activate = activateRaw[0];

    for (UINT32 i = 0; i < activateCount; i++)
        activateRaw[i]->Release();

    // Activate
    hr = activate->ActivateObject(IID_PPV_ARGS(&_pTransform));
    CHECK_HR(hr, "Failed to activate MFT");

    // Get attributes
    hr = _pTransform->GetAttributes(&attributes);
    CHECK_HR(hr, "Failed to get MFT attributes");

    // Unlock the transform for async use and get event generator
    hr = attributes->SetUINT32(MF_TRANSFORM_ASYNC_UNLOCK, TRUE);
    CHECK_HR(hr, "Failed to unlock MFT");

    eventGen = _pTransform;
    CHECK(eventGen, "Failed to QI for event generator");

    // Get stream IDs (expect 1 input and 1 output stream)
    hr = _pTransform->GetStreamIDs(1, &inputStreamID, 1, &outputStreamID);
    if (hr == E_NOTIMPL)
    {
        inputStreamID = 0;
        outputStreamID = 0;
        hr = S_OK;
    }
    CHECK_HR(hr, "Failed to get stream IDs");

    // ------------------------------------------------------------------------
    // Configure hardware encoder MFT
    // ------------------------------------------------------------------------
    CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)), "Failed to set device manager.\n");

    // Set low latency hint
    hr = attributes->SetUINT32(MF_LOW_LATENCY, TRUE);
    CHECK_HR(hr, "Failed to set MF_LOW_LATENCY");

    hr = MFCreateMediaType(&outputType);
    CHECK_HR(hr, "Failed to create media type");

    hr = outputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    CHECK_HR(hr, "Failed to set MF_MT_MAJOR_TYPE on H264 output media type");

    hr = outputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    CHECK_HR(hr, "Failed to set MF_MT_SUBTYPE on H264 output media type");

    // The call is missing from the post; TARGET_AVERAGE_BIT_RATE is a placeholder name, the value is not shown
    hr = outputType->SetUINT32(MF_MT_AVG_BITRATE, TARGET_AVERAGE_BIT_RATE);
    CHECK_HR(hr, "Failed to set average bit rate on H264 output media type");

    hr = MFSetAttributeSize(outputType, MF_MT_FRAME_SIZE, desc.Width, desc.Height);
    CHECK_HR(hr, "Failed to set frame size on H264 MFT out type");

    hr = MFSetAttributeRatio(outputType, MF_MT_FRAME_RATE, TARGET_FRAME_RATE, 1);
    CHECK_HR(hr, "Failed to set frame rate on H264 MFT out type");

    // 2 == MFVideoInterlace_Progressive
    hr = outputType->SetUINT32(MF_MT_INTERLACE_MODE, 2);
    CHECK_HR(hr, "Failed to set MF_MT_INTERLACE_MODE on H.264 encoder MFT");

    // The call is missing from the post; TRUE is assumed
    hr = outputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE);
    CHECK_HR(hr, "Failed to set MF_MT_ALL_SAMPLES_INDEPENDENT on H.264 encoder MFT");

    hr = _pTransform->SetOutputType(outputStreamID, outputType, 0);
    CHECK_HR(hr, "Failed to set output media type on H.264 encoder MFT");

    hr = MFCreateMediaType(&inputType);
    CHECK_HR(hr, "Failed to create media type");

    for (DWORD i = 0;; i++)
    {
        inputType = nullptr;
        hr = _pTransform->GetInputAvailableType(inputStreamID, i, &inputType);
        CHECK_HR(hr, "Failed to get input type");

        hr = inputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
        CHECK_HR(hr, "Failed to set MF_MT_MAJOR_TYPE on H264 MFT input type");

        hr = inputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
        CHECK_HR(hr, "Failed to set MF_MT_SUBTYPE on H264 MFT input type");

        hr = MFSetAttributeSize(inputType, MF_MT_FRAME_SIZE, desc.Width, desc.Height);
        CHECK_HR(hr, "Failed to set MF_MT_FRAME_SIZE on H264 MFT input type");

        hr = MFSetAttributeRatio(inputType, MF_MT_FRAME_RATE, TARGET_FRAME_RATE, 1);
        CHECK_HR(hr, "Failed to set MF_MT_FRAME_RATE on H264 MFT input type");

        hr = _pTransform->SetInputType(inputStreamID, inputType, 0);
        CHECK_HR(hr, "Failed to set input type");

        // The first available type was accepted, stop iterating
        break;
    }

    CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, NULL), "Failed to process FLUSH command on H.264 MFT.\n");
    CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL), "Failed to process BEGIN_STREAMING command on H.264 MFT.\n");
    CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, NULL), "Failed to process START_OF_STREAM command on H.264 MFT.\n");

    return true;

done:

    printf("MediaFoundationH264LiveSource initialisation failed.\n");
    return false;
}

    HRESULT CheckHardwareSupport()
    {
        IMFAttributes *attributes;
        HRESULT hr = _pTransform->GetAttributes(&attributes);
        UINT32 dxva = 0;

        if (SUCCEEDED(hr))
        {
            hr = attributes->GetUINT32(MF_SA_D3D11_AWARE, &dxva);
        }

        if (SUCCEEDED(hr))
        {
            hr = attributes->SetUINT32(CODECAPI_AVDecVideoAcceleration_H264, TRUE);
        }

#if defined(CODECAPI_AVLowLatencyMode) // Win8 only

        hr = _pTransform->QueryInterface(IID_PPV_ARGS(&mpCodecAPI));

        if (SUCCEEDED(hr))
        {
            VARIANT var = { 0 };

            // FIXME: encoder only
            var.vt = VT_UI4;
            var.ulVal = 0;

            hr = mpCodecAPI->SetValue(&CODECAPI_AVEncMPVDefaultBPictureCount, &var);

            var.vt = VT_BOOL;
            var.boolVal = VARIANT_TRUE;
            hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonLowLatency, &var);
            hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonRealTime, &var);

            hr = attributes->SetUINT32(CODECAPI_AVLowLatencyMode, TRUE);

            if (SUCCEEDED(hr))
            {
                var.vt = VT_UI4;
                var.ulVal = eAVEncCommonRateControlMode_Quality;
                hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonRateControlMode, &var);

                // This property controls the quality level when the encoder is not using a constrained bit rate. The AVEncCommonRateControlMode property determines whether the bit rate is constrained.
                VARIANT quality;
                InitVariantFromUInt32(50, &quality);
                hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonQuality, &quality);
            }
        }
#endif

        return hr;
    }

ffplay command:

ffplay -protocol_whitelist file,udp,rtp -i test.sdp -x 800 -y 600 -profile:v baseline


o=- 0 0 IN IP4
s=No Name
t=0 0
c=IN IP4
m=video 1234 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1

I don't know what I am missing. I have been trying to fix this for almost a week without any progress and have tried almost everything I could think of. Also, online resources on encoding a DirectX surface as video are very limited.

Any help would be appreciated.

Postilion answered 22/10, 2019 at 11:50 Comment(8)
I think you incorrectly expect doGetNextFrame to be called again after METransformNeedInput. Maybe you should loop inside it until you get a valid ProcessOutput call. - Evette
hr = event->GetType(&eventType); switch(eventType) {....} if (!frameSent) { envir().taskScheduler().triggerEvent(eventTriggerId, this); } The two blocks above take care of calling ProcessInput until we get an output from the encoder. I have verified this. @Evette - Postilion
So what happens when frameSent is true? Do you trigger a new event in this case? You have a "return" statement after that. - Evette
@Evette It's called automatically by the underlying live555 library in a loop. ProcessInput and ProcessOutput are called alternately, based on the event in the switch statement. I'm getting a continuous stream from ProcessOutput, but I'm just not able to view it. I'm sure that I'm setting the sample time and duration correctly. - Postilion
You may need to check whether you receive MF_E_TRANSFORM_STREAM_CHANGE from ProcessOutput and handle the format change accordingly. - Evette
@Evette Thanks for pointing this out. I have added additional cases to catch events like MF_E_TRANSFORM_STREAM_CHANGE, but I have not received the stream change event even once. I only get the two events (METransformNeedInput and METransformHaveOutput) that I mentioned in the original post. - Postilion
Does this mean the encoder and renderer are working properly, and the problem could be with the image source (the DirectX surface)? - Postilion
Yes, it does. You can try decoding it back and displaying it. - Evette

It’s harder than it seems.

If you want to use the encoder the way you’re doing, by calling the IMFTransform interface directly, you have to convert RGB frames to NV12. If you want good performance, you should do it on the GPU. This is possible with pixel shaders: render two passes, a full-size one into a DXGI_FORMAT_R8_UNORM render target for brightness and a half-size one into a DXGI_FORMAT_R8G8_UNORM render target for color, and write two pixel shaders to produce the NV12 values. Both render targets can render into the two planes of the same NV12 texture, but only since Windows 8.
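To make the target layout concrete, here is a minimal CPU reference sketch of the same conversion. It is not the GPU path described above, only something to compare shader output against. The helper name bgraToNv12 is hypothetical, and the BT.601 limited-range integer coefficients are an assumption; the encoder may expect a different matrix or range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate result into the 0..255 byte range.
static uint8_t clampToByte(int v) {
    return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Hypothetical helper: convert a tightly packed BGRA image (byte order B, G, R, A)
// to NV12. Assumes even width/height and BT.601 limited-range coefficients.
std::vector<uint8_t> bgraToNv12(const uint8_t* bgra, int width, int height) {
    assert(bgra && width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const size_t lumaSize = static_cast<size_t>(width) * height;
    std::vector<uint8_t> nv12(lumaSize + lumaSize / 2);  // Y plane + half-size UV plane
    uint8_t* yPlane = nv12.data();
    uint8_t* uvPlane = nv12.data() + lumaSize;

    // Plane 0: one luma byte per pixel (the full-size "brightness" pass).
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const uint8_t* p = bgra + 4 * (static_cast<size_t>(y) * width + x);
            const int b = p[0], g = p[1], r = p[2];
            yPlane[static_cast<size_t>(y) * width + x] =
                clampToByte(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
        }
    }

    // Plane 1: interleaved U,V pairs, one pair per 2x2 pixel block (the half-size "color" pass).
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            int r = 0, g = 0, b = 0;
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const uint8_t* p = bgra + 4 * (static_cast<size_t>(y + dy) * width + (x + dx));
                    b += p[0]; g += p[1]; r += p[2];
                }
            }
            r = (r + 2) / 4; g = (g + 2) / 4; b = (b + 2) / 4;  // average of the block
            // 32768 == 128 << 8: folds the +128 chroma offset in before the shift,
            // so the shifted value is never negative.
            const int u = (-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8;
            const int v = (112 * r - 94 * g - 18 * b + 128 + 32768) >> 8;
            uint8_t* uv = uvPlane + static_cast<size_t>(y / 2) * width + x;
            uv[0] = clampToByte(u);
            uv[1] = clampToByte(v);
        }
    }
    return nv12;
}
```

With these coefficients, an all-zero buffer is not black: Y=0, U=0, V=0 decodes to a saturated green, which is consistent with the symptom in the question if the NV12 texture is never written.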

The other method is to use a sink writer. It can host multiple MFTs at the same time, so you can supply RGB textures in VRAM; the sink writer will first convert them to NV12 with one MFT (likely a proprietary hardware one implemented by the GPU driver, just like the encoder) and then pass them to the encoder MFT. It’s relatively easy to encode into an MP4 file: use the MFCreateSinkWriterFromURL API to create the writer. Getting raw samples out of the sink writer is much harder, however: you have to implement a custom media sink and a custom stream sink for its video stream, and call MFCreateSinkWriterFromMediaSink to create the writer.

There’s more.

Regardless of the encoding method, you can’t reuse frame textures. For each frame you get from desktop duplication (DD), you should create a new texture and pass it to MF.

Video encoders expect a constant frame rate. DD doesn’t give you that; it gives you a frame every time something changes on the screen. That can be 144 FPS if you have a gaming monitor, or 2 FPS if the only change is a blinking cursor. Ideally, you should submit frames to MF at the constant frame rate specified in your video media type.
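The pacing idea can be sketched independently of Media Foundation. The following is a minimal illustration, not the code used in this answer: given the irregular capture timestamps, it decides which captured frame to submit at each fixed tick, repeating the previous frame when nothing new arrived and skipping frames that arrived too fast. PacedSample and paceFrames are hypothetical names, and all times are plain integers in whatever unit the caller uses (Media Foundation sample times are in 100-nanosecond units).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One sample to submit to the encoder at a fixed tick.
struct PacedSample {
    int64_t sampleTime;  // tick time, a multiple of frameDuration
    int sourceFrame;     // index of the captured frame to submit at that tick
};

// Hypothetical helper: captureTimes holds the ascending timestamps at which the
// capture API delivered frames. For every tick in [0, endTime) the most recent
// frame captured at or before the tick is chosen.
std::vector<PacedSample> paceFrames(const std::vector<int64_t>& captureTimes,
                                    int64_t frameDuration, int64_t endTime) {
    assert(frameDuration > 0);
    std::vector<PacedSample> out;
    size_t next = 0;   // first capture that has not been looked at yet
    int current = -1;  // most recent frame available, -1 while there is none
    for (int64_t t = 0; t < endTime; t += frameDuration) {
        // Consume every capture that happened up to this tick; only the last one survives.
        while (next < captureTimes.size() && captureTimes[next] <= t) {
            current = static_cast<int>(next);
            ++next;
        }
        if (current >= 0) {
            out.push_back({t, current});  // repeats the old frame if nothing new arrived
        }
    }
    return out;
}
```

In a live pipeline the loop body would run once per timer tick instead of over a recorded list, but the selection rule is the same.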

If you want to stream to the network, more often than not you also have to supply parameter sets. Unless you’re using the Intel hardware h265 encoder, which is broken with no comments from Intel, MF gives you that data in the MF_MT_MPEG_SEQUENCE_HEADER attribute of the media type, by calling SetCurrentMediaType on the IMFMediaTypeHandler interface. You can implement that interface to get notified. You’ll only get that data after you start encoding. That's if you use a sink writer; with the IMFTransform method it's easier: you should get the MF_E_TRANSFORM_STREAM_CHANGE code from the ProcessOutput method, then call GetOutputAvailableType to get the updated media type with that magic blob.
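Once you have that blob, it has to be split into its individual NAL units. Below is a minimal sketch, assuming the blob is in Annex B form, that is, each NAL unit is preceded by a 00 00 01 or 00 00 00 01 start code (this is an assumption about the encoder's output, so verify it on your hardware). NalUnit and splitAnnexB are hypothetical names. The same scan works on an encoded sample if the parameter sets are delivered in-band.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A view into the input buffer: one NAL unit without its start code.
struct NalUnit {
    const uint8_t* data;  // points at the NAL header byte
    size_t size;
    // nal_unit_type is the low 5 bits of the header: 7 = SPS, 8 = PPS, 5 = IDR slice.
    int type() const { return size ? (data[0] & 0x1F) : -1; }
};

// Hypothetical helper: split an Annex B byte stream at its start codes.
std::vector<NalUnit> splitAnnexB(const uint8_t* buf, size_t len) {
    assert(buf != nullptr || len == 0);
    std::vector<NalUnit> nals;
    size_t start = len;  // offset of the NAL currently open; len means "none"
    size_t i = 0;
    while (i + 3 <= len) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            if (start != len) {
                size_t end = i;
                // A zero byte right before 00 00 01 belongs to a 4-byte start code,
                // not to the previous NAL unit.
                while (end > start && buf[end - 1] == 0) --end;
                if (end > start) nals.push_back({buf + start, end - start});
            }
            start = i + 3;
            i += 3;
        } else {
            ++i;
        }
    }
    if (start < len) nals.push_back({buf + start, len - start});  // last NAL runs to the end
    return nals;
}
```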

Searchlight answered 2/11, 2019 at 7:56 Comment(9)
You mean DirectX (desktop duplication) doesn't deliver frames in NV12 format even when the device is initialized with D3D11_CREATE_DEVICE_VIDEO_SUPPORT, the surface descriptor is set to DXGI_FORMAT_NV12, and MFT_MESSAGE_SET_D3D_MANAGER is set on the transform? I also thought that we have to explicitly convert the RGB buffer to NV12 or another supported input format (mostly variants of YUV), or use a SinkWriter. But this person was somehow able to achieve it with the same approach as mine: #43433170 - Postilion
#43424729 & #56407325 - Postilion
@Ram Desktop duplication always delivers RGB frames in DXGI_FORMAT_B8G8R8A8_UNORM format. H264 and h265 encoder MFTs only support NV12 and a couple of others, equally weird ones. Someone has to convert. You use desktop duplication; you already can’t support Windows 7 with it. Use a sink writer. I’m pretty sure nVidia’s / Intel’s hardware MFTs that convert RGB to NV12 are more power-efficient than pixel shader ALUs; they're probably implemented purely in hardware. - Searchlight
You are right. Color conversion must be done explicitly. I'm proceeding in that direction. - Postilion
I was able to solve the rendering issue by doing the color conversion with ID3D11VideoProcessor, and I achieve 50 FPS on average for fullscreen updates and up to 85 FPS for non-fullscreen updates. However, it only works properly when the screen content changes at a frame rate higher than the FPS configured in the encoder. I'm going to check whether feeding the previous frame to the encoder at a constant rate works when there is no change on the display. - Postilion
@Ram It should work; I did it before. When DD refuses to give you a new frame because there were no updates, you can save lots of VRAM by submitting the same texture to the encoder again. Only create new textures when DD has a new frame for you. But the code to detect when you should submit frames and how long to wait is not trivial. I have used QueryPerformanceCounter to measure time, and some kind of rolling average over the last few frames to decide whether I should capture or sleep. BTW, the right way to sleep is the IDXGIOutput::WaitForVBlank method. - Searchlight
I was able to identify the problem. It's not due to the varying frame rate, but to the encoder buffering until a GOP is filled (I guess it buffers up to 30 frames). I've also tried setting CODECAPI_AVLowLatencyMode, but nothing changed. Feeding the previous sample at a constant rate solves the problem, but I think it consumes a small amount of data even when there is no change in the screen content (probably due to keyframes). - Postilion
… This didn't work for me as defined ("one input sample shall produce one output sample"). - Postilion
Is there a way to force the encoder to produce one output sample per input? I tried CODECAPI_AVLowLatencyMode, but it doesn't seem to work as described (even on Windows 10). I'm forced to feed the encoder continuously with the previously generated sample I saved in memory, but that produces a very large sample (the size of an I-frame) every 3 seconds even when there is no change on the screen. I would be happy if I could at least increase the interval between the I-frames (as I guess they are) produced without a diff. I have experimented a lot, but I've had no luck. - Postilion

Since ffplay is complaining about the stream parameters, I would assume it can't pick up the SPS/PPS. You haven't set them in your hardcoded SDP; see RFC 3984 and look for sprop-parameter-sets. An example from the RFC:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E;sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==

I strongly assume ffplay is expecting these in the SDP. I don't remember by heart how to get the SPS/PPS from the Media Foundation encoder, but either they are in the sample payload and you need to extract them by looking up the proper NAL units, or you can google how to extract the extra data from the encoder; the first hit I got looked promising.
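Once the raw SPS and PPS bytes are available (without start codes), building the SDP attribute is mostly a Base64 exercise. A minimal sketch follows; base64Encode and buildFmtpLine are hypothetical helper names, and taking profile-level-id from bytes 1 to 3 of the SPS (profile_idc, constraint flags, level_idc) is how the value is commonly derived, stated here as an assumption rather than something taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Standard Base64 (RFC 4648 alphabet, '=' padding).
std::string base64Encode(const std::vector<uint8_t>& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups become 4 characters
        const uint32_t v = (uint32_t(in[i]) << 16) | (uint32_t(in[i + 1]) << 8) | in[i + 2];
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += tbl[v & 63];
    }
    const size_t rem = in.size() - i;  // 0, 1 or 2 leftover bytes
    if (rem == 1) {
        const uint32_t v = uint32_t(in[i]) << 16;
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        const uint32_t v = (uint32_t(in[i]) << 16) | (uint32_t(in[i + 1]) << 8);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: build the a=fmtp line for an H.264 payload type.
// sps and pps are raw NAL units starting at the NAL header byte.
std::string buildFmtpLine(int payloadType, const std::vector<uint8_t>& sps,
                          const std::vector<uint8_t>& pps) {
    assert(sps.size() >= 4 && !pps.empty());
    char profile[7];  // six hex digits plus the terminator
    std::snprintf(profile, sizeof profile, "%02X%02X%02X", sps[1], sps[2], sps[3]);
    return "a=fmtp:" + std::to_string(payloadType) +
           " packetization-mode=1;profile-level-id=" + profile +
           ";sprop-parameter-sets=" + base64Encode(sps) + "," + base64Encode(pps);
}
```

Feeding it the bytes that the RFC example's sprop-parameter-sets value decodes to reproduces the same two Base64 strings, which is a quick way to check the encoder.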

Lewiss answered 30/10, 2019 at 20:6 Comment(3)
It's a valid point. I also suspect the SPS/PPS, though I have yet to verify it. Thanks for directing me to the MSDN thread, which gives me some hope. - Postilion
@Ram There is a good chance that the SPS/PPS are in the sample payload, so I'd check that first. - Lewiss
Yeah, I understand that. I gained some knowledge of retrieving and parsing SPS/PPS directly from Media Foundation encoders when I tried writing samples to a file through Mpeg4MediaSink. I will move forward in this direction. - Postilion

Soonts gives you everything you need to solve your problem.

The first thing you need to do is the format conversion between DXGI_FORMAT_B8G8R8A8_UNORM and MFVideoFormat_NV12:

Format conversion

format conversion information

I think it's better to use a shader for the format conversion, because all textures stay on the GPU (better for performance).

That's the first step. There will be others to improve your program.

Ezzell answered 4/11, 2019 at 0:36 Comment(2)
A 2x4 image takes 12 bytes in NV12, not 24: the 8 brightness values you have there, but the color image is half the size in each dimension, 1x2 pixels, so just 4 bytes in total for the color information of that 2x4 image: 2 bytes for U and 2 bytes for V. - Searchlight
Yes, you're right, I omitted the downsampling to 4:2:0 of the NV12 format. I will try to make a more fitting diagram. - Ezzell
