GPU memory leak with MSDK using D3D11 for rendering (sample_decode)

We're using Intel MSDK to decode H.264 video streams in our software, rendered with D3D11/DXGI. We discovered that stopping and starting streams leaks GPU memory (observed primarily through Process Explorer's "System GPU memory"). We verified with the DX debug layer that all objects were released correctly, and then turned to "sample_decode" to check whether the problem was in our implementation.

Unfortunately (?) the memory issues can be seen even with sample_decode, with minor changes. The only changes made to the code are:
- Added a for-loop to run the same decoder/render task 5 times instead of only once.
- Ignored the result of the RegisterClass call (so the run does not abort when the class is already registered).
Code is attached, along with a compiled debug .exe.

* sample_decode based on "2018 R2" samples
* GPU memory issues seem to happen when rendering with D3D11
* Using D3D9 no memory leak could be observed
* Tested on multiple machines, but results below were running on a Skylake processor (HD 530)
 

D3D11 rendered
h264 -hw -d3d11 -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory increasing

D3D11 NOT rendered
h264 -hw -d3d11 -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory returns to 0 between each run

D3D9 rendered
h264 -hw -d3d -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory returns to 0 between each run

D3D11 software rendered
h264 -sw -d3d11 -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory increasing

 

See screenshots from Process Explorer below. 

[Screenshot: D3D11 - rendering]

[Screenshot: D3D11 - not rendering]

[Screenshot: D3D9 - rendering]

 

Could somebody from Intel look into this issue? Could it be something that may need to be handled differently in the code to mitigate this issue? Something not released correctly?  Obviously sample_decode is built to run once, but looking into how to handle opening/closing decoding streams with MSDK and D3D11 we would hope that the sample would at least initiate and close everything correctly anyway.

 

Best Regards,
Carl

Attachment: sample_decode.zip (application/zip, 14.16 MB)

Below is the output of Media System Analyzer:

 

Intel(R) Media Server Studio 2017 - System Analyzer (64-bit)


The following versions of Media SDK API are supported by platform/driver
[opportunistic detection of MSDK API > 1.20]:

        Version Target  Supported       Dec     Enc
        1.0     HW      Yes             X       X
        1.0     SW      Yes             X       X
        1.1     HW      Yes             X       X
        1.1     SW      Yes             X       X
        1.2     HW      Yes             X       X
        1.2     SW      Yes             X       X
        1.3     HW      Yes             X       X
        1.3     SW      Yes             X       X
        1.4     HW      Yes             X       X
        1.4     SW      Yes             X       X
        1.5     HW      Yes             X       X
        1.5     SW      Yes             X       X
        1.6     HW      Yes             X       X
        1.6     SW      Yes             X       X
        1.7     HW      Yes             X       X
        1.7     SW      Yes             X       X
        1.8     HW      Yes             X       X
        1.8     SW      Yes             X       X
        1.9     HW      Yes             X       X
        1.9     SW      Yes             X       X
        1.10    HW      Yes             X       X
        1.10    SW      Yes             X       X
        1.11    HW      Yes             X       X
        1.11    SW      Yes             X       X
        1.12    HW      Yes             X       X
        1.12    SW      Yes             X       X
        1.13    HW      Yes             X       X
        1.13    SW      Yes             X       X
        1.14    HW      Yes             X       X
        1.14    SW      Yes             X       X
        1.15    HW      Yes             X       X
        1.15    SW      Yes             X       X
        1.16    HW      Yes             X       X
        1.16    SW      Yes             X       X
        1.17    HW      Yes             X       X
        1.17    SW      Yes             X       X
        1.18    HW      Yes             X       X
        1.18    SW      Yes             X       X
        1.19    HW      Yes             X       X
        1.19    SW      Yes             X       X
        1.20    HW      Yes             X       X
        1.20    SW      Yes             X       X
        1.21    HW      Yes             X       X
        1.21    SW      Yes             X       X
        1.22    HW      Yes             X       X
        1.22    SW      Yes             X       X
        1.23    HW      Yes             X       X
        1.23    SW      Yes             X       X
        1.24    HW      Yes             X       X
        1.24    SW      Yes             X       X
        1.25    HW      Yes             X       X
        1.25    SW      Yes             X       X
        1.26    HW      Yes             X       X
        1.26    SW      Yes             X       X
        1.27    HW      Yes             X       X
        1.27    SW      Yes             X       X

Graphics Devices:
        Name                                         Version             State
        Intel(R) HD Graphics 530                     25.20.100.6444      Running / Full Power
        NVIDIA GeForce GTX 970                       25.21.14.1616       Running / Full Power

System info:
        CPU:    Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
        OS:     Microsoft Windows 10 Pro
        Arch:   64-bit

Installed Media SDK packages (be patient...processing takes some time):
        Intel(R) Media SDK 2018 R2 - HEVC GPU accelerated Encoder
        Intel(R) Media SDK 2018 R2 - Media Samples
        Intel(R) Media Server Studio 2017 - Video Quality Caliper
        Intel(R) Media SDK 2018 R2 - Software Development Kit
        Intel(R) Media SDK 2018 R2 - Documentation for HEVC
        Intel(R) Media SDK 2018 R2 - HEVC SW Encoder
        Samples for Intel(R) Media SDK 2017 for Windows*
        Intel(R) Media SDK 2018 R2 - HEVC SW Decoder

Installed Media SDK DirectShow filters:

Installed Intel Media Foundation Transforms:
        Intel(R) Hardware M-JPEG Decoder MFT : {00C69F81-0524-48C0-A353-4DD9D54F9A6E}

 

 

Hi,

Has anybody looked at this at all?
I can mention that our software is primarily used in industrial environments, where stability is key. Because of this issue we are now considering using other hardware decoding solutions than Intel, which would be unfortunate since it otherwise seems promising.

If any more information is needed, I'll be happy to provide it.

 

Best Regards,
Carl

Hi Carl,

Sorry for the late response. I have looked at your description and I can reproduce the issue.

This is a memory management issue in which the app or library doesn't clean up the GPU memory after each run. I didn't see the memory increase during a run; it only happens between stopping and starting a run.

I have submitted an investigation request to the dev team and will keep you updated.

Mark

Hi Mark,

Ok, good. Correct, it seems that when closing a decoding session, not all memory is released correctly, so after too many video stream switches this causes our application to crash. (Our application is switching video streams by command of operators or automatically by )

Best Regards,
Carl

Any news on this issue?

Best Regards,
Carl

At some point, sample_decode was updated to use CComPtr instead of raw pointers (which is a good thing). However, instead of using CComPtr's operator& to obtain the address of the wrapped pointer, the code takes the address of the raw member (.p) directly, which bypasses the check that would otherwise detect the leak.

If you replace the line in sample_common/d3d11_device.cpp, line 262:

hres = m_pSwapChain->GetBuffer(0, __uuidof( ID3D11Texture2D ), (void**)&m_pDXGIBackBuffer.p);

with 

hres = m_pSwapChain->GetBuffer(0, __uuidof( ID3D11Texture2D ), (void**)&m_pDXGIBackBuffer);

you get an assert (in debug builds) indicating that the COM pointer was leaked.
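
To make the difference between those two lines concrete, here is a minimal, platform-independent sketch. MiniComPtr, FakeTexture and this GetBuffer are hypothetical stand-ins, not the ATL or DXGI types. The stand-in throws an exception where ATL's CComPtr::operator& would assert in a debug build when the wrapped pointer is already non-null.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a COM object: tracks how many instances are alive.
struct FakeTexture {
    static int live;
    int refs = 1;
    FakeTexture() { ++live; }
    void Release() { if (--refs == 0) { --live; delete this; } }
};
int FakeTexture::live = 0;

// Hypothetical, heavily simplified stand-in for ATL's CComPtr.
template <class T>
struct MiniComPtr {
    T* p = nullptr;
    ~MiniComPtr() { if (p) p->Release(); }
    // Checked address-of: refuses to hand out the slot while it still
    // holds a reference (assumption: exception used instead of ATLASSERT).
    T** operator&() {
        if (p != nullptr) throw std::logic_error("overwriting a live COM pointer");
        return &p;
    }
};

// Stand-in for an API that returns a new reference through an out-parameter.
void GetBuffer(FakeTexture** out) { *out = new FakeTexture(); }

// Taking the address of the raw member skips the check: the second call
// silently overwrites the first reference, which is then never released.
int leaked_objects_via_raw_member() {
    int before = FakeTexture::live;
    {
        MiniComPtr<FakeTexture> backBuffer;
        GetBuffer(&backBuffer.p);
        GetBuffer(&backBuffer.p);
    }
    return FakeTexture::live - before;
}

// Using the checked operator& catches the second call before it overwrites.
bool checked_address_of_detects_overwrite() {
    MiniComPtr<FakeTexture> backBuffer;
    GetBuffer(&backBuffer);
    try {
        GetBuffer(&backBuffer);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this sketch, leaked_objects_via_raw_member() reports one leaked object, while checked_address_of_detects_overwrite() reports that the overwrite was caught.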

Pascal

Hi! Thanks for the response, but I'm not sure I follow all the way.

Do you mean that the texture (m_pDXGIBackBuffer) is leaking? If so, wouldn't it be leaked either once (per running session), or once per frame? Neither case seems to be true as far as I understand, as the leaked memory is too large for once, and too small for once per frame.

We're not using CComPtr in our code (legacy reasons where we explicitly want to allocate/deallocate), so I'm not too familiar with CComPtr. What would you suggest as the solution for the sample in this case? (If you have time to answer)

Best Regards,
Carl

 

Hello Carl,

I'm not too sure about the size of the leak, since there are some resources that are leaked per frame. Also, I did not try to understand how you measured the memory leak. I just remembered that the original sample had a few COM leaks, and apparently they are still there. To detect the leaks, in d3d11_device.cpp, replace all the "&pointer.p" with "&pointer"; to fix them, place an appropriate pointer.Release() call before taking the address of the pointer. I've tried your sample and had to do it in 3 places (m_pInputViewLeft, m_pOutputView and m_pDXGIBackBuffer).
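
A minimal sketch of that fix pattern, using hypothetical stand-ins (FakeView, MiniComPtr, CreateView) rather than the real D3D11 and ATL types. It only illustrates why releasing the held reference before re-acquiring it removes a per-frame leak. It is not the sample's actual code.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM view object created once per frame.
struct FakeView {
    static int live;
    int refs = 1;
    FakeView() { ++live; }
    void Release() { if (--refs == 0) { --live; delete this; } }
};
int FakeView::live = 0;

// Hypothetical, simplified stand-in for ATL's CComPtr.
template <class T>
struct MiniComPtr {
    T* p = nullptr;
    ~MiniComPtr() { Release(); }
    // Drops the held reference (if any) and resets the pointer to null.
    void Release() {
        if (p) { T* tmp = p; p = nullptr; tmp->Release(); }
    }
};

// Stand-in for a Create*View style call that returns a new reference.
void CreateView(FakeView** out) { *out = new FakeView(); }

// Renders `frames` frames, re-creating the view each time, and returns
// how many views are still alive after the smart pointer is destroyed.
int leaked_after_frames(int frames, bool release_first) {
    int before = FakeView::live;
    {
        MiniComPtr<FakeView> outputView;
        for (int i = 0; i < frames; ++i) {
            if (release_first) outputView.Release(); // the suggested fix
            CreateView(&outputView.p);
        }
    }
    return FakeView::live - before;
}
```

Without the Release() call, every frame except the last leaves one object behind; with it, nothing stays alive after the loop.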

Cheers,
Pascal

Hello Carl,

As mentioned in your original message, I've used Process Explorer to confirm that the memory leak disappears while rendering after fixing the COM leaks.

Pascal

Hi Carl,

Sorry for the late response. I have submitted the issue and the dev team is investigating it.

I just checked the status and it looks like they haven't made progress yet. I will try to push.

About Pascal's suggestion: it seems like a workaround rather than a direct fix for the bug, but could it be a clue for investigating your problem?

Mark

Pascal & Mark,

I'll look into this. I hope this is a clue and provides a solution, but I'm not fully convinced yet, since I would expect such a leak to produce either more or less leaked memory than what we actually observe. Also, in our code we're not using CComPtr, and using the D3D11 Debug Layer we cannot see any leaked D3D resources (using ReportLiveDeviceObjects), although maybe we could still have a similar problem to the one Pascal mentions that the Debug Layer cannot pick up? Anyway, I hope this provides a clue and an answer!

I'll get back when I've been able to look into it!

Best Regards,
Carl

Hi,

I can confirm this seems to solve the issue in the sample! So we can hopefully do some digging and find out why our own code (based on another/older sample) behaves like it does. 

I'll update as soon as I have more information.

Best Regards,
Carl

Pascal, thanks for this. Can you confirm what the mitigating action is and where, precisely, you have fixed it? I would like to put this fix into my own codebase too. I'm guessing that checking and releasing the pointer before use is the thing to do.

 

Hi Carl,

If you want my help, you can post how you fixed it. I can at least do a history check, and I could also ask the dev team to speed up the investigation.

Mark
