Memory Leak in MFXVideoENCODE_EncodeFrameAsync

I'm seeing a slow memory leak when using the Media SDK to perform MPEG2 encoding.  I see this for both hardware and software with the attached code.  In this code the main loop continually processes the same frames and contains no memory allocation, yet the memory used by the process continually creeps up.  The examples below show only 60 seconds of history, but the trend continues in the same manner until memory is exhausted.  This can take many hours, but eventually the process will consume all free memory in the machine.
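
For reference, the heart of that loop is just the usual EncodeFrameAsync/SyncOperation pattern.  A minimal sketch of one iteration is below (my simplification for this post, not the attached msdkMem.c verbatim; EncodeOneFrame is an illustrative name, and the session, surface pool, and bitstream buffer are all allocated once before the loop):

#include <windows.h>
#include <mfxvideo.h>

// Sketch of one loop iteration: feed one surface, sync, discard the output.
// Nothing here allocates memory; the bitstream buffer is simply reused.
static mfxStatus EncodeOneFrame(mfxSession session, mfxFrameSurface1 *pSurf, mfxBitstream *pBS)
{
    mfxSyncPoint syncp = NULL;
    mfxStatus sts;

    do {
        sts = MFXVideoENCODE_EncodeFrameAsync(session, NULL, pSurf, pBS, &syncp);
        if (MFX_WRN_DEVICE_BUSY == sts)
            Sleep(1);                              // device busy: retry shortly
    } while (MFX_WRN_DEVICE_BUSY == sts);

    if (MFX_ERR_MORE_DATA == sts)
        return MFX_ERR_NONE;                       // frame buffered by the encoder, no output yet

    if (sts >= MFX_ERR_NONE && syncp) {
        sts = MFXVideoCORE_SyncOperation(session, syncp, 60000); // wait for the encode to finish
        pBS->DataLength = 0;                       // throw the output away and reuse the buffer
        pBS->DataOffset = 0;
    }
    return sts;
}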

For hardware, the example app generates a sequence like this on my system:

./msdkTest.exe
Using HW version 1.7
6 buffers suggested
Starting frame encode
Elapsed (s) DeltaMemory
          0      737280
         10     1368064
         20     2138112
         30     2625536
         40     3149824
         50     3710976
         60     4644864
 

Likewise for software, I see something like this:

./msdkTest.exe -s
Using SW version 1.8
3 buffers suggested
Starting frame encode
Elapsed (s) DeltaMemory
          0       24576
         10       36864
         20       49152
         30      118784
         40      184320
         50      253952
         60      286720

Am I doing something wrong in this simple loop?  I've run this same code on a different machine with HW version 1.4 and SW version 1.7 and there was no leak, so I believe my loop is OK.

 

Attachment: msdkMem.c (7.08 KB)

From a quick read through your code I'm not seeing anything obviously wrong.  However, based on more tests running long encodes, I'm also seeing what could be slow memory leaks for the Windows HW MPEG2 and H264 implementations.  I'll investigate more and get back to you.  In the meantime, any additional details about driver version(s) and processor(s) tested on your end could help.

Thanks!

Jeff  

I built a Win32 application in VS2010 from the attached C source and ran it on two systems.

The first system had no memory leak using HW 1.4 and SW 1.7.  For this first system Media SDK System Analyzer gives:

Graphics Devices:
        Name                                         Version             State
        NVIDIA NVS 5400M                             9.18.13.1269        08
        Intel(R) HD Graphics 4000                    9.17.10.2843        Active

System info:
        CPU:    Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
        OS:     Microsoft Windows 7 Enterprise
        Arch:   64-bit

On the second system I'm running the same application and wind up using HW 1.7 and SW 1.8.

I'm definitely seeing memory usage grow for HW 1.7 but I'm no longer seeing any issue with the SW version.  Not sure what has changed.

The hardware log looks as follows (pretty much the same as in the previous post).

Using HW version 1.7
6 buffers suggested
Starting frame encode
Elapsed (s) DeltaMemory
          0      778240
         10     1302528
         20     2125824
         30     2580480
         40     3072000
         50     3756032
         60     4435968
         70     4870144

 

Here's what I see from the analyzer:

Graphics Devices:
        Name                                         Version             State
        Intel(R) HD Graphics 4600                    10.18.10.3345       Active
        VNC Mirror Driver                            1.8.0.0             08
        NVIDIA GeForce GTX 650                       9.18.13.1407        08

System info:
        CPU:    Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
        OS:     Microsoft Windows Embedded Standard
        Arch:   64-bit

 

 

Thanks!  This gives some important clues.

 

Jeff,

Any update on this leak?  For example, have you been able to duplicate it?

Do you have any suggestions for workarounds?  Currently I can use the SW implementation for SD, but HD requires too many CPU cycles in my application.
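
For context, switching between the two on my side is just the implementation flag passed to MFXInit; a rough sketch (OpenSession/useSw are my illustrative names, not from the attached test app):

#include <mfxvideo.h>

// Sketch of how the implementation is chosen at init time.
static mfxSession OpenSession(int useSw)
{
    mfxVersion ver = {{0, 1}};   // request API 1.0 or later
    mfxIMPL impl = useSw ? MFX_IMPL_SOFTWARE : MFX_IMPL_HARDWARE;
    mfxSession session = NULL;

    if (MFXInit(impl, &ver, &session) != MFX_ERR_NONE)
        return NULL;             // no suitable implementation found
    return session;
}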

I've replicated the leak and escalated to the appropriate teams.  We will get a fix out as soon as possible.  Don't know how acceptable these workarounds will be in your situation, but there are a few possibilities:

* Definitely understand that the memory leak is unacceptable for production, but is it possible to proceed with shorter tests that don't exhaust memory until a fix can be released?  
* Can development/testing requiring longer tests be done on 3rd Generation Core/Ivy Bridge?  I was only able to replicate the leak on 4th Generation Core/Haswell.

Hopefully this is very temporary and application development can continue.  Please let us know if this will affect any of your release deadlines.  If this is sensitive information please feel free to use a private message.

Regards, Jeff

 

I ran into the same problem and already reported it directly (via PM) to Petter Larsson, but I did not get any response so far, so I am reposting it here now.

Meanwhile I discovered that the memory leak is only in the "system memory" implementation. There seems to be no problem when working with Direct3D surfaces (this might be a workaround for somebody, but be careful: in my experience the performance is only about 50% of the system memory approach. sample_encode encodes H264 at ~300 fps @ 25% CPU usage when using system memory, but only ~160 fps @ 15% CPU usage when working with Direct3D11 surfaces; Direct3D9 surfaces are faster at ~240 fps. That is another problem, which I will report once the memory leak is fixed).

By the way, this supports my assumption that the memory leak occurs in

igfxcmrt64.dll CmQueue_RT::EnqueueCopyCPUToGPUFullStride(class CmSurface2D* __ptr64, ....)

which is shown by the "C++ memory leak detector".
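
For anyone trying the Direct3D workaround: the switch is essentially the IOPattern in mfxVideoParam (with video memory you additionally have to provide a D3D device handle and a frame allocator via MFXVideoCORE_SetHandle / MFXVideoCORE_SetFrameAllocator, which I leave out here). A rough sketch with an illustrative helper name:

#include <mfxvideo.h>

// Rough sketch: system memory (where I see the leak) vs. Direct3D surfaces
// (where I don't).  SelectIoPattern is just an illustrative name.
static void SelectIoPattern(mfxVideoParam *par, int useVideoMemory)
{
    par->IOPattern = useVideoMemory
        ? MFX_IOPATTERN_IN_VIDEO_MEMORY     /* D3D9/D3D11 surfaces */
        : MFX_IOPATTERN_IN_SYSTEM_MEMORY;   /* system memory       */
}

In sample_encode this is what the -d3d / -d3d11 command-line switches select.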

Best regards

Carsten

Here is my first post to Petter:

When encoding 16-32 D1 streams (real time from a live source), the "used memory" of the process (shown in Windows Task Manager) continuously grows. After less than 1 day it reaches > 8 GB and Windows becomes "out of memory".

After analyzing the problem, it turns out that igfxcmrt64.dll causes it. But step by step:

The environment is (one of 3 different systems, which all show the same behavior):

  • Intel Core i3 4330 (Haswell) with
  • Intel HD Graphics 4600
  • Memory 8GB
  • Windows 8.1 embedded Industry Pro
  • Intel Media SDK 2014

Tested with 3 different graphics drivers (all show the same problem):

  • Intel HD Graphics 4600 - 10.18.10.3316  10/1/2013  (comes with Win8.1)
  • Intel Iris and HD Driver 10.18.10.3345  10/31/2013 (from Intel Homepage 15.33.5.64.3345)
  • Intel Graphics           10.18.10.3355  11/15/2013 (installed via the Intel Driver Update Utility)

To keep things simple for you, I reproduced the bug with the Intel Media SDK 2014 sample_encode.exe.

It's very simple: just force sample_encode into an endless loop. For this you only have to modify sample_utils.cpp: in the procedure mfxStatus CSmplYUVReader::LoadNextFrame(mfxFrameSurface1* pSurface), insert at line 138:

fseek(m_fSource, 0, 0);

The code now looks like this:
mfxStatus CSmplYUVReader::LoadNextFrame(mfxFrameSurface1* pSurface)
{
    // check if reader is initialized
    MSDK_CHECK_ERROR(m_bInited, false, MFX_ERR_NOT_INITIALIZED);
    MSDK_CHECK_POINTER(pSurface, MFX_ERR_NULL_PTR);

    mfxU32 nBytesRead;
    mfxU16 w, h, i, pitch;
    mfxU8 *ptr, *ptr2;
    mfxFrameInfo* pInfo = &pSurface->Info;
    mfxFrameData* pData = &pSurface->Data;

    mfxU32 vid = pSurface->Info.FrameId.ViewId;

    fseek(m_fSource, 0, 0); // keep it simple.  repeat first frame!!

    ...

This way, every time LoadNextFrame() is called it sources the same frame (the first frame from the file), and encoding never stops because end-of-file is never reached.

When you now compile and start sample_encode.exe with the following parameters, sample_encode will run forever (width/height/input file depend on your input file):

sample_encode h264 -u speed -hw -i InputFile.YUV -o dummy.h264 -w 1280 -h 960

Now you just have to look at the Windows Task Manager:
sample_encode starts with a memory usage of ~175 MB, and memory usage increases by ~0.2 MB/s. After 1 hour it uses >1 GB, and so on...

To produce a simple output, you may also change the procedure mfxStatus CSmplBitstreamWriter::WriteNextFrame(mfxBitstream *pMfxBitstream, bool isPrint) in sample_utils.cpp. In line 417 I replaced

msdk_printf(MSDK_STRING("Frame number: %u\r"), m_nProcessedFramesNum);
with
ShowMemUsage(m_nProcessedFramesNum);

The code now looks like this:
    ...    
// print encoding progress to console every certain number of frames (not to affect performance too much)
    if (isPrint && (/*1 == m_nProcessedFramesNum  ||*/ (0 == (m_nProcessedFramesNum % 100))))
    {
        ShowMemUsage(m_nProcessedFramesNum);  //msdk_printf(MSDK_STRING("Frame number: %u\r"), m_nProcessedFramesNum);
    }
    ...

and insert the void ShowMemUsage(int numFrames) function. I put the source at the end of this text and will try to attach the modified sample_utils.cpp to this request.

With this modification you will get an output like this:

Intel(R) Media SDK Encoding Sample Version 5.0.337.79303

Input file format       YUV420
Output video            AVC
Source picture:
        Resolution      1280x960
        Crop X,Y,W,H    0,0,1280,960
Destination picture:
        Resolution      1280x960
        Crop X,Y,W,H    0,0,1280,960
Frame rate      30.00
Bit rate(Kbps)  2247
Target usage    speed
Memory type     system
Media SDK impl          hw
Media SDK version       1.7

Processing started
Frame | Running Used    Wasted  Wasted  Wasted
######|hh:mm:ss [MB]    [MB]    MB/h    B/Frame fps
  5100|00:00:17 171     6       1235    1325    291

Here you see the number of frames, the running time, the memory used by sample_encode.exe, the wasted memory in megabytes since startup, the wasted memory in megabytes per hour, the wasted memory in bytes per frame, and the current fps rate. In my case it's 1.235 GB/h!!!

We evaluated the problem with the "C++ memory leak detector" software (http://www.softwareverify.com/cpp-memory.php), and it found many memory leaks in igfxcmrt64.dll (just some examples):

igfxcmrt64.dll CmrtCodemarkerForGTPin_EnqueueWithGroup
igfxcmrt64.dll CmQueue_RT::EnqueueCopyCPUToGPUFullStride(class CmSurface2D* __ptr64, ....)
libmfxhw64.dll MFXVideoCore_GetHandle

I've attached some screenshots from the memory validator; there you can see the functions that produced memory leaks.

I really hope that you can reproduce the problem and offer us a solution.

Best regards

Carsten

Appended code:

//---------------------------------------------------------------------------------------------------------------
#include <stdio.h>
#include <windows.h>
#include <psapi.h>                     // for GetProcessMemoryInfo
#pragma comment(lib, "psapi.lib")      // GetProcessMemoryInfo lives in psapi.lib

// Prints the peak memory usage of the process and how much memory has been
// wasted (grown) since the first call: total, per hour and per frame.
void ShowMemUsage(int numFrames)
{
    static size_t memStartup = 0;
    static DWORD  tStartup = GetTickCount();

    HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, GetCurrentProcessId());
    PROCESS_MEMORY_COUNTERS pmc;

    if (NULL == hProcess) return;
    if (numFrames <= 0) numFrames = 1;           // avoid division by zero below

    if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc)))
    {
        size_t memUsed   = pmc.PeakWorkingSetSize;
        size_t memUsedMB = memUsed / 1000 / 1000;

        if (memStartup == 0)
        {
            printf("Frame | Running\tUsed\tWasted\tWasted\tWasted\r\n");
            printf("######|hh:mm:ss\t[MB]\t[MB]\tMB/h\tB/Frame\tfps\r\n");
            memStartup = memUsed;                // remember the baseline
        }

        size_t memGrownMB = (memUsed - memStartup) / 1000 / 1000;

        // calculate wasted memory per hour
        const int msPerHour = 60 * 60 * 1000;
        const int msPerMin  =      60 * 1000;

        DWORD msRunning = GetTickCount() - tStartup; // time elapsed in ms
        if (msRunning == 0) msRunning = 1;           // avoid div by zero @ startup

        int memWastedPerHour = (int)((memGrownMB * msPerHour) / msRunning);

        int hour = msRunning / msPerHour;
        int min  = (msRunning % msPerHour) / msPerMin;
        int sec  = (msRunning % msPerMin)  / 1000;

        int memWastedPerFrame = (int)((memUsed - memStartup) / numFrames);

        printf("%6d|%02d:%02d:%02d\t%u\t%u\t%d\t%d\t%d\t \r", numFrames, hour, min, sec,
               (unsigned)memUsedMB, (unsigned)memGrownMB, memWastedPerHour, memWastedPerFrame,
               (int)(numFrames * 1000 / msRunning));
    }

    CloseHandle(hProcess);
}

 

 

 

 

Attachment: memory leak analyser.png (56.8 KB)

Wow, 

Excellent investigation, Carsten K! 

I'm seeing the same memory leak behavior using hardware, but not with software. I've tried several versions of the SDK (2014, 2013 R2, 2012 R3) and they all exhibit the same behavior (only tested on my machine so far, using an i7-4750HQ). My usage is a bit different from most others that I've seen in the forums, though.

We're encoding captured video frames in real time to be sent out in real time as h.264 (we do the audio encoding and muxing ourselves). When passing the encoder 1280x720 frames with 32bit ARGB pixels at 30 frames per second, I see an average of one MB/minute memory leakage over the course of several hours using the HW implementation. When using SW implementation, memory is extremely steady for as long as the application runs (we do overnight tests several days a week). 

If it would help, I can condense the code to the initialization and encode functionality and post it here.

 

-Andrew

Additional info:

H264 hardware decoding has the same memory leak when using system memory!

Direct3D surfaces and software decoding seemed to be ok.

Best regards

Carsten

 

Carsten, thanks for all of the great details!  They are very helpful.  I've passed a reproducer for the decode memory leak to the dev team as well.  

 

Hi all,

Is there any update on this problem?  We are running tests decoding h.264 HD material and transcoding to an h.264 proxy.

Running this on a 32-core machine, we run out of memory after processing about 100 clips.  Our experiments indicate that the number of cores affects the rate at which memory is lost.

Thanks.

Hi all,

just to bring it up again.

Is there any solution available so far?

Is this bug fixed?

I need a bug-free version as soon as possible. Please inform me when any new version of the Media SDK or graphics driver is available (as shown in an earlier post, I think the bug is in the graphics driver, igfxcmrt64.dll).

Even if only beta versions are available... I need a solution!!!

At the moment we can NOT RELEASE our software. The whole production is frozen because of this bug.

Can i contact the dev team directly?

Best regards

Carsten

 

 

 

Sorry for the delay.  The 15.33.0.3496 driver, available now as beta from downloadcenter.intel.com, has a fix for this leak.

Decode and encode with system memory now have stable memory usage.  For example, on my test machine with the previous driver, memory use increased by about 50 MB per 100k frames of HD encode; now I am not seeing any increase over long runs.

The schedule for the next non-beta release is still TBD, but the importance is understood and we're working to get this out as soon as possible.  

Regards, Jeff

 

This change fixes memory issues in my tests.

The driver version is either 15.33.18.3496 (32-bit) or 15.33.18.64.3496 (64-bit). Maybe that's what 15.33.0.3496 means, but it wasn't clear to me.

Hi Joe,

The 'Beta' driver was repackaged/renamed to 15.33.18.3496 now that it is no longer considered 'Beta', and is considered an official/recommended release.  It is the same driver, other than the name and intended purpose.

 

-Tony
