Errors when using several encoding applications

Hi,

Customers often wish to fully utilize processor resources for decoding/encoding (of live/realtime video streams).
That is, they run multiple imsdk applications: some of them use gpu/hw, and some use cpu/sw.

And it is normal for the number of applications to change from time to time.
Or, as an alternative approach: one (multithreaded) application/service, which can change the number of handled streams on the fly.
E.g. a computer transcodes 5 iptv streams today and will transcode 7 streams tomorrow. And the 5 streams already running shouldn't be interrupted when the 2 additional streams start.

But the intel media sdk library has a well-known bug, which has not been fixed for years.
Starting or stopping an mfx session may cause errors inside other running sessions.
I think the reason is a lack of synchronization primitives somewhere inside the imsdk libraries.

I have seen such errors on different processors, different windows versions, etc.
The issue can be easily reproduced using the standard imsdk samples.

Steps to reproduce the bug with the sw library on windows:

1. Download the latest samples (today it is https://software.intel.com/sites/default/files/managed/61/d0/MediaSample...).
2. Take \_bin\win32\sample_encode.exe from it.
3. Use the latest imsdk library (MediaSDK2019R1.exe, libmfxsw32.dll).
4. Take some uncompressed video file. I use this one, _input.nv12: https://drive.google.com/file/d/1z3O6iobsnPLzwQddXlTHoK1UzOIY9fJ3/view?u...
5. Download the two scripts (sample_encode_1.bat and sample_encode_N.bat) attached to this message and put them beside sample_encode.exe.
6. Start sample_encode_N.bat. It runs four sample_encode.exe instances, each in an infinite loop.
7. Wait several days (or less), and you'll see error messages in the sample_encode consoles.
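
For readers who cannot open the attachment, the idea behind steps 6 and 7 can be sketched roughly as below. This is not the content of the attached .bat files (they are not shown in this thread); it is a minimal Python stand-in that assumes each worker simply re-runs one command until it exits with a non-zero code, while the other workers keep going. The names `encode_loop`, `run_workers` and `DEMO_CMD` are made up for the illustration, and `DEMO_CMD` is only a placeholder for the real sample_encode.exe command line.

```python
import subprocess
import sys
import threading

# Placeholder command: replace with the real sample_encode.exe command line.
# (Assumption: the real command returns a non-zero exit code on failure.)
DEMO_CMD = [sys.executable, "-c", "pass"]


def encode_loop(cmd, worker_id, failures, max_runs=None):
    """Re-run cmd until it fails or max_runs is reached (None = forever)."""
    runs = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(cmd, capture_output=True, text=True)
        runs += 1
        if result.returncode != 0:
            # Record which worker failed, on which iteration, with which code.
            failures.append((worker_id, runs, result.returncode))
            return


def run_workers(cmd, workers=4, max_runs=None):
    """Start several independent loops in parallel and collect their failures."""
    failures = []
    threads = [
        threading.Thread(target=encode_loop, args=(cmd, i, failures, max_runs))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

Calling `run_workers(DEMO_CMD, workers=4)` with the real encoder command would then block until every worker has hit a failure, which matches the "wait and watch the consoles" step only loosely: a failed worker stops, the rest continue.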

This bug was already reported years ago, but it is still not fixed. Here you can find detailed discussions:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/696953
https://software.intel.com/en-us/forums/intel-media-sdk/topic/475624
https://software.intel.com/en-us/forums/intel-media-sdk/topic/536840
 

Attachment: 4_encoders_loop.zip (application/zip, 691 bytes)

Thanks,

I am doing it now and I hope I can reproduce it.

The sample and release are old; I am using the latest sample and release, Media SDK for Windows 2019R1, so I skipped steps 1-3 from your list.

I will keep you updated.

Mark

Here is an update.

I ran the script for several hours and it was interrupted. It shows a memory allocation error, but I am not sure whether that is caused by the issue you reported or by windows sleep mode.

I checked all of your posts and found that this error is not the same as those. I also noticed you were referring to the library "libmfxsw32.dll"; I briefly checked my installation and can't find it. As I remember, we discontinued software codec support.

Anyway, here is the error message I got. Do you still want me to continue?

file 151 processed, go next
file 152 processed, go next

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), CEncodingPipeline::Run, MSDK_INVALID_SURF_IDX==nEncSurfIdx error at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\pipeline_encode.cpp:2053

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), wmain, pPipeline->Run failed at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\sample_encode.cpp:1522
error got from encode: -4
Press any key to continue . . .

Mark

Quote:

Liu, Mark (Intel) wrote:

I checked all of your posts and found that this error is not the same as those.

Anyway, here is the error message I got. Do you still want me to continue?

The errors differ from run to run. This is one of the reasons why I suppose that the root of the problem is a lack of MT-synchronization primitives inside libmfx*.dll. Run the test again and again, and you'll see other errors that occur in different places.

Several years ago I wrote a workaround for our applications. It is a system-wide synchronization object with some intelligence. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc), but allows multiprocess/multithreaded usage of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc). Since then our applications have worked reliably in 24/7/365/N mode on hundreds of computers.
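
To make the idea of that workaround more concrete, here is a rough in-process sketch of such a gate. It is not the actual workaround (its code is not part of this thread), and it only covers threads inside one process; a system-wide version would presumably need named OS objects shared between processes. The class name `MfxGate` and its two methods are invented for the illustration. The assumption is that "management" calls must run alone, while any number of pipeline calls may run together.

```python
import threading
from contextlib import contextmanager


class MfxGate:
    """Shared/exclusive gate: many pipeline calls at once, one management call alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_pipeline = 0       # pipeline calls currently inside the gate
        self._management_active = False
        self._management_waiting = 0    # management calls queued for entry

    @contextmanager
    def pipeline(self):
        """Wrap calls such as EncodeFrameAsync / SyncOperation (shared access)."""
        with self._cond:
            # Give way to a running or waiting management call so it cannot starve.
            while self._management_active or self._management_waiting:
                self._cond.wait()
            self._active_pipeline += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_pipeline -= 1
                self._cond.notify_all()

    @contextmanager
    def management(self):
        """Wrap calls such as Init / Close (exclusive access)."""
        with self._cond:
            self._management_waiting += 1
            try:
                # Wait until no pipeline call and no other management call is inside.
                while self._management_active or self._active_pipeline:
                    self._cond.wait()
            finally:
                self._management_waiting -= 1
            self._management_active = True
        try:
            yield
        finally:
            with self._cond:
                self._management_active = False
                self._cond.notify_all()
```

In use, every session start/stop would go inside `with gate.management():` and every per-frame call inside `with gate.pipeline():`. The gate is not reentrant, so a thread that already holds `pipeline()` must not request `management()`.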

Having such a patch, why am I raising the question again now? Because I suspect false triggering of MFX_ERR_GPU_HANG, and I feel its source is also related to the lack of synchronization. But for now, this is only a suspicion. I have to do some research to confirm or deny it. I'll create a new forum topic if the suspicion is confirmed. In the meantime, it would be desirable for intel developers to pay attention to synchronization problems.

In addition, I would like other people on planet Earth to have the possibility to normally use imsdk in multi-application scenarios:)

 

Quote:

Liu, Mark (Intel) wrote:

I also noticed you were referring to the library "libmfxsw32.dll"; I briefly checked my installation and can't find it. As I remember, we discontinued software codec support.

Umm. Can you tell me more about the discontinuation of the software version?

Because the imsdk 2019 r1 release notes tell a different story:
- System Requirements: IA-32 or Intel 64 architecture ... for running software implementation...
- Known Limitations: ... is relevant for both software and hardware implementations ...

And MediaSDK2019R1.exe contains both libmfxsw32.dll and libmfxsw64.dll.

I realize that new coding features are absent in the software versions, but the full absence of implementation/support is something new to me. Or did I misunderstand you?

 

Yes, you are right and this is my fault. We do have libmfxsw32.dll, which is under the <Media SDK root>/Software Development Kit/bin/win32 directory; it is the software codec.

I am still running the reproducer you provided. I started 4 threads overnight and can see them fail one by one at different times; by the time I left, only one was still running. All the failures had a sync error. Is this what you expected?

 I will update the details when all threads are done.

 

Quote:

Liu, Mark (Intel) wrote:
 I will update the details when all threads are done.

The last application instance will most likely not fail, because there is no more competition (no races) on that machine.

Quote:

Liu, Mark (Intel) wrote:
All the failures had a sync error. Is this what you expected?

Do you mean messages like these?

[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1533
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1738
[ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1086

Yes, these are typical failures.
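
Since the errors differ from run to run, it can help to tally them across many console logs. The snippet below is a small sketch that parses lines of the `[ERROR], sts=NAME(code), function, message at file:line` shape seen in this thread. That shape is inferred only from the lines quoted here, so other sample versions may print something different; the names `parse_error` and `tally` are made up for the illustration.

```python
import re
from collections import Counter

# Pattern inferred from the [ERROR] lines quoted in this thread (an assumption).
ERROR_LINE = re.compile(
    r"\[ERROR\], sts=(?P<status>\w+)\((?P<code>-?\d+)\), "
    r"(?P<function>[^,]+), (?P<message>.*) at (?P<file>.+):(?P<line>\d+)$"
)


def parse_error(line):
    """Return a dict for one [ERROR] line, or None if the line does not match."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    fields["line"] = int(fields["line"])
    return fields


def tally(lines):
    """Count (status, function) pairs over an iterable of console lines."""
    parsed = (parse_error(line) for line in lines)
    return Counter((e["status"], e["function"]) for e in parsed if e)
```

Feeding the saved console output of every sample_encode window through `tally` gives a quick picture of which status codes show up and in which functions.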

Hi,

I did a 3-day run with the script sample_encode_N.bat and got the following results:

  • It starts 4 threads, and 3 of them crashed on the first day (<12 hours) with the following error:

    file 86 processed, go next
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncTaskPool::SynchronizeFirstTask, SyncOperation failed at src\pipeline_encode.cpp:157
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1748
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1961
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1301
    error got from encode: -1
    Press any key to continue . . .
  • The last thread is still running.

I have attached a screen capture here. Is this similar to yours?

I will submit a bug on this.

Mark

Attachment: SE_crash.PNG (image/png, 140.31 KB)

Hi,

Quote:

Liu, Mark (Intel) wrote:
 I have attached a screen capture here. Is this similar to yours?

Yes, your errors are similar to mine.

They arise when using the sw version of libmfx*.dll. Both the 32-bit and the 64-bit libraries have this problem.

And perhaps this bug is also the source of the MFX_ERR_GPU_HANG errors in the hw libraries. I'll describe how to reproduce the hw errors in the next post.

 

It is harder to reproduce MFX_ERR_GPU_HANG issues using imsdk samples. I tried to find a script that shows errors faster.

Steps to reproduce MFX_ERR_GPU_HANG in an encoder application (32-bit or 64-bit):
1. Download more_tools_to_raise_gpu_hang.zip and unpack it to your working folder.
2. Run sample_decode_N_and_encode.bat and wait.
3. If you don't see errors within 10-20 minutes, then close all sample_decode/sample_encode windows and go to step 2.

In real life I have seen MFX_ERR_GPU_HANG occur during ordinary decoding/encoding work (without any application start/stop). It was observed with intel graphics driver versions 6194, 6323, 6373, 7212 on i7-6700, i5-7500, i5-7260u, e3-1585-v5. So it seems to be a common problem.

And I want to note that the real-life MFX_ERR_GPU_HANG occurrences were observed when cpu/gpu load was well below 100%, on machines with live (not file) media streams.
 

Thanks,

I have submitted the first issue.

Could you submit a separate post for the GPU hang issue? You don't have to resubmit the data and description, just point back to this post.

We need them to be debugged separately because I can't assume they are the same issue.

Mark

Hi Mark,

Quote:

Liu, Mark (Intel) wrote:
 Could you submit a separate post for the GPU hang issue?

Done:

https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266

Thanks!
