HW decoding problem...

Hi All,

I made a decoder using the sample source 'sample_decode'.

After my decoder program starts, it creates 1-16 threads.
Each thread creates an object (class instance) of CDecodingPipeline.

That is, my decoder decodes data from 1-16 IP cameras concurrently.

Multi-decoding of data from 1-8 IP cameras has no problem.
But multi-decoding from more than 8 IP cameras has a critical problem.

The 'MFXVideoDECODE_DecodeFrameAsync' function returns 'MFX_ERR_UNKNOWN' and doesn't read the data in the variable 'm_mfxBS'.
Once a decoder thread returns this value, the decode function always returns the above error.

My conclusions are:
1. One process must have fewer than 9 objects of the 'CDecodingPipeline' class.
2. If I want to use more than 8 objects, I must run two processes.
3. There are one or more resources that all threads in one process share.
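
As a rough illustration of conclusion 2, splitting the channels across several processes could be planned as in the sketch below. The function name `PlanChannelsPerProcess` and its parameters are hypothetical and not part of Media SDK or the sample code. The per-process limit of 8 is only what was observed on this setup, not a documented limit.

```cpp
#include <cassert>
#include <vector>

// Splits `channelCount` camera channels into per-process groups of at most
// `maxPerProcess` channels. Hypothetical helper for illustration only; the
// limit is whatever works on a given setup (8 in this thread).
std::vector<int> PlanChannelsPerProcess(int channelCount, int maxPerProcess) {
    assert(channelCount >= 0 && maxPerProcess > 0);
    std::vector<int> plan;
    for (int remaining = channelCount; remaining > 0; remaining -= maxPerProcess) {
        // Each entry is the number of decoder threads one process would run.
        plan.push_back(remaining < maxPerProcess ? remaining : maxPerProcess);
    }
    return plan;
}
```

For 16 cameras and a limit of 8 this yields two processes with 8 channels each. How the processes are launched and fed is left out on purpose.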

If I use SW decoding in my program, there is no problem... T.T
Do you have any solution for using HW decoding?
I would like any information about the above problems.

And my platform:

OS: Win 7(32bit)
CPU: i3 (sandy bridge)
Main Board : DH67BL
Mem : 2 GB
Media SDK : 3.0 Beta

Thank you in advance!




Hi. I had a similar problem.

I made a decode function DLL and want to use it from several programs.

When I used multi-threaded decoding (more than 8 channels of video), I got an exception in the MFXVideoDECODE_DecodeFrameAsync function. The return value is MFX_ERR_UNKNOWN.

But decoding fewer than 8 channels works well.

So I tested with 2 processes, each using 8 decoding threads.

That decreases decoding speed, but there is no exception and it works well.

I also tested 3 processes. That was fine too.

Ah... SW decoding is fine in all cases.

Is there any limitation with HW decoding? Please let me know if MSDK has a limitation with HW decoding.



We have validated Media SDK using multiple simultaneous HW and SW decode sessions (50+) internally without encountering the issues you both describe. There is no specific limitation on the number of threads per process.

Unfortunately the error code does not tell us anything about what is going wrong. Could you make sure that your thread resources (memory, file access, etc.) are fully separated so that the threads do not impact each other, and that the resource and memory sharing locks are working properly?
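
One way to keep per-thread resources separated is to give every thread its own copy of the parameters and its own output file. The sketch below only illustrates that idea: `ThreadParams` and `MakePerThreadParams` are hypothetical stand-ins (the sample itself uses `sInputParams`), and the file naming scheme is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-thread parameter block; not the sample's sInputParams.
struct ThreadParams {
    std::string srcFile;  // input stream (reading the same file is fine)
    std::string dstFile;  // output file, must be unique per thread
};

// Builds one private parameter copy per thread so that no two threads
// write to the same output file or share a parameter object.
std::vector<ThreadParams> MakePerThreadParams(const ThreadParams& base, int numThreads) {
    assert(numThreads >= 0);
    std::vector<ThreadParams> all;
    all.reserve(static_cast<std::size_t>(numThreads));
    for (int i = 0; i < numThreads; ++i) {
        ThreadParams p = base;                                // private copy
        p.dstFile = base.dstFile + "." + std::to_string(i);   // e.g. out.yuv.0
        all.push_back(p);
    }
    return all;
}
```

Each thread would then receive a pointer to its own element and never touch the others.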

Keep in mind that the number of simultaneous sessions will also be limited by the amount of system memory.

Some questions: Did you both use Media SDK 3.0 beta? Do you encounter the same issue using Media SDK 2.0? Are you decoding an H.264, MPEG2 or VC1 stream? Are you attempting to join the decode sessions?

See below for a simplified code snippet showcasing multi decode use based on the Media SDK sample_decode project:

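[cpp]
DWORD WINAPI DecodeThread(LPVOID arg)
{
	mfxStatus sts = MFX_ERR_NONE;
	sInputParams    *pParams = (sInputParams *)arg;
	CDecodingPipeline   Pipeline;

	sts = Pipeline.Init(pParams);

	for (;;)
	{
		sts = Pipeline.RunDecoding();

		// [The status checks that guard the two resets below, and the
		// statement that leaves the loop, were lost from the original post.]
				sts = Pipeline.ResetDevice();

			sts = Pipeline.ResetDecoder();
	}

	// Each thread owns its parameter copy and frees it on exit.
	delete pParams;

	return 0;
}

int _tmain(int argc, TCHAR *argv[])
{
	sInputParams	Params;
	mfxStatus		sts = MFX_ERR_NONE;
	const int 		NumThreads = 16;
	sts = ParseInputString(argv, (mfxU8)argc, &Params);
	HANDLE*	pDecodeThreads;
	pDecodeThreads = new HANDLE[NumThreads];
	for (int i = 0; i < NumThreads; i++)   // loop bound assumed; lost from the original post
	{
		// [Lost from the original post: the code that creates the per-thread
		// parameter copy pTParams; only its tail "...strDstFile, tmp);" survives.]
		pDecodeThreads[i] = CreateThread(NULL, 0, DecodeThread, (LPVOID)pTParams, 0, NULL );
	}
	WaitForMultipleObjects(NumThreads, pDecodeThreads, TRUE, INFINITE);

	for (int i = 0; i < NumThreads; i++)   // loop bound assumed; lost from the original post
	{
		// [Loop body lost from the original post.]
	}

	// [Remainder of _tmain lost from the original post.]
}
[/cpp]

Regards,
Petter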


I modified the 'sample_decode' source code as below.

And I tested the program with 32 decoder threads using the same H.264-encoded video file as input.

But it resulted in many error codes and output files of various sizes, even though I used the same input file.

* The resolution of the input file: 704 x 396

* The printed return values:

In SW decoding mode, my program terminated normally.

This problem only occurs in HW decoding mode.

I don't know what the problem is...


Hi kmjlove130,

Thanks for the detailed explanation. This is certainly interesting. My initial thought was that you were facing memory limitation or resource contention issues. But from looking at your code clip and the results, that may not be the case.

I tried some permutations with a similar setup here and was able to reproduce a similar set of errors. However, I only see this issue when running close to 30 threads or more. Moreover, it only happens for 32 bit builds, and more commonly if system memory is used as the surface target (as in your case).

Thanks for reporting this, we will debug this internally and let you know as soon as we have identified the root cause.

In the meantime you may workaround the issue by using 64 bit builds or falling back on SW decode.


Hi Petter,

In addition, I tested with another sample video that has 1080p resolution (1920 x 1080).

With 1-8 channels of multi-threaded decoding, it works completely. However, the problem always occurs with 9 or more concurrent decodes.

At that precise moment, I checked memory and noticed that my available memory (standby + free) was still about 450MB, which I think is sufficient. In detail, the AllocFrame() function, which Init() or ResetDecoder() calls in the CPipelineDecode class, returns MFX_ERR_MEMORY_ALLOC.

I hope you can reply as fast as possible.



I have some questions.

Is the described issue seen with both 32 and 64 bit builds?

For this case it certainly looks like you are getting close to the memory limit. Could you please also try on a system that has more memory?
I assume you are using HW decode. Are you also using D3D memory surfaces?


Hi, Petter

My answers are below.

1. I have tested using 32 bit builds only.

2. The above image was captured on a system with 2GB RAM and an i3. But I also tested on 4GB RAM with an i7 CPU and the result was the same. Of course, both CPUs are Sandy Bridge, which supports MSDK HW decoding.

3. I have tested using both HW decode and SW decode. But the problem occurred in HW decode only. In SW decode, my program worked completely.

4. I deleted everything related to D3D from the source code. In other words, I didn't use D3D memory surfaces.


I have the same problem.
This problem is costing me a lot of time.

I want to decode 16 channels of 1080p but I failed to decode more than 8 channels.
If you want my source code, I can send it.

I would like help on how to solve this. Please...

and my platform :

OS: Win 7 (32bit)
CPU: i7 (sandy bridge)
Main Board : DH67BL
RAM : 4 GB
Media SDK : 2.0 Gold


We are working on root-causing this issue, as explained in an earlier post. In the meantime, since the issue cannot be seen in 64 bit builds, I suggest using a 64 bit setup so that you can continue development and testing.



I ran some tests on Windows 7 64-bit.

But the result was the same: my program did not work with more than 8 channels of multi-decoding.

In detail:

OS: 64 bit
Main program: 32 bit build
DLL file using Intel Media SDK: 64 bit build
The main program uses the above DLL file.

I wonder whether the main program must also be a 64-bit build.



The issue is not related to the OS version (32 or 64 bit) but to your application executable (the one hosting Media SDK). If you build a 32 bit application using the Media SDK 32 bit library (and DLL), you will likely encounter the concurrency issue that has been described in earlier posts. This issue has not yet been resolved in externally available drivers.

I'm not clear on your application architecture, so I cannot comment on the main program/DLL question. Considering the mix of 32 and 64 bit, your MSDK DLL part and the main program must reside in different processes since they are of different architectures, right?
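
To make the architecture point concrete: on Windows a 32 bit process cannot load a 64 bit DLL, and vice versa, so what matters is the bitness of the executable that hosts Media SDK. The small sketch below shows one way to check what a given build actually is. The helper names `ProcessBitness` and `CanLoadInProcess` are made up for illustration and are not Media SDK or Windows APIs.

```cpp
#include <cassert>

// Bitness of the binary this code is compiled into (32 or 64 on the
// platforms discussed here), derived from the pointer size.
inline int ProcessBitness() {
    return static_cast<int>(sizeof(void*) * 8);
}

// A DLL can only be loaded into a process of the same bitness.
// Hypothetical helper: arguments are expected to be 32 or 64.
inline bool CanLoadInProcess(int hostBits, int dllBits) {
    assert((hostBits == 32 || hostBits == 64) && (dllBits == 32 || dllBits == 64));
    return hostBits == dllBits;
}
```

Logging `ProcessBitness()` at startup of both the main program and the decoder DLL is a quick way to confirm which build is really running.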

There are two suggested workarounds:
1) Build your application as a 64 bit binary instead.
2) Encapsulate the decoder initialization operation in a critical section. This was also proposed here: http://software.intel.com/en-us/forums/showthread.php?t=84815&o=a&s=lr
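
A minimal sketch of workaround 2 follows, under stated assumptions. It uses a portable `std::mutex` in place of a Windows `CRITICAL_SECTION`, and `StubPipeline` is a hypothetical stand-in for `CDecodingPipeline` so that the example compiles without Media SDK. Only the initialization is serialized; decoding itself would still run in parallel.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for CDecodingPipeline. The counters only exist to
// show that no two Init() calls ever overlap.
struct StubPipeline {
    static int activeInits;     // Init() calls currently in progress
    static int maxActiveInits;  // highest overlap ever observed
    bool initialized = false;

    void Init() {
        ++activeInits;
        if (activeInits > maxActiveInits) maxActiveInits = activeInits;
        std::this_thread::yield();  // give other threads a chance to interleave
        initialized = true;
        --activeInits;
    }
};
int StubPipeline::activeInits = 0;
int StubPipeline::maxActiveInits = 0;

std::mutex g_initMutex;  // plays the role of the critical section

void DecodeThreadBody(StubPipeline& pipeline) {
    {
        // Only one thread at a time may initialize its decoder.
        std::lock_guard<std::mutex> lock(g_initMutex);
        pipeline.Init();
    }
    // The decode loop would run here, outside the lock.
}

// Starts numThreads threads and returns how many pipelines were initialized.
int RunSerializedInit(int numThreads) {
    std::vector<StubPipeline> pipelines(numThreads);
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
        threads.emplace_back(DecodeThreadBody, std::ref(pipelines[i]));
    for (auto& t : threads) t.join();

    int ok = 0;
    for (const auto& p : pipelines) if (p.initialized) ++ok;
    return ok;
}
```

With a real pipeline the locked region would wrap `Pipeline.Init(pParams)` in the decode thread. Whether decoder resets also need the same protection is not stated in this thread.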


I have the same problem...

>This issue has not yet been resolved in externally available drivers.
The problem occurs not only in HW mode. We consistently get it in SW mode with more than 16 instances of CDecodingPipeline in different threads. Memory capacity is sufficient. We tried Core i7 and Core i3 (Sandy Bridge).

Our target machine is x86. We can't use the 64-bit version of the SDK.

Will you fix this problem? Otherwise we will be forced to use alternative libraries... But Media SDK is better than the others...



A fix for this issue is unfortunately not yet available. That said, the suggested workarounds have proven to be successful with many customers. I understand the fact that you cannot move to 64 bit. Please use the second suggested workaround until this issue has been fixed in the Intel Media SDK product.


The second workaround doesn't help.
We tried your code: http://software.intel.com/en-us/forums/showpost.php?p=155130
First we tried adding critical sections.
Second we tried initializing all objects sequentially before starting any worker threads.
A Core i7 with 6GB of memory works fine in HW mode with 14 threads, and with 22 in SW mode, for 1280x1024 input, but not more. Memory usage was no more than 1.5 GB for the process and 2GB in total.


Based on the workload you describe, it seems your issue may not be related to Media SDK but due to a limitation on the amount of available graphics memory in 32 bit mode. I believe there is a limit of ~600MB of allocatable graphics memory in 32bit mode.

Can you please confirm if you are using D3D surfaces for both SW and HW case?

If your workload requires more graphics memory, your only option is likely to move to a 64 bit build.
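
To get a feel for how quickly surface memory adds up, here is a back-of-the-envelope sketch. It assumes NV12 surfaces (12 bits per pixel) with width and height rounded up to a multiple of 16, as for progressive content. The number of surfaces per session is an assumed input here; in a real application it would come from `MFXVideoDECODE_QueryIOSurf` (`NumFrameSuggested`). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rounds v up to the next multiple of 16.
inline std::uint64_t AlignUp16(std::uint64_t v) {
    return (v + 15) / 16 * 16;
}

// Bytes for one NV12 surface: a full-size luma plane plus a half-size
// interleaved chroma plane, i.e. 1.5 bytes per pixel.
inline std::uint64_t Nv12SurfaceBytes(std::uint64_t width, std::uint64_t height) {
    assert(width > 0 && height > 0);
    return AlignUp16(width) * AlignUp16(height) * 3 / 2;
}

// Rough total for `sessions` decoders that each hold `surfacesPerSession`
// surfaces. Ignores bitstream buffers and driver overhead.
inline std::uint64_t EstimateSurfacePoolBytes(std::uint64_t width, std::uint64_t height,
                                              std::uint64_t surfacesPerSession,
                                              std::uint64_t sessions) {
    return Nv12SurfaceBytes(width, height) * surfacesPerSession * sessions;
}
```

For example, one 1920x1080 surface comes to 3,133,440 bytes, so 16 sessions with an assumed 10 surfaces each need about 501 MB for surfaces alone. The real per-session count depends on the stream and the decoder, so treat this only as an order-of-magnitude check.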

