Are the ippiSAD* supported with multi-threads?

SAD (sum of absolute differences) is commonly used for motion estimation in video encoders, where it takes about 40% or more of the encoder's cycles. But according to ThreadedFunctionsList.txt, it is not threaded, which I find hard to believe. I have also found many Intel IPP articles that mention using IPP with multiple threads for video encoding; e.g., the following article gives an example that uses ippiSAD4x4_8u32s with multiple threads for a speedup. I tried to enable multi-threading for the ippiSAD* functions via ippSetNumThreads, but it does not work. Can anyone tell me definitively whether the ippiSAD* functions support multi-threading? If not, can they be threaded with other tools, e.g. OpenMP, and how? Thank you in advance.



ippiSAD4x4_ is a very simple, low-level primitive, so threading inside the function is not efficient. A better choice is to thread at a higher level rather than inside the function.

As shown in this article:

It is threaded at the high-level for loop:
#pragma omp parallel for
for (int i=0; i .....

Also, the IPP video sample code is threaded at the high level of the sample, not in the low-level functions.
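To make the "thread the outer loop" advice concrete, here is a minimal sketch in C with OpenMP. It is not taken from the article or from the IPP samples. `sad4x4` is a plain-C stand-in for a primitive like ippiSAD4x4_8u32s (so the sketch builds without IPP), and `frame_sad4x4` is a hypothetical helper that only compares co-located blocks; a real motion search would loop over candidate displacements inside each block instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain-C stand-in for a 4x4 SAD primitive such as ippiSAD4x4_8u32s
   (assumption: used here only so the sketch compiles without IPP). */
static int32_t sad4x4(const uint8_t *src, int src_step,
                      const uint8_t *ref, int ref_step)
{
    int32_t sum = 0;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            sum += abs((int)src[y * src_step + x] - (int)ref[y * ref_step + x]);
    return sum;
}

/* Hypothetical helper: one SAD per 4x4 block of the frame.
   The parallel region covers whole rows of blocks, so each thread runs
   many cheap SAD calls per fork/join instead of one. */
static void frame_sad4x4(const uint8_t *cur, const uint8_t *ref,
                         int width, int height, int step, int32_t *out)
{
    assert(width % 4 == 0 && height % 4 == 0);
    const int bw = width / 4;   /* blocks per row */
    const int bh = height / 4;  /* rows of blocks */

    #pragma omp parallel for
    for (int by = 0; by < bh; ++by) {
        for (int bx = 0; bx < bw; ++bx) {
            const int off = by * 4 * step + bx * 4;
            /* Each iteration writes a distinct out[] element: no locking needed. */
            out[by * bw + bx] = sad4x4(cur + off, step, ref + off, step);
        }
    }
}
```

Build with the compiler's OpenMP switch (for example `-fopenmp` with GCC); without it the pragma is ignored and the loop simply runs serially with the same result.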



Threading SADs at the primitive level doesn't make sense: a SAD primitive runs for ~30-150 CPU clocks (depending on flavor and architecture), while the threading overhead (OMP) is ~2000-3000 clocks. That means an encoder can be threaded efficiently ONLY at the application level. The same is true for all other low-complexity, computationally light, "short" functions. A list of threaded IPP functions is available in the package; it contains ~2400 APIs (out of ~12000), so don't expect every function you use to be threaded.
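As a rough back-of-the-envelope check of those numbers, the sketch below estimates how many SAD calls one parallel region must contain before threading pays off. The model is an assumption made for illustration, not something from IPP: serial time is N*c, threaded time is N*c/T plus a fixed overhead O, and memory bandwidth and load imbalance are ignored. `min_calls_to_amortize` is a hypothetical helper name.

```c
#include <assert.h>

/* Smallest N for which N*c/T + O < N*c, i.e. N > O / (c * (1 - 1/T)).
   Simplified cost model (assumption): fixed fork/join overhead, perfect scaling. */
static long min_calls_to_amortize(double overhead_clocks,
                                  double clocks_per_call, int threads)
{
    assert(threads > 1 && clocks_per_call > 0.0 && overhead_clocks >= 0.0);
    const double saved_per_call = clocks_per_call * (1.0 - 1.0 / threads);
    /* Truncate toward zero, then add 1 to get the first N strictly above the bound. */
    return (long)(overhead_clocks / saved_per_call) + 1;
}
```

With 2500 clocks of overhead, 100 clocks per SAD and 4 threads, this model gives 34 calls just to break even. A single SAD call can never recover the overhead, while a loop over all the blocks of a frame easily does.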


OK. I got it. Thank you.

I thought IPP's primitive-level threading would automatically parallelize both the higher-level loops and the function internals. Now I know the higher-level loop must be threaded manually, and that the best and easiest way is to use OMP. But I'm still confused that there is no threaded video application in the samples.

Quoting fireheart

.......But I'm still confused that there is no threaded video application in the samples.


Regarding the comment above,

Some of the video codec samples are threaded. You can refer to ipp-samples\audio-video-codecs\doc\UMC reference Manual; every section has an explanation of Threading Capabilities. For example, the DV decoder can create a number of threads.


Naveen Gv
