IPP and TBB

Hi,

I am wondering whether IPP (7.1) in general, and ippiWarpAffine* in particular, takes advantage of TBB's parallel_for and, if so, how to enable it. When I enabled TBB in OpenCV, I got a significant speed boost in warpAffine().

My test images (CT medical images) are 512x512 (8u), and I am using CUBIC interpolation with a destination size of 1590x820. OpenCV (with TBB) is more than 3 times faster than IPP for exactly the same affine transform. It is worth mentioning that in both cases (IPP and OpenCV) I am using Java wrappers under Linux (RH6) 64-bit. For IPP, I compiled the Java language support (from IPP 7.0.7) against 7.1, and I am using jipp.ip.ippiWarpAffine_8u_C1R(). From OpenCV I am using Imgproc.warpAffine().

Any ideas? Please note that I am new to IPP and TBB, and I am evaluating different products in order to find a good basis for a rendering library (64-bit: Win7, Linux, Mac). From Intel I downloaded Intel C++ Composer XE 2013, which bundles IPP and TBB along with MKL and Intel's compiler, and it seems a nice fit for us so far.

Thank You,

Dacian

Hi Dacian,

ippiWarpAffine is not internally threaded (check Documentation\en_US\ipp\ThreadedFunctionsList.txt for the list of threaded functions), so it cannot benefit from IPP's internal threading. If you want threaded performance, you need to implement the high-level threading yourself, with TBB or by other means.

Regards,
Chao

Thanks Chao,

I did notice ThreadedFunctionsList.txt, and I decomposed my affine transform into mirror, rotate, and resize. Overall I got pretty good results; however, I am wondering if you could be a little more specific about how to use TBB's parallel_for with ippiWarpAffine_8u_C1R(). Are you suggesting decomposing the source image into smaller parts (with some overlap, perhaps)?

Dacian

>>...how I can proceed in using TBB's parallel_for with ippiWarpAffine_8u_C1R().

For TBB's parallel_for, you should look at the TBB samples, for example at the set of classes used for partitioning.

>>...Are you suggesting to decompose the source image in smaller parts (with some overlap perhaps)?

Overall, yes, but you will need to verify that the final result of processing several parts of the image in parallel is identical to regular processing of the whole image without partitioning (rounding errors are possible).
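To make the partitioning idea concrete, here is a minimal sketch in plain C++. It rests on several assumptions: it does not call IPP or TBB at all. A hand-written bilinear warp (warpRows) stands in for ippiWarpAffine_8u_C1R, std::thread stands in for tbb::parallel_for, and the names Image8u, InverseAffine, warpRows and warpParallel are all hypothetical. The sketch splits the destination image into horizontal strips rather than the source. Each destination pixel is computed from the read-only source through the inverse transform, so the strips need no overlap. With IPP, the analogous step would presumably be to pass each strip as the destination ROI, but please check that against the manual for your version.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for an 8-bit, single-channel image (not an IPP type).
struct Image8u {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height bytes
    Image8u(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Inverse affine map, destination (x, y) -> source (sx, sy):
//   sx = a*x + b*y + c,  sy = d*x + e*y + f
struct InverseAffine {
    double a, b, c;
    double d, e, f;
};

// Fills destination rows [rowBegin, rowEnd) with bilinear sampling.
// This is only a placeholder for the real per-strip warp call.
void warpRows(const Image8u& src, Image8u& dst, const InverseAffine& m,
              int rowBegin, int rowEnd) {
    const auto at = [&src](int x, int y) {
        return static_cast<double>(
            src.pixels[static_cast<std::size_t>(y) * src.width + x]);
    };
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            // Source coordinates come from absolute (x, y), never from a
            // running sum, so a strip computes the same values as a full pass.
            const double sx = m.a * x + m.b * y + m.c;
            const double sy = m.d * x + m.e * y + m.f;
            const int x0 = static_cast<int>(std::floor(sx));
            const int y0 = static_cast<int>(std::floor(sy));
            if (x0 < 0 || y0 < 0 || x0 + 1 >= src.width || y0 + 1 >= src.height) {
                continue;  // outside the source: leave the pixel at 0
            }
            const double fx = sx - x0;
            const double fy = sy - y0;
            const double top = at(x0, y0) * (1.0 - fx) + at(x0 + 1, y0) * fx;
            const double bot = at(x0, y0 + 1) * (1.0 - fx) + at(x0 + 1, y0 + 1) * fx;
            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>(std::lround(top * (1.0 - fy) + bot * fy));
        }
    }
}

// Splits the destination into numStrips horizontal strips, one thread each.
// Strips are disjoint row ranges, so no two threads write the same pixel.
void warpParallel(const Image8u& src, Image8u& dst, const InverseAffine& m,
                  int numStrips) {
    numStrips = std::max(1, std::min(numStrips, dst.height));
    std::vector<std::thread> workers;
    for (int i = 0; i < numStrips; ++i) {
        const int rowBegin = static_cast<int>(
            static_cast<long long>(dst.height) * i / numStrips);
        const int rowEnd = static_cast<int>(
            static_cast<long long>(dst.height) * (i + 1) / numStrips);
        workers.emplace_back([&src, &dst, &m, rowBegin, rowEnd] {
            warpRows(src, dst, m, rowBegin, rowEnd);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

To run the check suggested above, warp the same source once with warpRows over all rows and once with warpParallel, then compare the two pixel buffers byte for byte. In this sketch they match exactly because every pixel uses the same arithmetic in both paths. With a real library call that is not guaranteed, so the comparison is still worth doing. On Linux, std::thread may need the -pthread flag when compiling.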
