IPP and TBB

Dacian P.

Hi,

I am wondering whether IPP (7.1) in general, and ippiWarpAffine* in particular, takes advantage of TBB's parallel_for and, if so, how to enable it. When I enabled TBB in OpenCV, I got a significant speed boost in warpAffine().

My test images (CT medical images) are 512x512 (8u), and I am using CUBIC interpolation with a destination size of 1590x820. OpenCV (with TBB) is more than 3 times faster than IPP for exactly the same affine transform. It is worth mentioning that in both cases (IPP and OpenCV) I am using Java wrappers under 64-bit Linux (RH6). For IPP I compiled the Java language support (from IPP 7.0.7) against 7.1, and I am using jipp.ip.ippiWarpAffine_8u_C1R(). From OpenCV I am using Imgproc.warpAffine().

Any ideas? Please note that I am new to IPP and TBB, and I am evaluating different products in order to find a good basis for a rendering library (64-bit: Win7, Linux, Mac). From Intel I downloaded Intel C++ Composer XE 2013, which bundles IPP and TBB along with MKL and Intel's compiler, and it seems a nice fit for us so far.

Thank you,

Dacian

Chao Y (Intel)

Hi Dacian,

ippiWarpAffine is not internally threaded (check Documentation\en_US\ipp\ThreadedFunctionsList.txt for the list of threaded functions), so it cannot benefit from the internal threading. If you want threaded performance, you need to implement the high-level threading yourself, with TBB or in some other way.

Regards,
Chao
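
One way to read this advice is to split the destination image into horizontal strips and warp each strip on its own thread. The sketch below is not IPP or TBB code: `warp_rows` is a hypothetical nearest-neighbour stand-in for a warp call restricted to a destination ROI, and `std::thread` stands in for `tbb::parallel_for` so that the example builds without either library. It assumes the warp is computed by inverse mapping, so every strip reads from the whole source image and the strips need no overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Image = std::vector<std::uint8_t>;  // 8-bit, one channel, tightly packed rows

// Hypothetical stand-in for a warp restricted to a destination ROI.
// Fills destination rows [y0, y1) by mapping each destination pixel back into
// the full source image (nearest neighbour; pixels outside the source become 0).
// inv holds the inverse transform: sx = inv[0]*x + inv[1]*y + inv[2],
//                                  sy = inv[3]*x + inv[4]*y + inv[5].
void warp_rows(const Image& src, int sw, int sh, Image& dst, int dw,
               const double inv[6], int y0, int y1) {
    for (int y = y0; y < y1; ++y) {
        for (int x = 0; x < dw; ++x) {
            const long sx = std::lround(inv[0] * x + inv[1] * y + inv[2]);
            const long sy = std::lround(inv[3] * x + inv[4] * y + inv[5]);
            const bool inside = sx >= 0 && sx < sw && sy >= 0 && sy < sh;
            dst[static_cast<std::size_t>(y) * dw + x] =
                inside ? src[static_cast<std::size_t>(sy) * sw + sx] : std::uint8_t{0};
        }
    }
}

// Splits the destination rows into contiguous strips and warps one strip per
// thread. The strips are disjoint, so the threads never write the same pixel.
void warp_parallel(const Image& src, int sw, int sh, Image& dst, int dw, int dh,
                   const double inv[6], int strips) {
    assert(dst.size() >= static_cast<std::size_t>(dw) * dh);
    strips = std::max(1, std::min(strips, dh));
    std::vector<std::thread> workers;
    for (int i = 0; i < strips; ++i) {
        const int y0 = static_cast<int>(static_cast<long long>(dh) * i / strips);
        const int y1 = static_cast<int>(static_cast<long long>(dh) * (i + 1) / strips);
        workers.emplace_back([&, y0, y1] { warp_rows(src, sw, sh, dst, dw, inv, y0, y1); });
    }
    for (auto& worker : workers) worker.join();
}
```

With TBB available, the thread loop would typically become a `tbb::parallel_for` over a `tbb::blocked_range<int>` of destination rows, with the body calling the real warp function on rows `[r.begin(), r.end())`.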

Dacian P.

Thanks Chao,

I did notice ThreadedFunctionsList.txt, and I decomposed my affine transform into mirror, rotate, and resize. Overall I got pretty good results; however, I am wondering if you can be a little more specific about how I can proceed in using TBB's parallel_for with ippiWarpAffine_8u_C1R(). Are you suggesting to decompose the source image in smaller parts (with some overlap perhaps)?

Dacian
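
The decomposition into mirror, rotate, and resize can be checked against a single affine matrix by composing the three steps. The sketch below is plain C++ with hypothetical helper names (`Affine`, `compose`, `mirror_x`, `rotate`, `scale`, `apply`); it makes no claim about the sign or origin conventions of the corresponding IPP functions, and it assumes the mirror is applied first, then the rotation about the origin, then the scaling.

```cpp
#include <cmath>

// 2x3 affine coefficients: x' = m[0][0]*x + m[0][1]*y + m[0][2],
//                          y' = m[1][0]*x + m[1][1]*y + m[1][2].
struct Affine { double m[2][3]; };
struct Point { double x, y; };

// Returns the transform that applies b first and then a.
Affine compose(const Affine& a, const Affine& b) {
    Affine r{};
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 3; ++j) {
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
        }
        r.m[i][2] += a.m[i][2];  // translation of a is added after its linear part
    }
    return r;
}

// Horizontal mirror of an image that is `width` pixels wide.
Affine mirror_x(int width) { return {{{-1.0, 0.0, width - 1.0}, {0.0, 1.0, 0.0}}}; }

// Rotation about the origin (assumed convention, angle in radians).
Affine rotate(double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    return {{{c, -s, 0.0}, {s, c, 0.0}}};
}

// Axis-aligned scaling, i.e. a resize without translation.
Affine scale(double sx, double sy) { return {{{sx, 0.0, 0.0}, {0.0, sy, 0.0}}}; }

Point apply(const Affine& t, Point p) {
    return {t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2],
            t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2]};
}
```

For example, `compose(scale(sx, sy), compose(rotate(a), mirror_x(w)))` yields one coefficient matrix that maps a point to the same place as applying the three steps in sequence, up to floating-point rounding.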

 

Sergey Kostrov

>>...how I can proceed in using TBB's parallel_for with ippiWarpAffine_8u_C1R().

For '...TBB's parallel_for...' you should look at the TBB samples, for example the set of classes for partitioning.

>>...Are you suggesting to decompose the source image in smaller parts (with some overlap perhaps)?

Overall, yes, but you will need to verify that the final result of processing several parts of the image in parallel is identical to processing the whole image at once, without partitioning (rounding errors are possible).
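
A check of this kind could be as simple as comparing the partitioned result with the single-pass result pixel by pixel. The sketch below assumes 8-bit single-channel buffers whose row steps are given in bytes, as in IPP; `DiffReport` and `compare_8u` are hypothetical names and not part of IPP or TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct DiffReport {
    std::size_t differing_pixels;  // number of pixels whose values differ
    int max_abs_diff;              // largest absolute difference found
};

// Compares two 8-bit single-channel images that may have different row steps.
// a_step and b_step are the distances in bytes between consecutive rows.
DiffReport compare_8u(const std::uint8_t* a, int a_step,
                      const std::uint8_t* b, int b_step,
                      int width, int height) {
    assert(a_step >= width && b_step >= width);
    DiffReport report{0, 0};
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row_a = a + static_cast<std::ptrdiff_t>(y) * a_step;
        const std::uint8_t* row_b = b + static_cast<std::ptrdiff_t>(y) * b_step;
        for (int x = 0; x < width; ++x) {
            const int d = std::abs(static_cast<int>(row_a[x]) - static_cast<int>(row_b[x]));
            if (d != 0) {
                ++report.differing_pixels;
                report.max_abs_diff = std::max(report.max_abs_diff, d);
            }
        }
    }
    return report;
}
```

A report with `differing_pixels == 0` means the two results are bit-identical. A small `max_abs_diff` confined to strip boundaries would point to rounding or border handling rather than a wrong partition.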
