I'm using the IPP function ippiInterpolateLuma_H264_8u_C1R() to interpolate my 16x16 macroblocks. Here are my profiler results:
Original function (vectorized): 25ms, called 49638 times.
IPP function: 130ms, called 49638 times. (The exact function shown in my profiler is px_h264_interpolate_luma_type_b_8u_px, which takes 113ms. The function from which the IPP function is called takes a further 15ms, which gives the total of roughly 130ms. I put the IPP call inside another function to enable profiling.)
My function seems to perform about 5 times better than the Intel IPP function. Is something wrong here? I thought the IPP functions were the most optimized implementations available.
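To put the totals above on a per-call basis, here is a minimal sketch of the arithmetic. The helper names per_call_us and slowdown are made up for illustration and are not part of IPP or of my code; the only inputs are the profiler figures quoted above.

```c
#include <assert.h>

/* Hypothetical helper: average cost of one call in microseconds,
   given a profiler total in milliseconds and the call count. */
static double per_call_us(double total_ms, long calls)
{
    assert(calls > 0);
    return total_ms * 1000.0 / (double)calls;
}

/* Hypothetical helper: how many times slower the candidate is
   than the baseline, given both totals in milliseconds. */
static double slowdown(double baseline_ms, double candidate_ms)
{
    assert(baseline_ms > 0.0);
    return candidate_ms / baseline_ms;
}
```

With the numbers above, per_call_us(25.0, 49638) comes to roughly 0.5 microseconds per call for my function and per_call_us(130.0, 49638) to roughly 2.6 microseconds for the IPP path, so slowdown(25.0, 130.0) is 5.2.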
The details of my computer and my settings are given below:
CPU: Intel Pentium 4, 3GHz processor, with a 32-bit Windows 7 OS.
Settings: run from Visual Studio 2010; included IPP libraries ippvc_l.lib, ipps_l.lib, and ippcore_l.lib. No compilation or execution errors.
Any help here would be very much appreciated.