IppsPhase_32fc 3 times slower in 64bit

Hello.

I'm using IPP 8.1, in its 32 & 64bit forms, as DLLs (which I didn't compile myself, but the one who did guarantees that they are both the same version, compiled the same way).

In my experience, IppsPhase_32fc (used on the output of a 1024-band FFT, so around 500 pairs), which is already a fairly CPU-expensive function, is nearly three times slower in the 64bit build.

I'm using many IPP functions, and I have not noticed much difference for other functions (64bit versions sometimes very slightly slower), so it's most likely not a problem of branching, at least not a global one. It's of course not a problem of precision either, we're talking about the same _32fc here.

Has anyone experienced the same?
I haven't tested yet whether the same applies to ATan2.


Hello,

Could you provide the output of the following code from the 64-bit build? That may help us understand which code is running on the processor.

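#include <stdio.h>
#include <ipp.h>

/* Print the name and version of the ipps library that is actually loaded. */
void libinfo(void) {
    const IppLibraryVersion* lib = ippsGetLibVersion();
    printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version,
           lib->major, lib->minor, lib->majorBuild, lib->build);
}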

 

Thanks,
Chao
 

Hi,

it says 8.1.1.42291 for both

(april 11 2014, & target CPU is "core")

 

Is that function threaded in the background? At which length does threading start to kick in? Could this be a multithreading overhead?

Hi Dambrin,

Yes, you are right: the x64 version is ~3x slower than ia32, because ia32 has a new optimization that has not been ported to x64 yet. As a workaround you can use the fixed-accuracy atan functions. The number after the "A" suffix is the number of precise bits, so choose it according to the accuracy you need; higher accuracy means lower performance:

IPPAPI( IppStatus, ippsAtan_32f_A11, (const Ipp32f a[],Ipp32f r[],Ipp32s n))
IPPAPI( IppStatus, ippsAtan_32f_A21, (const Ipp32f a[],Ipp32f r[],Ipp32s n))
IPPAPI( IppStatus, ippsAtan_32f_A24, (const Ipp32f a[],Ipp32f r[],Ipp32s n))
IPPAPI( IppStatus, ippsAtan_64f_A26, (const Ipp64f a[],Ipp64f r[],Ipp32s n))
IPPAPI( IppStatus, ippsAtan_64f_A50, (const Ipp64f a[],Ipp64f r[],Ipp32s n))
IPPAPI( IppStatus, ippsAtan_64f_A53, (const Ipp64f a[],Ipp64f r[],Ipp32s n))
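
As a rough reading of the "A" suffix, and assuming "precise bits" means correct bits of the significand, the relative error is on the order of 2^-bits. The helper name `approx_rel_error` is hypothetical, and this is a rule of thumb rather than a documented IPP guarantee:

```c
#include <assert.h>
#include <math.h>

/* Approximate relative error 2^-bits for a result with `bits` correct
 * significand bits (assumption: this is what the "A" suffix counts). */
static double approx_rel_error(int bits)
{
    assert(bits > 0 && bits <= 53);
    return ldexp(1.0, -bits);
}
```

Under that assumption, A11 corresponds to about 4.9e-4, A21 to about 4.8e-7 and A24 to about 6.0e-8 relative error.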
 

regards, Igor

 

thanks for the confirmation, I'll use the fixed-accuracy ones

Ok I've switched to IppsAtan2_32f_Axx, and it seems that:

-IppsAtan2_32f_A24 is around 1.5x slower than IppsPhase_32fc (which I would assume to give the same results - but I could be wrong)

-IppsAtan2_32f_A21 is a bit faster than IppsPhase_32fc

-both IppsAtan2_32f_A21 & IppsAtan2_32f_A24 run at the same speed in their 32 & 64bit versions

(I went for Atan2; Atan seems to give about the same CPU usage, plus the cost of the division it would need)

 

I'm ok with that, but I find it strange that internally, IppsPhase_32fc isn't branching to IppsAtan2_32f, and that the generic IppsAtan2_32f isn't branching to IppsAtan2_32f_A24.
That is, we have 2 functions (_32f and _32f_A24) that I would expect to be the same thing (unless I misunderstood the specs), and a third function that does the same thing on a different input format, which could have been implemented with a little format adaptor + branching to one of the others.

I've met other functions with different names that were doing exactly the same thing, in the past, and there too they differed in CPU usage.

Hi Dambrin,

I think this is not a good situation, but I guess it is normal for a library as large as IPP (~12000 functions). I know several other examples where IPP functions have different names but the same functionality; this is because they were developed in different domains. If you are OK with the Atan_32f_A21/A24 performance, that sounds good: you have a good workaround and we have time to fix the issue with ippsPhase performance.

regards, Igor
