I ran some benchmarks on a Core 2 Duo using IPP 5.1.1, which is optimized for the Core 2 Duo. The function I tested is the in-place, single-precision, complex-to-complex FFT (ippsFFTFwd_CToC_32fc_I).
I am seeing strange results:
For power-of-2 sizes from 64 to 1024, the performance is really impressive (8-9 GFlops on a 2 GHz Core 2 Duo), twice that of the same benchmark on a 2 GHz Core Duo, which shows the improvement Intel made to SSE.
But starting at size 2048, the performance degrades sharply: it falls to 2 GFlops on the same 2 GHz Core 2 Duo, about two times lower than on the Core Duo. The same benchmark stays linear on the Core Duo, so there seems to be a problem in the Core 2 Duo optimized FFT algorithm.
(The only difference between the two benchmarks is that one was run on Linux (Core Duo) and the other on Mac OS X (Core 2 Duo), but both use the same version of the library and the same clock frequency.)
Can anyone confirm and explain this?
Another observation: using the buffer in the FFT seems to cause a segmentation fault since the latest IPP release, 5.1.1. Before that version, there was no problem using the buffer.