I'm currently evaluating MKL11 to decide whether it should replace the older MKL6 that has been used until now. I wrote a small console application to compare the FFT performance (for the moment just the computation time, not the numerical accuracy), but the results rather surprised me: MKL11 seems to be slower than MKL6.
For each FFT length (2^4 to 2^20) the program runs 1100 FFTs and measures the time. The attached plots show the avg/min/max over all 1100 loops (green). The red curves exclude the first 100 loops from the logging - no big difference there. All times are per single FFT calculation.
Plotting both average curves shows that MKL6 needs approximately half the time.
I was a bit surprised by these results - does anyone have experience with the FFT performance? Another thing that puzzles me is the outliers with MKL11, which don't occur nearly as often with MKL6.
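To put a number on the outliers instead of judging them from the min/max curves, one option would be to keep the individual times and compare the median with a high percentile. This is only a minimal sketch, assuming a C++11 compiler (so not Visual Studio 2008 as-is); `percentile` is a hypothetical helper using the nearest-rank definition, not something from MKL or from my test program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: nearest-rank percentile of a set of timings.
// p is given in percent, 0 < p <= 100. The vector is taken by value
// because it has to be sorted.
double percentile(std::vector<double> samples, double p)
{
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    // Nearest rank: smallest 1-based index covering p percent of the samples.
    const std::size_t rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
    return samples[std::max<std::size_t>(rank, 1) - 1];
}
```

Comparing `percentile(times, 50.0)` with `percentile(times, 99.0)` for both MKL versions would show whether the outliers are rare spikes or a broader tail.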
My testing code:
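int FFT_Kernel_float(unsigned int Nfft, void* pIn, void* pOut)
{
    long status = 0; //DFTI status code
    DFTI_DESCRIPTOR_HANDLE hand = 0;
    status = DftiCreateDescriptor(&hand, DFTI_SINGLE, DFTI_REAL, 1, Nfft); //1-D real single-precision transform
    status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE); //separate input and output buffers
    status = DftiCommitDescriptor(hand);
    status = DftiComputeForward(hand, pIn, pOut);
    DftiFreeDescriptor(&hand); //release the descriptor so repeated calls do not leak
    return (int) status; //status of the forward transform
}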
for (exp=exponent_start;exp<=exponent_stop;exp++) //2^4 to 2^20
{
    Nfft = (unsigned int) pow(2.0,exp);
    for (i=0;i<Nfft;i++) //fill the input buffers
    {
        cxfTimeaxis[i] = ((float) i + 1.0) / fs;
        cxfTimesig[i] = ((float)rnd.Get()/UINT_MAX)*2-1; //random signal
    }
    Time_all_min = 1e6;
    Time_all_max = 0;
    Time_firstexcl_min = 1e6;
    Time_firstexcl_max = 0;
    hpfcAllLoops.Start(); //start time for all loops
    for (i=0;i<loops;i++) //loops = 1100
    {
        if (i==exclude_first_from_avg)
            hpfcFirstExcluded.Start(); //start timer for loops after first excluded loops
        hpfcIndividual.Start(); //start timer for single execution
        status = FFT_Kernel_float(Nfft,cxfTimesig.ptr(), cxfFreqsig.ptr());
        Time_individual = hpfcIndividual.Time();
        if (i>=exclude_first_from_avg) //exclude_first_from_avg = 100
        {
            Time_firstexcl_max = max(Time_firstexcl_max,Time_individual);
            Time_firstexcl_min = min(Time_firstexcl_min,Time_individual);
        }
        Time_all_max = max(Time_all_max,Time_individual);
        Time_all_min = min(Time_all_min,Time_individual);
    }
    Time_all_tot = hpfcAllLoops.Time();
    Time_firstexcl_tot = hpfcFirstExcluded.Time();
    Time_firstexcl_avg = Time_firstexcl_tot / (double) (loops - exclude_first_from_avg);
    Time_all_avg = Time_all_tot / (double) loops;
    //log data here
}
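The min/max/avg bookkeeping above could also be packaged as a small reusable helper, so that exactly the same measurement is applied to both MKL versions. The following is only a sketch under assumptions: it uses `std::chrono::steady_clock` from C++11 (not available in Visual Studio 2008) instead of my `hpfc...` timer class, and `TimingStats` and `time_calls` are hypothetical names that exist neither in MKL nor in my program.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical result type: per-call statistics in seconds.
struct TimingStats {
    double avg;         // mean time per measured call
    double min;         // fastest measured call
    double max;         // slowest measured call
    std::size_t count;  // number of measured calls (loops - warmup)
};

// Runs call() 'loops' times and times each run individually.
// The first 'warmup' runs are executed but left out of the statistics.
template <typename F>
TimingStats time_calls(F&& call, std::size_t loops, std::size_t warmup)
{
    assert(loops > warmup);
    typedef std::chrono::steady_clock clock;
    TimingStats s = {0.0, std::numeric_limits<double>::max(), 0.0, 0};
    double total = 0.0;
    for (std::size_t i = 0; i < loops; ++i) {
        const clock::time_point t0 = clock::now();
        call();
        const double dt =
            std::chrono::duration<double>(clock::now() - t0).count();
        if (i < warmup)
            continue;  // warm-up run: not logged
        total += dt;
        s.min = std::min(s.min, dt);
        s.max = std::max(s.max, dt);
        ++s.count;
    }
    s.avg = total / static_cast<double>(s.count);
    return s;
}
```

With this, the inner loop would reduce to something like `time_calls([&]{ FFT_Kernel_float(Nfft, in, out); }, 1100, 100)`, where `in` and `out` stand for the input and output buffers. Note that the average here is the sum of the individual times, so unlike my loop it does not include the overhead of the timing calls themselves.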
Any opinions or experiences on this issue? Am I comparing apples and oranges?
(Win7, Intel i5-2500, C++, Visual Studio 2008)