Hi All,

I'm currently evaluating MKL11 to decide whether it should replace the older MKL6 that we have used until now. I wrote a small console application to compare the FFT performance (for the moment just the computation time, not the numerical accuracy), but the results rather surprised me: MKL11 seems to be slower than MKL6.

The program runs 1100 FFTs for each FFT length (2^4 to 2^20) and measures the time. The attached plots show the average, minimum and maximum over all 1100 loops (green). The red curves exclude the first 100 loops from the logging; there is no big difference. All times are per FFT call.

Plotting both average curves together shows that MKL6 needs approximately half the time.

I was a bit surprised by these results. Does anyone have experience with the FFT performance of these versions? Another thing that puzzles me is the outliers with MKL11, which occur much less often with MKL6.

My testing code:

    int FFT_Kernel_float(unsigned int Nfft, void* pIn, void* pOut)
    {
        int status;
        DFTI_DESCRIPTOR_HANDLE hand = 0;
        status = DftiCreateDescriptor(&hand, DFTI_SINGLE, DFTI_REAL, 1, Nfft);
        status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        status = DftiCommitDescriptor(hand);
        status = DftiComputeForward(hand, pIn, pOut);
        DftiFreeDescriptor(&hand);
        return status;
    }

    for (exp=exponent_start;exp<=exponent_stop;exp++) //2^4 to 2^20
    {
        Nfft = (unsigned int) pow(2.0,exp);
        cxfTimesig.alloc(Nfft);
        cxfTimeaxis.alloc(Nfft);
        cxfFreqsig.alloc(Nfft);

        for (i=0;i<Nfft;i++)
        {
            cxfTimeaxis[i] = ((float) i + 1.0) / fs;
            cxfTimesig[i] = ((float)rnd.Get()/UINT_MAX)*2-1; //random signal
        }

        Time_all_min = 1e6;
        Time_all_max = 0;
        Time_firstexcl_min = 1e6;
        Time_firstexcl_max = 0;

        hpfcAllLoops.Start(); //start time for all loops
        for (i=0;i<loops;i++) //loops = 1100
        {
            if (i==exclude_first_from_avg-1)
                hpfcFirstExcluded.Start(); //start timer for loops after first excluded loops
            hpfcIndividual.Start(); //start timer for single execution
            status = FFT_Kernel_float(Nfft,cxfTimesig.ptr(), cxfFreqsig.ptr());
            Time_individual = hpfcIndividual.Time();
            if (i>=exclude_first_from_avg-1) //exclude_first_from_avg = 100
            {
                Time_firstexcl_max = max(Time_firstexcl_max,Time_individual);
                Time_firstexcl_min = min(Time_firstexcl_min,Time_individual);
            }
            Time_all_max = max(Time_all_max,Time_individual);
            Time_all_min = min(Time_all_min,Time_individual);
        }

        Time_all_tot = hpfcAllLoops.Time();
        Time_firstexcl_tot = hpfcFirstExcluded.Time();

        Time_firstexcl_avg = Time_firstexcl_tot / (double) (loops - exclude_first_from_avg);
        Time_all_avg = Time_all_tot / (double) loops;

        //log data here
    }
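For anyone who wants to reproduce the measurement without my timer classes, the min/max/avg bookkeeping of the loop above could be condensed into a helper roughly like the one below. This is only a sketch under assumptions: `TimingStats`, `BenchResult` and `run_benchmark` are hypothetical names that are not part of MKL or of my test program, and it uses `std::chrono::steady_clock`, so it needs a C++11 compiler (newer than Visual Studio 2008). Unlike my loop, it excludes exactly the first `warmup` iterations and averages the summed individual times instead of using a separate outer timer.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper types (not part of MKL): times are in seconds.
struct TimingStats {
    double min_s;
    double max_s;
    double avg_s;
};

struct BenchResult {
    TimingStats all;     // statistics over every iteration
    TimingStats steady;  // statistics over iterations after the warm-up
};

// Calls fn() `loops` times, timing each call individually.
// The first `warmup` calls are left out of the `steady` statistics.
template <typename Fn>
BenchResult run_benchmark(unsigned loops, unsigned warmup, Fn&& fn)
{
    assert(loops > warmup);
    typedef std::chrono::steady_clock clock;
    const double inf = std::numeric_limits<double>::infinity();

    double all_min = inf, all_max = 0.0, all_sum = 0.0;
    double st_min = inf, st_max = 0.0, st_sum = 0.0;

    for (unsigned i = 0; i < loops; ++i) {
        const clock::time_point t0 = clock::now();
        fn();
        const double dt =
            std::chrono::duration<double>(clock::now() - t0).count();

        all_min = std::min(all_min, dt);
        all_max = std::max(all_max, dt);
        all_sum += dt;

        if (i >= warmup) {  // skip the warm-up iterations
            st_min = std::min(st_min, dt);
            st_max = std::max(st_max, dt);
            st_sum += dt;
        }
    }

    BenchResult r;
    r.all.min_s = all_min;
    r.all.max_s = all_max;
    r.all.avg_s = all_sum / loops;
    r.steady.min_s = st_min;
    r.steady.max_s = st_max;
    r.steady.avg_s = st_sum / (loops - warmup);
    return r;
}
```

With my kernel it would be called as `run_benchmark(1100, 100, [&] { FFT_Kernel_float(Nfft, cxfTimesig.ptr(), cxfFreqsig.ptr()); })`, once per FFT length.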

Any opinions or experiences on this issue? Am I comparing apples and oranges?

Marian

(Win7, Intel i5-2500, C++, Visual Studio 2008)