I'm the guy who wanted run-time linking to the DFT routines. I decided I was putting the cart before the horse, so I static-linked to the MKL DLLs and tried timing the MKL DFTs against our own code.
To my surprise, MKL is faster only for DFTs whose number of points contains a large prime factor. For other sizes, our own optimized code is faster: 2 to 10 times for most problem sizes, and up to 20 to 30 times for certain ones.
Is there something I should be looking at that might be decreasing the performance of the MKL DFTs?
The MKL DFT routines show little variation in speed with prime factorization of the number of points. That suggests that the algorithm used isn't as optimized as it could be. Modern DFT code is generally much faster when N has only small prime factors.
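For anyone who wants to line timings up against factorization, here is a minimal Python sketch of one way to bucket transform sizes by their largest prime factor. The names `largest_prime_factor` and `classify_size`, and the cutoff of 7 for a "small" prime, are made up for illustration; they are not part of MKL or of our code.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) using trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    factor = 2
    while factor * factor <= n:
        # Strip out every power of the current factor.
        while n % factor == 0:
            largest = factor
            n //= factor
        # After 2, only odd candidates need to be tried.
        factor += 1 if factor == 2 else 2
    if n > 1:
        # Whatever remains is a prime larger than any factor found so far.
        largest = n
    return largest


def classify_size(n: int, smooth_limit: int = 7) -> str:
    """Label a transform size; smooth_limit is an arbitrary illustrative cutoff."""
    if largest_prime_factor(n) <= smooth_limit:
        return "small-factors"
    return "large-prime-factor"


if __name__ == "__main__":
    for size in (1000, 1024, 1021, 2042):
        print(size, largest_prime_factor(size), classify_size(size))
```

Sizes such as 1024 or 1000 land in the "small-factors" bucket, while 1021 (a prime) and 2042 (2 x 1021) land in the "large-prime-factor" bucket, which is the group where MKL came out ahead in my timings.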
The file mkl_pr.dll shows version 6.1.0. It is an evaluation copy of MKL. The timings were done on a Sony Vaio with a 3 GHz P4 processor.
The various unimplemented routines suggest that MKL DFTs are a work in progress. Does version 7 improve the performance of DFT routines significantly?
Hm, this got a lot longer than I intended. Hope you can make it all the way through :)