I'm using the MKL 1D FFT library. For example, I compute a batch of 1M FFTs of size 1K each in single precision.
When I run the library call on its own, the performance is very steady and very fast: about 0.3 seconds on my machine.
However, when I make the same library call from inside my application, which is multi-threaded, the time for the call varies between 0.3 and 0.6 seconds, with 0.5 seconds occurring most often.
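To make the variation concrete, here is a minimal sketch of a timing harness that could be wrapped around the call. It is not my actual application code: `time_repeated`, `timing_stats`, and `placeholder_workload` are hypothetical names, and the placeholder loop only stands in for the MKL FFT call, which would go in its place. It assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Summary of the per-run times, in seconds. */
typedef struct {
    double min;
    double max;
    double mean;
} timing_stats;

/* Signature of the code being timed (hypothetical; adapt as needed). */
typedef void (*workload_fn)(void *arg);

/* Run fn(arg) `runs` times and record the fastest, slowest, and mean time. */
static timing_stats time_repeated(workload_fn fn, void *arg, int runs) {
    timing_stats s = {0.0, 0.0, 0.0};
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        double t0 = now_seconds();
        fn(arg);
        double dt = now_seconds() - t0;
        if (i == 0 || dt < s.min) s.min = dt;
        if (i == 0 || dt > s.max) s.max = dt;
        total += dt;
    }
    if (runs > 0) s.mean = total / runs;
    return s;
}

/* Placeholder only: stands in for the FFT library call. */
static void placeholder_workload(void *arg) {
    volatile double *sink = (volatile double *)arg;
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i) acc += (double)i * 0.5;
    *sink = acc;  /* store the result so the loop is not optimized away */
}
```

Calling `time_repeated(placeholder_workload, &sink, 20)` with `double sink` and printing `min`, `max`, and `mean` gives the spread across runs. Running the same harness once standalone and once inside the multi-threaded application would show whether the slowdown is in the call itself.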
Has anyone else experienced this, or am I making a mistake somewhere? Is there a way to get consistently good performance?
Thanks in advance!