I am running FFTs with MKL on an Intel CPU that has 36 physical cores and 72 hardware threads, as shown below.
I am not using OpenMP; instead I use a thread pool to run the MKL FFTs.
The problem is that the thread pool gives the best performance with 36 threads, not 72. Below 36, adding more threads always improves performance. Above 36, adding more threads gives no further improvement.
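To make the measurement concrete, here is a minimal sketch of the kind of sweep I mean: the same batch of tasks is timed at several pool sizes. It is not my real code. The names `stand_in_task` and `sweep` are hypothetical, and a SHA-256 hash over a large buffer stands in for one MKL FFT call (an assumption made only so the sketch runs without MKL; `hashlib` can release the GIL on large buffers, so the tasks can overlap).

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def stand_in_task(buf: bytes) -> str:
    # Hypothetical placeholder for one FFT call (assumption: not MKL).
    # Hashing a large buffer is compute-bound and can run outside the GIL.
    return hashlib.sha256(buf).hexdigest()


def sweep(thread_counts, n_tasks=64, buf_size=1 << 20):
    """Time the same batch of tasks at each pool size.

    Returns a dict mapping thread count to elapsed seconds.
    """
    buf = bytes(buf_size)
    timings = {}
    for n in thread_counts:
        with ThreadPoolExecutor(max_workers=n) as pool:
            start = time.perf_counter()
            # list() forces every task to finish before the clock stops.
            list(pool.map(stand_in_task, [buf] * n_tasks))
            timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    for n, secs in sweep([1, 2, 4, 8]).items():
        print(f"{n:3d} threads: {secs:.4f} s")
```

On my machine the interesting pool sizes would be around 36 and 72; the list passed to `sweep` is only an example.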
I noticed this statement in the MKL developer guide: "To achieve higher performance, set the number of threads to the number of processors or physical cores," https://software.intel.com/en-us/mkl-linux-developer-guide-improving-per.... That statement is about OpenMP, but I see the same thing with the thread pool: the best performance comes from setting the number of threads to the number of physical cores, not the number of hardware threads.
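For anyone who wants to check the two counts on their own machine, here is a rough sketch of how they can be told apart on Linux. It assumes `/proc/cpuinfo` exposes the usual `processor`, `physical id`, and `core id` fields (true on typical x86 systems, but not guaranteed everywhere); the helper name `count_cpus` is hypothetical.

```python
def count_cpus(cpuinfo_text: str):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text."""
    logical = 0
    cores = set()          # unique (socket, core) pairs
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue       # blank separator lines between CPU entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1   # one entry per hardware thread
            physical_id = None
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # Hyper-threaded siblings share the same (socket, core) pair.
            cores.add((physical_id, value))
    return logical, len(cores)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"logical CPUs: {logical}, physical cores: {physical}")
```

If the fields are present, the first number is what I call "threads" above and the second is the physical core count.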
Why does it behave like this? Is it because the computational complexity of the FFT is too high?
If so, what do the other 36 threads do? In what situation would all 72 threads be fully used?
Sorry for so many questions!
Any hint will be appreciated!