In Intel MKL 10.2, one-dimensional complex-to-complex fast Fourier transforms (FFTs) are now threaded for non-prime sizes from 216 and larger with the following exception:
- if at least one prime factor is larger than 229, the transform is not supported
An Intel MKL 1D FFT of size K=N*M (where N and M are chosen somewhat arbitrarily) is computed using the Cooley-Tukey factorization algorithm in 2 stages. During the first stage, N transforms of size M are performed in parallel with necessary permutation. For the second stage, multiplication by twiddle factors and M transforms of size N are done in parallel. Care is taken to avoid cache line splits between the threads for both stages. Additionally, the size of the twiddle factor table is reduced by a binary split.
When running on an Intel® Xeon® Processor X5492 (2 x 4-core, 3.4 GHz) running a 64-bit operating system FFT performance improved by up to 5 times for 8 threads, when compared to Intel MKL 10.1 Update 2.
Users just need to upgrade to the Intel MKL 10.2 or later in order to take advantage of this new feature for multi-threaded applications. No code changes are required.