Split-complex (real/real) storage for 2D/3D FFTs is supported in Intel® MKL 10.3 and later.
Intel’s new parallel programming model is a set of libraries developed by the Intel Software and Solutions Group to help developers write scalable code without having to manage threads themselves.
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
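As a rough illustration of merging nested loops, the sketch below compares parallelizing only the outer loop with first flattening both loops into one iteration space. The names `cell`, `outer_only` and `merged` are hypothetical, and Python threads are used only to show how the work is partitioned, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def cell(i, j):
    # Hypothetical per-iteration work; stands in for the real loop body.
    return i * j


def outer_only(rows, cols, workers):
    # Parallelize the outer loop only: there are at most `rows` tasks,
    # so a short outer loop leaves some workers without work.
    def row_sum(i):
        return sum(cell(i, j) for j in range(cols))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(row_sum, range(rows)))


def merged(rows, cols, workers):
    # Merge both loops into one index space of rows * cols iterations,
    # then hand each worker one contiguous chunk of that space.
    total = rows * cols
    chunk = max(1, -(-total // workers))  # ceiling division, at least 1

    def chunk_sum(start):
        end = min(start + chunk, total)
        # Recover (i, j) from the flat index k.
        return sum(cell(k // cols, k % cols) for k in range(start, end))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, range(0, total, chunk)))
```

With `rows=3` and `workers=4`, `outer_only` can create only three tasks, while `merged` splits the 3 x `cols` iterations into four chunks of nearly equal size.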
Detecting Memory Bandwidth Saturation in Threaded Applications (PDF 231KB)
Avoiding and Identifying False Sharing Among Threads (PDF 218KB)
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
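A minimal sketch of how the grain size determines the number of tasks, assuming a hypothetical helper `parallel_sum` whose `grain` parameter is the number of elements per task. Each task carries a fixed scheduling and communication cost, so the task count is a rough proxy for overhead.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, grain, workers=4):
    # `grain` is the number of elements handled by one task (an
    # illustrative parameter, not part of any Intel library).
    if grain < 1:
        raise ValueError("grain must be at least 1")
    # One task per chunk: a small grain means many tiny tasks.
    chunks = [data[s:s + grain] for s in range(0, len(data), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    # Return the result together with the number of tasks created.
    return sum(partials), len(chunks)
```

For 1000 elements, `grain=1` creates 1000 tasks that each do a single addition, while `grain=250` creates four tasks that each do enough real work to amortize the per-task cost.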
Compiler Methodology for Intel® MIC Architecture
Selective Use of gatherhint/scatterhint Instructions