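The Intel® Fortran Compiler 15.0 can now generate multithreaded code for certain instances of OpenMP WORKSHARE and PARALLEL WORKSHARE constructs that contain array assignments. Previously, these were implemented with an OpenMP SINGLE construct, which meant that only single-threaded code could be generated.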
Instructions for using the "Enabling collector for Linux*" to collect Intel® Thread Profiler data on a Linux* system and then view the results with the Intel® Thread Profiler for Windows*, part of the Intel® VTune Performance Analyzer for Windows*.
This article describes performance considerations for CPU inference with Intel® Optimization for TensorFlow*.
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
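Loop merging can be illustrated with a minimal C sketch using OpenMP. It assumes an OpenMP-capable compiler, for example GCC or Clang with `-fopenmp`. Without that flag the pragma is ignored and the loop runs serially with the same result. The function `outer_product` and its row-major layout are hypothetical. When the outer loop has only a few iterations, parallelizing it alone can leave threads idle. Merging the two loops into a single index `k` exposes all `rows * cols` iterations to the scheduler instead.

```c
#include <assert.h>

/* Hypothetical kernel: outer product out[i][j] = u[i] * v[j],
   stored row-major in out[] (rows x cols elements).
   The nested i/j loops are merged into one loop over k = i*cols + j,
   so all rows*cols iterations can be shared among threads rather than
   only `rows` iterations when just the outer loop is parallelized. */
void outer_product(const double *u, long rows,
                   const double *v, long cols, double *out)
{
    assert(rows >= 0 && cols >= 0);
    long total = rows * cols;          /* size of the merged iteration space */

    #pragma omp parallel for schedule(static)
    for (long k = 0; k < total; ++k) {
        long i = k / cols;             /* recover the original outer index */
        long j = k % cols;             /* recover the original inner index */
        out[k] = u[i] * v[j];
    }
}
```

OpenMP 3.0 and later can perform the same merge automatically with the `collapse(2)` clause on a perfectly nested `i`/`j` loop. The manual form above makes the index arithmetic visible.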
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
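Here is a minimal C/OpenMP sketch of how chunk size sets task granularity. It assumes an OpenMP-capable compiler. `chunked_sum` and `num_chunks` are hypothetical helpers for illustration. With `schedule(dynamic, chunk)`, each chunk is one unit of work handed to a thread, so the loop is split into roughly `n / chunk` units. A chunk of 1 means one scheduling step per element, which gives fine granularity and high overhead. A larger chunk spreads that overhead over more real work.

```c
#include <assert.h>

/* Sum an array with dynamic scheduling and a caller-chosen chunk size.
   Each chunk of `chunk` consecutive iterations is one unit of work
   assigned to a thread; smaller chunks mean more scheduling steps. */
double chunked_sum(const double *x, long n, long chunk)
{
    assert(chunk > 0);                 /* OpenMP requires a positive chunk size */
    double sum = 0.0;

    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i];

    return sum;
}

/* Number of work units the loop is divided into: ceil(n / chunk). */
long num_chunks(long n, long chunk)
{
    assert(chunk > 0);
    return (n + chunk - 1) / chunk;
}
```

The chunk size changes how many scheduling steps the runtime performs. It does not change what the loop computes, apart from possible rounding differences in the order of the floating-point reduction. Very large chunks carry the opposite risk: too few units to keep every thread busy.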
Apply the concepts of parallelism and distributed-memory computing to your code to improve software performance. This paper builds on the concepts discussed in Part 1. It covers parallelism, including both vectorization (single instruction, multiple data, or SIMD) and shared-memory parallelism (threading), as well as distributed-memory computing.
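Threading and SIMD can be combined on a single loop, as in the following minimal C sketch. It assumes a compiler that supports OpenMP 4.0 or later, which is needed for the `simd` construct. The `saxpy` function is a hypothetical example, and distributed-memory computing (for example with MPI) is not shown. The `parallel for` part divides iterations among threads. The `simd` part asks the compiler to vectorize each thread's share of those iterations.

```c
#include <assert.h>

/* SAXPY-style update: y[i] = a * x[i] + y[i].
   `restrict` promises that x and y do not overlap, which helps the
   compiler vectorize the loop. */
void saxpy(long n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);                    /* precondition: non-negative length */

    /* parallel for: split the iterations across threads (threading).
       simd: vectorize each thread's iterations (SIMD). */
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```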
Avoiding Heap Contention Among Threads (PDF 256KB)
Detecting Memory Bandwidth Saturation in Threaded Applications (PDF 231KB)