Download (English PDF, 75KB)
Compiler Methodology for Intel® MIC Architecture
Parallelization Using Intel® Threading Building Blocks (Intel® TBB)
The gaming industry has seen great strides in game complexity recently. Game developers are challenged to create increasingly compelling games. This series explores important Artificial Intelligence (AI) concepts and how to optimize them for multi-core.
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
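As a minimal sketch of the loop-merging idea, the Python example below flattens a two-level loop nest into one index space and splits it into evenly sized chunks, so the number of parallel tasks no longer depends on the outer trip count. The function names and the loop body `cell` are hypothetical placeholders, not taken from the article. Threads are used only to show the decomposition; in CPython, CPU-bound work would need processes or native code to see a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def cell(i, j):
    # Hypothetical per-element work; stands in for the real loop body.
    return i * 10 + j


def nested_serial(rows, cols):
    # Reference version: a plain two-level loop nest.
    return [[cell(i, j) for j in range(cols)] for i in range(rows)]


def merged_parallel(rows, cols, workers=4):
    # Merge the two loops into one flat index space of rows * cols items.
    total = rows * cols

    def run_chunk(bounds):
        start, stop = bounds
        # Recover (i, j) from the flat index k.
        return [cell(k // cols, k % cols) for k in range(start, stop)]

    # Ceiling division, clamped to 1 so an empty index space is handled.
    step = max(1, -(-total // workers))
    chunks = [(s, min(s + step, total)) for s in range(0, total, step)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the flat result stays in order.
        flat = [v for part in pool.map(run_chunk, chunks) for v in part]

    # Reshape the flat result back into rows.
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]
```

With `rows=3` and four workers, parallelizing only the outer loop would leave one worker idle; the merged version distributes all 21 elements of a 3 × 7 nest across the four chunks.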
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
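The trade-off can be illustrated with a small Python sketch in which a hypothetical `grain` parameter sets how many loop iterations go into each task. The names `make_tasks` and `sum_squares` are assumptions made for this example. A grain of 1 produces one task per iteration (maximum scheduling overhead), while a grain equal to the problem size produces a single task (no parallelism at all).

```python
from concurrent.futures import ThreadPoolExecutor


def make_tasks(n, grain):
    """Split range(n) into half-open blocks of at most `grain` items."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    return [(s, min(s + grain, n)) for s in range(0, n, grain)]


def sum_squares(n, grain, workers=4):
    """Sum k*k for k in range(n), one task per block of `grain` items."""

    def block_sum(bounds):
        start, stop = bounds
        return sum(k * k for k in range(start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Partial sums are combined on the calling thread.
        return sum(pool.map(block_sum, make_tasks(n, grain)))


if __name__ == "__main__":
    for grain in (1, 250, 1000):
        print(grain, len(make_tasks(1000, grain)), sum_squares(1000, grain))
```

The result is the same for every grain size; only the number of tasks, and therefore the overhead paid per unit of real work, changes.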
Many applications and algorithms contain serial optimizations that inadvertently introduce data dependencies and inhibit parallelism. One can often remove such dependencies through simple transforms, or even avoid them altogether through …
Optimize Data Structures and Memory Access Patterns to Improve Data Locality (PDF 782KB)
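One common case, sketched below in Python under assumed names, is a running accumulator: updating `x += step` on each pass saves a multiplication, but it makes every iteration depend on the previous one. Recomputing the value from the loop index removes that loop-carried dependency, so the iterations can run in any order. This is one example of such a transform, not necessarily the one the article describes.

```python
from concurrent.futures import ThreadPoolExecutor


def positions_serial(start, step, n):
    # Serial optimization: reuse the previous value instead of multiplying.
    out = []
    x = start
    for _ in range(n):
        out.append(x)
        x += step  # loop-carried dependency on the previous iteration
    return out


def positions_independent(start, step, n, workers=4):
    # Transformed version: each value depends only on its own index i.
    def value(i):
        return start + i * step

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in index order.
        return list(pool.map(value, range(n)))
```

With integer inputs the two versions agree exactly; with floating-point inputs the results may differ slightly in the last bits because the rounding happens in a different order.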
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1 to consider three forms of parallelism: vectorization (single instruction, multiple data, or SIMD), shared memory parallelism (threading), and distributed memory computing.
Optimizing Applications for NUMA (PDF 225KB)