When parallelizing nested loops, the granularity of the computations assigned to each thread directly affects performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
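To illustrate loop merging, the following minimal sketch contrasts parallelizing only the outer loop with merging both loops into one index space before dividing it among workers. The names `cell_work`, `run_outer_only`, and `run_merged` are hypothetical and not taken from the original article, and the per-iteration computation is a placeholder. In CPython, threads do not speed up CPU-bound pure-Python code, so this only shows how the iterations are partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def cell_work(i, j):
    # Hypothetical per-iteration computation standing in for real work.
    return i * 10 + j


def run_outer_only(rows, cols, workers):
    # One task per outer iteration: at most `rows` tasks exist,
    # so extra workers stay idle when rows < workers.
    def row_task(i):
        return [cell_work(i, j) for j in range(cols)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_task, range(rows)))


def run_merged(rows, cols, workers):
    # Merge the two loops into a single index space of rows * cols
    # iterations, then give each worker one contiguous block.
    total = rows * cols
    block = max(1, -(-total // workers))  # ceiling division, at least 1

    def block_task(start):
        stop = min(start + block, total)
        # Recover the original (i, j) pair from the flat index k.
        return [cell_work(k // cols, k % cols) for k in range(start, stop)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(block_task, range(0, total, block))
        flat = [value for part in parts for value in part]

    # Reshape the flat result back into rows for comparison.
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]
```

With `rows=2` and `workers=4`, the outer-only version can create just two tasks, while the merged version creates four blocks of equal size and produces the same result.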
Avoiding Heap Contention Among Threads (PDF 256KB)
Detecting Memory Bandwidth Saturation in Threaded Applications (PDF 231KB)
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in each parallel task. If granularity is too fine, performance can suffer from communication overhead, because threads spend more time coordinating than computing.
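A minimal sketch of how granularity can be exposed as a tunable parameter: the same computation is split into tasks of `chunk_size` iterations each, and the function reports how many tasks were created. The function name `chunked_sum_of_squares` and its parameters are assumptions made for illustration, and the workload is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_sum_of_squares(n, chunk_size, workers=4):
    # chunk_size is the granularity: the number of loop iterations
    # (the "real work") bundled into each parallel task.
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    def task(start):
        stop = min(start + chunk_size, n)
        return sum(x * x for x in range(start, stop))

    starts = range(0, n, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(task, starts))

    # Return the result together with the number of tasks submitted,
    # which grows as the granularity gets finer.
    return total, len(starts)
```

Calling it with `chunk_size=1` for `n=1000` submits 1000 tiny tasks, each paying the scheduling cost for a single multiplication, whereas `chunk_size=250` submits four tasks that each carry a meaningful share of the work. Both calls return the same sum.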
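An introduction to the new features and enhancements of the 2nd Generation Intel® Xeon® Scalable processor family and the benefits they bring to developers.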