Introduction
Parallel programs with multiple threads must use synchronization techniques to ensure correct operation. Generally, synchronization operations use shared synchronization variables and "spin-wait" loops that repeatedly check the values of those variables. Starting with the Intel® Pentium® 4 and Xeon® processors, the Intel® IA-32 architecture provides a new instruction to address the performance issues associated with spin loops.
This application note (AP-949) addresses two important optimization issues for multi-threaded computations on high-speed processors: spin-loop handling and shared-data management. Specifically, the optimizations are the use of the new PAUSE instruction in spin-wait loops and the placement of shared and non-shared data on different 128-byte cache lines. Intel strongly recommends adopting the PAUSE instruction in spin-wait loops as soon as possible, since it is backward compatible with all earlier IA-32 processors, on which it executes as a NOP. This document describes the recommended changes in detail and the reasons behind them.
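The two recommendations can be sketched together in a few lines of C++. This is only an illustrative sketch, not code from the application note: it assumes a compiler that provides the `_mm_pause()` intrinsic (which emits PAUSE) through `<immintrin.h>`, and the names `SyncFlag`, `Payload`, `spin_until_set`, `run_demo`, and `CPU_PAUSE` are hypothetical. The 128-byte figure is the one quoted above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>          // declares _mm_pause(), which emits PAUSE
#define CPU_PAUSE() _mm_pause()
#else
#define CPU_PAUSE() ((void)0)   // assumption: no-op fallback on non-x86 targets
#endif

// Line size used for separation, taken from the 128-byte figure in the text.
constexpr std::size_t kLineSize = 128;

// Shared synchronization variable, placed on its own 128-byte line.
struct alignas(kLineSize) SyncFlag {
    std::atomic<int> ready{0};
};

// Data that is not a synchronization variable, kept on a different line
// so that spinning on the flag does not contend with accesses to it.
struct alignas(kLineSize) Payload {
    int value = 0;
};

static_assert(sizeof(SyncFlag) == kLineSize, "flag occupies one full line");
static_assert(alignof(Payload) == kLineSize, "payload starts on its own line");

// Spin-wait loop: re-check the flag, executing PAUSE between checks.
inline void spin_until_set(const SyncFlag& flag) {
    while (flag.ready.load(std::memory_order_acquire) == 0) {
        CPU_PAUSE();
    }
}

// Hypothetical demo: one thread publishes a value, the caller spins for it.
inline int run_demo() {
    SyncFlag flag;
    Payload payload;
    std::thread producer([&] {
        payload.value = 42;                              // write the data first
        flag.ready.store(1, std::memory_order_release);  // then signal
    });
    spin_until_set(flag);        // waits until the producer has signalled
    int seen = payload.value;    // safe to read after the acquire load
    producer.join();
    return seen;
}
```

Calling `run_demo()` should return the value written by the producer thread. The release store paired with the acquire load is a choice made for this sketch, so that the payload write is visible once the flag is observed; the note itself does not prescribe a particular language-level memory model.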