AP-949 Using Spin Loops on Intel® Pentium® 4 Processor and Intel® Xeon® Processor


May 12, 2008 9:00 PM PDT


Parallel programs with multiple threads must use synchronization techniques to ensure correct operation. Synchronization operations generally rely on shared synchronization variables and "spin-wait" loops that repeatedly check the values of those variables. Starting with the Intel® Pentium® 4 and Intel® Xeon® processors, the Intel® IA-32 architecture provides a new instruction, PAUSE, to address the performance issues associated with spin loops. This application note covers two important optimization issues for multithreaded computations on high-speed processors: spin loops and shared-data management. Specifically, these optimizations are the use of the PAUSE instruction in spin-wait loops and the placement of shared and non-shared data on different 128-byte cache lines. Intel strongly recommends adopting the PAUSE instruction in spin-wait loops as soon as possible, since it is backward compatible with all earlier IA-32 processors (on those processors it executes as a NOP). This document describes the recommended changes, and the reasons behind them, in detail.
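Both recommendations can be seen together in a small spin lock. The following is a minimal C11 sketch, not code from the application note. It assumes a compiler with `<stdatomic.h>` and, on x86, the `_mm_pause()` intrinsic from `<immintrin.h>`, which emits PAUSE. The names `padded_lock`, `lock_init`, `lock_try_acquire`, `lock_acquire`, `lock_release`, `CPU_RELAX` and `SYNC_BLOCK` are hypothetical. The spin loop uses the common test-and-test-and-set pattern: it executes PAUSE while waiting, and the lock flag is aligned so that it occupies its own 128-byte block, as the note recommends for shared data.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

/* On x86, _mm_pause() emits the PAUSE instruction. On other targets this
   sketch falls back to an empty statement (an assumption, not from the note). */
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

/* Separation size for shared data, following the note's 128-byte figure. */
#define SYNC_BLOCK 128

/* Hypothetical lock type. Aligning the flag to 128 bytes also rounds
   sizeof(padded_lock) up to 128, so no other variable shares its block. */
typedef struct {
    alignas(SYNC_BLOCK) atomic_int locked; /* 0 = free, 1 = held */
} padded_lock;

static_assert(sizeof(padded_lock) == SYNC_BLOCK,
              "lock must fill exactly one 128-byte block");

static inline void lock_init(padded_lock *l) {
    atomic_init(&l->locked, 0);
}

/* One attempt: returns true if the lock was free and is now held. */
static inline bool lock_try_acquire(padded_lock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Test-and-test-and-set: after a failed exchange, spin on plain loads until
   the flag reads free, executing PAUSE on every iteration, then retry. */
static inline void lock_acquire(padded_lock *l) {
    while (!lock_try_acquire(l)) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            CPU_RELAX();
    }
}

static inline void lock_release(padded_lock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Spinning on a relaxed load, rather than repeating the atomic exchange, lets each waiting thread read a cached copy of the flag. The exchange runs only when the lock looks free. The 128-byte alignment holds for static and automatic objects. For heap allocation, `malloc` does not guarantee it, so something like `aligned_alloc(SYNC_BLOCK, sizeof(padded_lock))` would be needed instead.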
