Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public


Pause Intrinsic

The prototype for this Intel® Streaming SIMD Extensions 2 (Intel® SSE2) intrinsic is in the xmmintrin.h header file.

To use this intrinsic, include the immintrin.h file as follows:

#include <immintrin.h>

PAUSE Intrinsic

void _mm_pause(void);

The pause intrinsic is used in spin-wait loops on processors that implement dynamic execution (especially out-of-order execution). In a spin-wait loop, the pause intrinsic improves the speed at which the code detects the release of the lock, which can provide a significant performance gain.

The execution of the next instruction is delayed for an implementation-specific amount of time. The PAUSE instruction does not modify the architectural state. For dynamic scheduling, the PAUSE instruction reduces the penalty of exiting from the spin-loop.

Example of loop with the PAUSE instruction:

In this example, the program spins until memory location A matches the value in register eax. The code sequence that follows shows a test-and-test-and-set.

spin_loop:
    pause
    cmp eax, A
    jne spin_loop
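For readers working from C++ rather than assembly, the same loop can be sketched with the _mm_pause intrinsic. This is a minimal sketch, not code from the guide: the helper name spin_until_equal, the use of std::atomic for location A, and the returned pause count are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <immintrin.h>  // declares _mm_pause

// Hypothetical helper (not part of any Intel API): spins until `a` holds
// `expected`. Like the assembly loop, it issues PAUSE before each comparison.
// The pause count is returned only so the behavior can be observed.
inline std::uint64_t spin_until_equal(const std::atomic<int>& a, int expected) {
    std::uint64_t pauses = 0;
    do {
        _mm_pause();  // spin-wait hint to the processor
        ++pauses;
    } while (a.load(std::memory_order_acquire) != expected);
    return pauses;
}
```

In typical use, one thread would call spin_until_equal(a, value) while another thread eventually stores value into a with release ordering.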

In this example, the spin occurs only after the attempt to get a lock has failed.

get_lock:
    mov eax, 1
    xchg eax, A      ; Try to get lock
    cmp eax, 0       ; Test if successful
    jne spin_loop

    ; Critical section
    ; critical_section code
    mov A, 0         ; Release lock
    jmp continue

spin_loop:
    pause            ; Spin-loop hint
    cmp A, 0         ; Check lock availability
    jne spin_loop
    jmp get_lock

continue:
    ; other code
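The test-and-test-and-set sequence above can also be sketched in C++. The class below is a hypothetical illustration, not something the compiler or its headers provide: the SpinLock name, its member functions, and the chosen memory orderings are assumptions. The exchange plays the role of xchg, and the inner loop spins on plain loads with _mm_pause only after an attempt to take the lock has failed.

```cpp
#include <atomic>
#include <immintrin.h>  // declares _mm_pause

// Hypothetical illustrative type, not part of any Intel library.
class SpinLock {
public:
    // One attempt to take the lock (the xchg step). Returns true on success.
    bool try_lock() {
        return locked_.exchange(1, std::memory_order_acquire) == 0;
    }

    void lock() {
        while (!try_lock()) {
            // The attempt failed: spin on reads until the lock looks free,
            // issuing PAUSE on every iteration, then retry the exchange.
            do {
                _mm_pause();
            } while (locked_.load(std::memory_order_relaxed) != 0);
        }
    }

    // Release the lock (the "mov A, 0" step).
    void unlock() {
        locked_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> locked_{0};  // 0 = free, 1 = held
};
```

Spinning on loads rather than on repeated exchanges keeps waiting threads from writing to the lock's cache line while it is held, which is the reason for the second "test" in this pattern.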

NOTE:

The first branch is predicted to fall through to the critical section in anticipation of successfully gaining access to the lock. It is highly recommended that all spin-wait loops include the PAUSE instruction. Because PAUSE is backward compatible with all existing IA-32 architecture-based processor generations, a test for processor type (a CPUID test) is not needed. All legacy processors execute the PAUSE instruction as a NOP, but on processors that use the PAUSE instruction as a hint there can be a significant performance benefit.