AVX_TRANSITION

TITLE: AVX-SSE TRANSITION PENALTY
ISSUE_NAME: AVX_TRANSITION
DESCRIPTION: Intel(r) AVX and Intel(r) SSE instructions can coexist and execute in the same process space. This can happen if your application includes third-party libraries with Intel SSE code, if a new DLL using AVX code is deployed that calls other modules running SSE code, or if you do not recompile your entire application at once. In any of these cases, the AVX code must use the VZEROUPPER instruction to avoid the AVX/SSE transition penalty. An AVX instruction always modifies the upper bits of its destination YMM register, whereas SSE instructions do not modify them. In hardware, the upper bits of the YMM register collection can be considered to be in one of three states:
1) Clean: All upper bits of YMM are zero. This is the state when the processor starts from RESET.
2) Modified and saved to XSAVE region: The content of the upper bits of the YMM registers matches the data saved in the XSAVE region. This happens after XSAVE/XRSTOR executes.
3) Modified and Unsaved: The execution of one AVX instruction (256-bit or 128-bit) modifies the upper bits of the destination YMM.
The AVX/SSE transition penalty applies whenever the processor state is “Modified and Unsaved”. Using VZEROUPPER moves the processor state to “Clean” and avoids the transition penalty.
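The three states described above can be sketched as a small state model. This is only an illustration of the description, not a model of the actual hardware: the names `ymm_upper_state`, `ymm_event`, `next_state`, and `sse_penalty_applies` are hypothetical, and the choice to leave a Clean state unchanged on XSAVE/XRSTOR is an assumption that the description does not spell out.

```c
/* Toy model of the upper-YMM state described above (names are hypothetical). */
enum ymm_upper_state {
    YMM_CLEAN,             /* all upper bits zero, e.g. after RESET */
    YMM_MODIFIED_SAVED,    /* upper bits match the data in the XSAVE region */
    YMM_MODIFIED_UNSAVED   /* an AVX instruction has written the upper bits */
};

enum ymm_event {
    EV_RESET,
    EV_AVX_INSTRUCTION,    /* 256-bit or 128-bit AVX instruction */
    EV_XSAVE_OR_XRSTOR,
    EV_VZEROUPPER
};

/* Returns the state after one event. */
static enum ymm_upper_state next_state(enum ymm_upper_state s, enum ymm_event e)
{
    switch (e) {
    case EV_RESET:
    case EV_VZEROUPPER:
        return YMM_CLEAN;
    case EV_AVX_INSTRUCTION:
        return YMM_MODIFIED_UNSAVED;
    case EV_XSAVE_OR_XRSTOR:
        /* Assumption: nothing was modified, so Clean stays Clean. */
        return s == YMM_CLEAN ? YMM_CLEAN : YMM_MODIFIED_SAVED;
    }
    return s;
}

/* Per the description, the penalty applies only in "Modified and Unsaved". */
static int sse_penalty_applies(enum ymm_upper_state s)
{
    return s == YMM_MODIFIED_UNSAVED;
}
```

In this model, running an AVX instruction from any state yields `YMM_MODIFIED_UNSAVED`, so `sse_penalty_applies` returns nonzero until a VZEROUPPER event moves the state back to `YMM_CLEAN`.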

RELEVANCE: Sandy Bridge, Ivy Bridge, Haswell
EXAMPLE:
Code that causes the transition penalty:
//Run some AVX code here
//Run some SSE code here <- PENALTY
Fix for the penalty:
//Run some AVX code here
vzeroupper;
//Run some SSE code here <- NO PENALTY
SOLUTION: Whenever 256-bit AVX code and 128-bit SSE code might execute together, issue VZEROUPPER before any expected transition from the “Modified and Unsaved” state into SSE code. Add a VZEROUPPER instruction after the 256-bit AVX instructions have executed and before any function call that might execute SSE code. Also add VZEROUPPER at the end of any function that uses 256-bit AVX instructions.
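In C, the same rule can be followed with the `_mm256_zeroupper()` intrinsic, which emits VZEROUPPER. The sketch below is a minimal illustration, assuming GCC or Clang on x86-64 (it relies on the `target("avx")` function attribute so that only one function is compiled as AVX code). The function names `sum_avx` and `first_lane_doubled_sse` are hypothetical and stand in for an AVX routine and for third-party SSE code.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical AVX routine: sums n floats using 256-bit registers. */
__attribute__((target("avx")))
float sum_avx(const float *data, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    /* Process eight floats per iteration. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));

    /* Reduce the eight lanes, then add the leftover elements. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
    for (; i < n; ++i)
        total += data[i];

    /* End of the function that used 256-bit AVX: emit VZEROUPPER so that
       callers running SSE code start from the "Clean" state. */
    _mm256_zeroupper();
    return total;
}

/* Hypothetical stand-in for legacy SSE code called after the AVX routine. */
float first_lane_doubled_sse(float x)
{
    __m128 v = _mm_set1_ps(x);
    v = _mm_add_ps(v, v);
    return _mm_cvtss_f32(v);
}
```

A caller would check for AVX support first (for example with `__builtin_cpu_supports("avx")` on GCC or Clang), call `sum_avx`, and could then call `first_lane_doubled_sse` without entering the SSE code in the “Modified and Unsaved” state.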
RELATED_SOURCES: Intel® 64 and IA-32 Architectures Optimization Reference Manual
NOTES: These same rules apply when calling functions which may mix AVX/SSE instructions.
