TITLE: LCP STALL
ISSUE_NAME: ILD_LCP_STALL, SINGLE_FIRE
DESCRIPTION: An instruction can be up to 15 bytes long. Some prefixes, called length-changing prefixes (LCPs), can dynamically change the length of an instruction that the decoder must recognize. Typically, the pre-decode unit estimates the length of each instruction in the byte stream assuming the absence of an LCP. When the predecoder encounters an LCP in the fetch line (mostly due to the use of 16-bit immediates, i.e. a “short”), it must use a slower length-decoding algorithm. With the slower algorithm, the predecoder decodes the fetch in 6 cycles instead of the usual 1 cycle. Normal queuing throughout the machine pipeline generally cannot hide LCP penalties.
RELEVANCE: Nehalem, Sandy Bridge, Ivy Bridge, Haswell.
Code causing a single-fire LCP stall (each assembly line causes one instance):
ADD DX, 01234H
ADD word ptr [EDX], 01234H
ADD word ptr 012345678H[EDX], 01234H
ADD word ptr [012345678H], 01234H
SOLUTION: Avoid imm16 values; favor imm8 or imm32 values. If an imm16 is needed, load the value into a 32-bit (doubleword) variable with the high bits padded, and access the low word. Most modern compilers avoid generating LCPs by padding imm16 values to imm32 in this way. If the compiler does not handle this automatically, the immediate values must be changed manually in the source code.
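Code sketching one possible LCP-free rewrite of the first two offending lines. This is an illustration only: it assumes 32-bit or 64-bit code and that EAX is free as a scratch register (both assumptions, not stated in this entry). Each ADD line is an independent replacement for its counterpart in the listing, not a sequence to run as written:
MOV EAX, 01234H             ; imm32 load: no 66H operand-size prefix, so no LCP
ADD DX, AX                  ; replaces ADD DX, 01234H: 66H prefix but no immediate, so the length is unchanged
ADD word ptr [EDX], AX      ; replaces ADD word ptr [EDX], 01234H in the same way
The rewrite costs one extra instruction and a register, so it is likely to pay off only where the stall is actually measured on a hot path.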
RELATED_SOURCES: Intel® 64 and IA-32 Architectures Optimization Reference Manual
NOTES: This issue is a code-generation problem. However, in some circumstances (for example, with an old compiler) it can be corrected manually by switching a 16-bit immediate to a 32-bit immediate. Fixing LCP stalls by hand is probably not a good use of your time unless they occur in extremely hot, hand-coded assembly.
TITLE: LCP STALL