TITLE: Switches Between Decoded Instruction Cache and the Legacy Front End Pipeline
The Decoded ICache has many advantages over the legacy decode pipeline. It eliminates
many bottlenecks of the legacy decode pipeline, such as instructions that decode
into more than one micro-op and length-changing prefix (LCP) stalls.
A switch to the legacy decode pipeline from the Decoded ICache only occurs when a
lookup in the Decoded ICache fails and usually costs anywhere from zero to three
cycles in the front end of the pipeline.
This performance issue only impacts architectures code-named Sandy Bridge and Ivy Bridge.
The Decoded ICache events all have large skids, and the exact instruction where they
are tagged is usually not the source of the problem, so look for this issue only at the
process, module, and function granularities.
Determining the cost of switches from the Decoded ICache to the legacy decode pipeline:
% DECODED_ICACHE_SWITCH_PENALTY =
100 * DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD;
Determining the average cost per Decoded ICache switch to the legacy front end:
DSB2MITE_SWITCHES.PENALTY_CYCLES / DSB2MITE_SWITCHES.COUNT;
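The two metrics above can be evaluated from raw counter totals with a few lines of Python. This is a minimal sketch: the function names and the sample counter values are hypothetical, and it assumes the three event counts were collected over the same interval by whatever profiling tool is in use.

```python
def switch_penalty_percent(penalty_cycles, unhalted_cycles):
    """% DECODED_ICACHE_SWITCH_PENALTY:
    100 * DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD
    """
    if unhalted_cycles <= 0:
        raise ValueError("CPU_CLK_UNHALTED.THREAD must be positive")
    return 100.0 * penalty_cycles / unhalted_cycles


def average_switch_cost(penalty_cycles, switch_count):
    """Average penalty cycles per switch:
    DSB2MITE_SWITCHES.PENALTY_CYCLES / DSB2MITE_SWITCHES.COUNT
    """
    if switch_count == 0:
        return 0.0  # no switches were recorded, so there is no cost to average
    return penalty_cycles / switch_count


if __name__ == "__main__":
    # Hypothetical counter totals, for illustration only.
    penalty_cycles = 1_200_000    # DSB2MITE_SWITCHES.PENALTY_CYCLES
    switch_count = 800_000        # DSB2MITE_SWITCHES.COUNT
    unhalted_cycles = 60_000_000  # CPU_CLK_UNHALTED.THREAD

    print(switch_penalty_percent(penalty_cycles, unhalted_cycles))
    print(average_switch_cost(penalty_cycles, switch_count))
```

With these made-up inputs, the average cost per switch falls inside the zero-to-three-cycle range described earlier; real measurements should be read at process, module, or function granularity because of event skid.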
There are no partial hits in the Decoded ICache. If any micro-op that is part of a
lookup on a 32-byte chunk is missing, a Decoded ICache miss occurs on all micro-ops
for that transaction.
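The all-or-nothing lookup rule can be illustrated with a toy model. This sketch is only a conceptual aid: the function name, the use of string labels for micro-ops, and the set-based cache are assumptions made for illustration and do not reflect how the hardware is organized.

```python
from typing import List, Set


def lookup_chunk(cached_uops: Set[str], needed_uops: List[str]) -> List[str]:
    """Toy model of a Decoded ICache lookup for one 32-byte chunk.

    Returns the micro-ops delivered from the cache. An empty list models a
    miss for the whole transaction, even if some micro-ops were present.
    """
    if all(uop in cached_uops for uop in needed_uops):
        return list(needed_uops)  # every micro-op is present: full hit
    return []                     # any missing micro-op: miss on all of them


# Hypothetical micro-op labels.
cache = {"uop_a", "uop_b", "uop_c"}
full_hit = lookup_chunk(cache, ["uop_a", "uop_b"])
partial = lookup_chunk(cache, ["uop_a", "uop_d"])  # "uop_d" is not cached
```

In this model `partial` is empty even though `uop_a` is cached, which mirrors the statement that a single missing micro-op turns the whole lookup into a miss.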
There are three primary reasons for missing micro-ops in the Decoded ICache:
1) Portions of a 32-byte chunk of code were not able to fit within three ways of the Decoded ICache.
2) A frequently run portion of your code section is too large for the Decoded ICache. This case is more common in server applications, since client applications tend to have a smaller set of "hot" code.
3) The Decoded ICache is getting flushed, for example, when an ITLB entry is evicted.