Switches Between Decoded Instruction Cache and the Legacy Front End Pipeline

Michael Chynoweth (Intel)

TITLE: Switches Between Decoded Instruction Cache and the Legacy Front End Pipeline

ISSUE_NAME: DECODED_ICACHE_SWITCH_PENALTY

DESCRIPTION:

The Decoded ICache has many advantages over the legacy decode pipeline. It eliminates
many bottlenecks of the legacy decode pipeline, such as instructions that decode
into more than one micro-op and length-changing prefix (LCP) stalls.
A switch from the Decoded ICache to the legacy decode pipeline occurs only when a
lookup in the Decoded ICache fails, and it usually costs anywhere from zero to three
cycles in the front end of the pipeline.

RELEVANCE:
This performance issue only impacts architectures code-named Sandy Bridge and Ivy Bridge.

EXAMPLE:

The Decoded ICache events all have large skids, so the exact instruction where they
are tagged is usually not the source of the problem. Look for this issue only at the
process, module, and function granularities.

Determining the cost of switches from the Decoded ICache to the legacy decode pipeline:

% DECODED_ICACHE_SWITCH_PENALTY =

100 * DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD;

Determining the average cost per Decoded ICache switch to the legacy front end:

AVG.DECODED_ICACHE_SWITCH_PENALTY =

DSB2MITE_SWITCHES.PENALTY_CYCLES / DSB2MITE_SWITCHES.COUNT;
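
The two formulas above can be sketched as a small Python helper. This is only an illustration: the function names and the counter values are hypothetical, and the raw counts are assumed to have been collected over the same interval with whatever profiler you use.

```python
def switch_penalty_percent(penalty_cycles: int, unhalted_cycles: int) -> float:
    """% DECODED_ICACHE_SWITCH_PENALTY: share of cycles lost to switches."""
    if unhalted_cycles <= 0:
        raise ValueError("CPU_CLK_UNHALTED.THREAD must be positive")
    return 100.0 * penalty_cycles / unhalted_cycles


def avg_switch_penalty(penalty_cycles: int, switch_count: int) -> float:
    """AVG.DECODED_ICACHE_SWITCH_PENALTY: mean penalty cycles per switch."""
    if switch_count == 0:
        return 0.0  # no switches were observed, so there is no penalty to average
    return penalty_cycles / switch_count


# Hypothetical counter values, for illustration only.
counts = {
    "DSB2MITE_SWITCHES.PENALTY_CYCLES": 1_500_000,
    "DSB2MITE_SWITCHES.COUNT": 1_000_000,
    "CPU_CLK_UNHALTED.THREAD": 50_000_000,
}

pct = switch_penalty_percent(
    counts["DSB2MITE_SWITCHES.PENALTY_CYCLES"],
    counts["CPU_CLK_UNHALTED.THREAD"],
)
avg = avg_switch_penalty(
    counts["DSB2MITE_SWITCHES.PENALTY_CYCLES"],
    counts["DSB2MITE_SWITCHES.COUNT"],
)
print(f"switch penalty: {pct:.2f}% of cycles, {avg:.2f} cycles per switch")
```

Because of the event skid noted earlier, such a helper is best applied to counts aggregated per process, module, or function rather than per instruction.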

 

SOLUTION:

There are no partial hits in the Decoded ICache. If any micro-op that is part of a
lookup on a 32-byte chunk is missing, a Decoded ICache miss occurs on all micro-ops
for that transaction.

There are three primary reasons for missing micro-ops in the Decoded ICache:

1)   Portions of a 32-byte chunk of code were not able to fit within three ways of the Decoded ICache.

2)   A frequently run portion of your code is too large for the Decoded ICache. This case is more common in server applications, since client applications tend to have a smaller set of "hot" code.

3)   The Decoded ICache is being flushed, for example when an ITLB entry is evicted.
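
As a rough illustration of reason 1, the sketch below flags 32-byte chunks whose micro-op total cannot fit in three ways. It assumes each way holds up to six micro-ops (the figure given in Intel's optimization manual for these architectures, not stated in this article), attributes each instruction to the chunk containing its first byte, and takes hypothetical (address, micro-op count) pairs as input. Passing this check is necessary but not sufficient, since other placement restrictions also apply.

```python
CHUNK_BYTES = 32        # the Decoded ICache is looked up per aligned 32-byte chunk
MAX_WAYS_PER_CHUNK = 3  # from the text: at most three ways per chunk
UOPS_PER_WAY = 6        # assumption: six micro-ops per way (Intel optimization manual)


def chunks_exceeding_capacity(instructions):
    """Return base addresses of 32-byte chunks with too many micro-ops.

    `instructions` is an iterable of (address, uop_count) pairs; each
    instruction is counted in the chunk that contains its first byte.
    """
    totals = {}
    for address, uops in instructions:
        base = address & ~(CHUNK_BYTES - 1)  # round down to the chunk start
        totals[base] = totals.get(base, 0) + uops
    limit = MAX_WAYS_PER_CHUNK * UOPS_PER_WAY
    return sorted(base for base, total in totals.items() if total > limit)


# Hypothetical code layout: 16 micro-ops in one chunk, 20 in the next.
layout = [(0x1000 + 2 * i, 1) for i in range(16)]
layout += [(0x1020 + i, 1) for i in range(20)]
for base in chunks_exceeding_capacity(layout):
    print(f"chunk at {base:#x} cannot be fully cached")
```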

 

RELATED_SOURCES:
NOTES:
