Intel® Performance Bottleneck Analyzer (Archived)

Front End Bandwidth

TITLE: Front End Bandwidth

ISSUE_NAME: Frontend^FE_bandwidth

DESCRIPTION:

Cycles in which some uops were delivered by the front end when it was asked to deliver, but maximum bandwidth was not achieved.

RELEVANCE:

When the front end cannot deliver enough uops while the back end is requesting them, this can indicate a front end limitation that is affecting performance.  The recommendation is to look into the other Frontend metrics to determine the root cause.
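As an illustration of how such a metric is typically derived in Top-Down style analysis, the sketch below computes a front end bandwidth fraction from raw counter values. It assumes a 4-wide core and the event names IDQ_UOPS_NOT_DELIVERED.CORE, IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE, and CPU_CLK_UNHALTED.THREAD; these details are assumptions and are not part of this entry.

    /*
     * Hedged sketch: approximate the fraction of issue slots lost to front end
     * bandwidth, i.e. cycles where some uops were delivered but fewer than the
     * maximum.  Assumes a 4-wide core and Top-Down style counters; the event
     * names in the comments are assumptions, not part of this entry.
     */
    #include <stdio.h>

    static double fe_bandwidth_fraction(
        unsigned long long uops_not_delivered, /* IDQ_UOPS_NOT_DELIVERED.CORE (assumed) */
        unsigned long long cycles_0_uops,      /* IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (assumed) */
        unsigned long long core_clocks)        /* CPU_CLK_UNHALTED.THREAD (assumed) */
    {
        const double width = 4.0;                             /* assumed issue width */
        double total_slots = width * (double)core_clocks;
        double latency_slots = width * (double)cycles_0_uops; /* slots lost when 0 uops arrived */
        return ((double)uops_not_delivered - latency_slots) / total_slots;
    }

    int main(void)
    {
        /* made-up counter values, for illustration only */
        printf("FE bandwidth fraction: %.2f\n",
               fe_bandwidth_fraction(1200000000ULL, 200000000ULL, 1000000000ULL));
        return 0;
    }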

EXAMPLE:

SOLUTION:

RELATED_SOURCES:

NOTES:

Front End Latency

TITLE: Front End Latency

ISSUE_NAME: Frontend^FE_latency

DESCRIPTION:

Cycles in which 0 uops were delivered from the front end when it was asked to deliver.

RELEVANCE:

When the back end is requesting uops and the front end cannot deliver them, this can indicate a front end limitation that is affecting performance.  Look at the uop source and other Frontend metrics, such as Frontend^FE_latency^L1IorTLB and Frontend^FE_latency^DSBtoMITE, to determine why 0 uops were delivered.
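For comparison with the bandwidth sketch above, the following sketch estimates the front end latency share as the fraction of core cycles in which 0 uops were delivered. The event names IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE and CPU_CLK_UNHALTED.THREAD are assumptions and are not named by this entry.

    /*
     * Hedged sketch: estimate the front end latency share as the fraction of
     * core cycles in which the front end delivered 0 uops while being asked.
     * The counter names in the comments are assumptions, not part of this entry.
     */
    #include <stdio.h>

    static double fe_latency_fraction(
        unsigned long long cycles_0_uops, /* IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (assumed) */
        unsigned long long core_clocks)   /* CPU_CLK_UNHALTED.THREAD (assumed) */
    {
        return (double)cycles_0_uops / (double)core_clocks;
    }

    int main(void)
    {
        /* made-up counter values, for illustration only */
        printf("FE latency fraction: %.2f\n",
               fe_latency_fraction(200000000ULL, 1000000000ULL));
        return 0;
    }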

EXAMPLE:

SOLUTION:

RELATED_SOURCES:

NOTES:

Uops from the legacy decode pipeline

TITLE: Uops from the legacy decode pipeline

ISSUE_NAME: Frontend^UopSource_MITE

DESCRIPTION:

This metric describes the percentage of uops delivered to the micro-op queue that came from the MITE, which is the legacy decode pipeline.
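One common way this percentage is approximated from raw uop-delivery counts is sketched below. The event names IDQ.MITE_UOPS, IDQ.DSB_UOPS, IDQ.MS_UOPS, and LSD.UOPS are assumptions and are not given by this entry.

    /*
     * Hedged sketch: approximate the percentage of uops delivered to the
     * micro-op queue that came from the legacy decode pipeline (MITE).
     * The event names in the comments are assumptions, not part of this entry.
     */
    #include <stdio.h>

    static double mite_uop_percent(
        unsigned long long mite_uops, /* IDQ.MITE_UOPS (assumed) */
        unsigned long long dsb_uops,  /* IDQ.DSB_UOPS  (assumed) */
        unsigned long long ms_uops,   /* IDQ.MS_UOPS   (assumed) */
        unsigned long long lsd_uops)  /* LSD.UOPS      (assumed) */
    {
        double total = (double)(mite_uops + dsb_uops + ms_uops + lsd_uops);
        return total > 0.0 ? 100.0 * (double)mite_uops / total : 0.0;
    }

    int main(void)
    {
        /* made-up counter values, for illustration only */
        printf("UopSource_MITE: %.1f%%\n",
               mite_uop_percent(400000000ULL, 500000000ULL, 50000000ULL, 50000000ULL));
        return 0;
    }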

RELEVANCE:

If your application is not bound in the front end, then whether micro-ops are coming from the legacy decode pipeline or the Decoded ICache is of lesser importance.  If you are front end bound and UopSource_MITE is greater than ~30%, you may want to look into the following:

SPIN_WAIT_NO_PAUSE - finder

TITLE: SPIN_WAIT_NO_PAUSE

DESCRIPTION:

 

A spin-wait loop is a technique used in multithreaded applications whereby one thread waits for other threads. The wait can be required for protection of a critical section, for barriers, or for other necessary synchronizations. Typically, the structure of a spin-wait loop consists of a loop that compares a synchronization variable with a predefined value.
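A minimal sketch of that structure, assuming x86 and C11 atomics, is shown below. The variable names and the producer thread are illustrative assumptions; the _mm_pause() call emits the PAUSE instruction whose absence is what this finder flags.

    /*
     * Hedged sketch of the spin-wait structure described above: a loop that
     * compares a synchronization variable with a predefined value.  The
     * _mm_pause() call emits the PAUSE instruction; the variable names and the
     * producer thread are illustrative assumptions, not the tool's own example.
     */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <immintrin.h>            /* _mm_pause(), x86 only */

    static atomic_int sync_var = 0;   /* the synchronization variable */
    #define READY 1                   /* the predefined value it is compared with */

    static void spin_wait_for_ready(void)
    {
        while (atomic_load_explicit(&sync_var, memory_order_acquire) != READY) {
            _mm_pause();              /* omit this and the loop matches SPIN_WAIT_NO_PAUSE */
        }
    }

    static void *producer(void *arg)
    {
        (void)arg;
        usleep(1000);                 /* stand-in for the work other threads do */
        atomic_store_explicit(&sync_var, READY, memory_order_release);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        spin_wait_for_ready();        /* one thread waits for the other */
        pthread_join(t, NULL);
        puts("done");
        return 0;
    }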

 

RELEVANCE:

 

Switches Between Decoded Instruction Cache and the Legacy Front End Pipeline

TITLE: Switches Between Decoded Instruction Cache and the Legacy Front End Pipeline

ISSUE_NAME: DECODED_ICACHE_SWITCH_PENALTY

DESCRIPTION:

The Decoded ICache has many advantages over the legacy decode pipeline. It eliminates many bottlenecks of the legacy decode pipeline, such as instructions decoded into more than one micro-op and length changing prefix (LCP) stalls. A switch to the legacy decode pipeline from the Decoded ICache only occurs when a lookup in the Decoded ICache fails and usually costs anywhere from zero to three
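As a hedged illustration of how the cost of these switches is often estimated, the sketch below expresses switch penalty cycles as a fraction of all core cycles. The event names DSB2MITE_SWITCHES.PENALTY_CYCLES and CPU_CLK_UNHALTED.THREAD are assumptions and are not part of this entry.

    /*
     * Hedged sketch: express Decoded ICache to legacy pipeline switch cost as a
     * fraction of core cycles.  The counter names in the comments are
     * assumptions, not taken from this entry.
     */
    #include <stdio.h>

    static double dsb_switch_cost_fraction(
        unsigned long long penalty_cycles, /* DSB2MITE_SWITCHES.PENALTY_CYCLES (assumed) */
        unsigned long long core_clocks)    /* CPU_CLK_UNHALTED.THREAD (assumed) */
    {
        return (double)penalty_cycles / (double)core_clocks;
    }

    int main(void)
    {
        /* made-up counter values, for illustration only */
        printf("DSB switch cost: %.1f%% of cycles\n",
               100.0 * dsb_switch_cost_fraction(30000000ULL, 1000000000ULL));
        return 0;
    }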
