Intel® Performance Bottleneck Analyzer (Archived)

Machine clears due to self modifying code

TITLE:  MACHINE_CLEARS.SMC

ISSUE_NAME:   MACHINE_CLEARS.SMC

DESCRIPTION: 

This event fires when self-modifying code (SMC) is detected; it counts the number of times that a program writes to a code section. Such writes typically come from software that edits a binary at run time to force execution down a particular path (for example, hacking or patching tools). Self-modifying code causes a severe penalty on all Intel 64 and IA-32 processors: the modified cache line is written back to the L2 and LLC caches, and the affected instructions must then be reloaded, which adds a further performance penalty.

256 bit split load/store issues

TITLE:  Split load/store 256 bit finder

ISSUE_NAME:   SPLIT_LOAD_STORE_256_BIT (sub issue LOAD or STORE)

DESCRIPTION: 

As shown in the example below, if the code performs a 128-bit load and then inserts another 128 bits into the upper half of the same 256-bit register, it executes an extra instruction instead of using a single full 256-bit load. Code generators should avoid this pattern.

 

EXAMPLE: 

LOAD:

vmovupd xmm3, xmmword ptr [rax+r8*1]

vbroadcastsd ymm5, qword ptr [rsi+r13*8]
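To illustrate the kind of check the description implies, here is a minimal sketch of a finder that scans Intel-syntax disassembly text for a 128-bit load into an xmm register followed shortly by a `vinsertf128`/`vinserti128` into the upper half of the matching ymm register. It is not PBA's actual implementation: the function name `find_split_256_loads`, the `window` parameter, and the sample listing (including its `vinsertf128` line, which the example above does not show) are assumptions made for illustration.

```python
import re

# 128-bit vector load from memory into an xmm register (Intel syntax).
LOAD_RE = re.compile(
    r"^\s*vmov(?:up[sd]|ap[sd]|dqu|dqa)\s+xmm(\d+)\s*,\s*xmmword\s+ptr\s+\[",
    re.IGNORECASE,
)
# Insert of a 128-bit memory operand into a ymm register.
INSERT_RE = re.compile(
    r"^\s*vinsert[fi]128\s+ymm(\d+)\s*,\s*ymm(\d+)\s*,"
    r"\s*xmmword\s+ptr\s+\[[^\]]*\]\s*,\s*(\w+)",
    re.IGNORECASE,
)


def find_split_256_loads(lines, window=4):
    """Return (load_index, insert_index) pairs that look like a split 256-bit load.

    `window` (a hypothetical tuning knob) bounds how many following
    instructions are searched for the matching insert.
    """
    hits = []
    for i, line in enumerate(lines):
        load = LOAD_RE.match(line)
        if not load:
            continue
        reg = load.group(1)
        for j in range(i + 1, min(i + 1 + window, len(lines))):
            ins = INSERT_RE.match(lines[j])
            # Same register as destination and source, immediate 1 = upper half.
            if (ins and ins.group(1) == reg and ins.group(2) == reg
                    and ins.group(3).lower() in ("1", "0x1")):
                hits.append((i, j))
                break
    return hits


# Hypothetical listing; only the first two lines come from the example above.
sample = [
    "vmovupd xmm3, xmmword ptr [rax+r8*1]",
    "vbroadcastsd ymm5, qword ptr [rsi+r13*8]",
    "vinsertf128 ymm3, ymm3, xmmword ptr [rax+r8*1+0x10], 0x1",
    "vmovupd ymm4, ymmword ptr [rax+r8*1]",
]
print(find_split_256_loads(sample))  # [(0, 2)]
```

The final line of the sample is a full 256-bit load, which is the form the recommendation prefers, so it is not reported.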

Inline Function Opportunity

TITLE: Function Inline Opportunity

ISSUE_NAME: INLINE_FUNCTION_OPPORTUNITY

DESCRIPTION: PBA looks for opportunities to inline functions, since call and ret are fairly expensive instructions for short functions. It does this by following the hot path through the function with streams and counting the number of instructions that are executed between the call and the return instruction. It ignores zero-length or zero-displacement calls, which do not have a corresponding return instruction.
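The counting step described above can be sketched as follows. This is an illustration only, not PBA's code: the `Insn` record, the function name `short_call_candidates`, and the `max_insns` threshold are all hypothetical, and the source does not state what instruction count PBA treats as "short". A zero-displacement call is modelled here as a call whose target is the address of the next instruction.

```python
from collections import namedtuple

# One executed instruction on the hot path (hypothetical record layout).
Insn = namedtuple("Insn", "addr size mnemonic target")


def short_call_candidates(trace, max_insns=10):
    """Return (call_addr, insn_count) for calls whose body ran few instructions.

    insn_count is the number of instructions executed between the call and
    its matching ret, including any nested callees.
    """
    stack = []  # each frame: [call address, instructions executed so far]
    found = []
    for insn in trace:
        if insn.mnemonic == "call":
            if insn.target == insn.addr + insn.size:
                # Zero-displacement call: no matching ret, so do not open
                # a frame; count it as an ordinary instruction.
                if stack:
                    stack[-1][1] += 1
                continue
            stack.append([insn.addr, 0])
        elif insn.mnemonic == "ret":
            if not stack:
                continue  # ret without a call seen in this trace
            call_addr, count = stack.pop()
            if count <= max_insns:
                found.append((call_addr, count))
            if stack:
                # Charge the nested body plus its call and ret to the caller.
                stack[-1][1] += count + 2
        elif stack:
            stack[-1][1] += 1
    return found


# Made-up trace: one short function, then a zero-displacement call.
trace = [
    Insn(0x100, 5, "call", 0x200),
    Insn(0x200, 3, "mov", None),
    Insn(0x203, 3, "add", None),
    Insn(0x206, 1, "ret", None),
    Insn(0x105, 5, "call", 0x10A),  # target is the next instruction
    Insn(0x10A, 1, "pop", None),
]
print(short_call_candidates(trace))  # [(256, 2)]
```

A stack of open frames keeps nested calls paired with their own returns, so a short leaf function is reported even when it is called from a longer one.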

Back End 3 Or More Uops Executed

TITLE: Back End 3 Or More Uops Executed

ISSUE_NAME: Backend^CoreBound^Cycles3mPortsUtilized

DESCRIPTION:

Cycles where the core executed a total of 3 or more uops

RELEVANCE:

This metric represents how often the core executed a total of 3 or more uops in a cycle.  When 3 or more uops are executed in a cycle, this likely indicates good execution since either the maximum or close to the maximum bandwidth of uops in execution was achieved.
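As a rough illustration of "how often", the metric can be read as the share of all core cycles in which 3 or more uops were executed. The sketch below assumes that interpretation; the function name, parameter names, and the sample counter values are made up for the example and do not come from PBA.

```python
def fraction_cycles_3plus_uops(cycles_ge_3_uops, total_cycles):
    """Share of core cycles in which 3 or more uops were executed.

    Both arguments are raw cycle counts collected over the same interval.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= cycles_ge_3_uops <= total_cycles:
        raise ValueError("cycles_ge_3_uops must lie between 0 and total_cycles")
    return cycles_ge_3_uops / total_cycles


# Made-up counts: 450,000 of 1,000,000 cycles executed 3 or more uops.
print(f"{fraction_cycles_3plus_uops(450_000, 1_000_000):.1%}")  # 45.0%
```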

EXAMPLE:

SOLUTION:

RELATED_SOURCES:

NOTES:
