Floating-point assists, also known as FP_ASSISTS. These have two primary causes: denormal operands and results that underflow, both of which are a performance concern on most architectures.
This event fires when self-modifying code is detected. Self-modifying code is typically used by people who edit binaries to force execution down a certain path (e.g. hackers). The event counts the number of times a program writes to a code section. Self-modifying code incurs a severe penalty on all Intel 64 and IA-32 processors: the modified cache line is written back to the L2 and LLC caches, and the instructions must then be reloaded, which adds a further performance penalty.
This event fires when non-flat loads (loads issued with a non-zero data segment base, DS) are dispatched. Such loads incur a penalty in the address generation unit.
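To make the two causes concrete, here is a minimal sketch of which double-precision values fall into the denormal (subnormal) range and how a further operation underflows to zero. The helper name is_subnormal is hypothetical. The snippet only classifies values and does not measure assists.

```python
import sys

def is_subnormal(x: float) -> bool:
    """True if x is a nonzero double below the smallest normal magnitude."""
    # Hypothetical helper: NaN and zero are reported as not subnormal.
    return x != 0.0 and abs(x) < sys.float_info.min

tiny = sys.float_info.min   # smallest positive normal double
halved = tiny / 2           # result lands in the denormal range
flushed = halved * 1e-20    # too small even for a denormal: underflows to 0.0
```

Values such as halved are the kind of operand or result that can trigger an assist when they appear in hot floating-point loops.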
TITLE: Split load/store 256 bit finder
ISSUE_NAME: SPLIT_LOAD_STORE_256_BIT (sub issue LOAD or STORE)
As shown in the example below, if the code performs a 128-bit load and then inserts another 128 bits into the upper half of the same 256-bit register, it spends an extra instruction instead of using a single full 256-bit load. It is recommended that code generators avoid this behavior.
vmovupd xmm3, xmmword ptr [rax+r8*1]
vbroadcastsd ymm5, qword ptr [rsi+r13*8]
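As an illustration of how such a finder might work, the sketch below scans Intel-syntax disassembly text for a 128-bit load into xmmN followed shortly by a vinsertf128 from memory into the matching ymmN. This is an assumption about the approach, not PBA's actual implementation. The function name, the regular expressions, and the window parameter are all hypothetical.

```python
import re

# Assumed textual forms of the two halves of a split 256-bit load.
LOAD_128 = re.compile(r"^vmov[au]p[sd]\s+xmm(\d+),\s*xmmword ptr \[")
INSERT_128 = re.compile(r"^vinsertf128\s+ymm(\d+),\s*ymm(\d+),\s*xmmword ptr \[")

def find_split_loads(lines, window=4):
    """Return (load_index, insert_index) pairs that look like split loads."""
    hits = []
    for i, line in enumerate(lines):
        load = LOAD_128.match(line.strip())
        if not load:
            continue
        reg = load.group(1)
        # Look a few instructions ahead for the insert into the same register.
        for j in range(i + 1, min(i + 1 + window, len(lines))):
            ins = INSERT_128.match(lines[j].strip())
            if ins and ins.group(1) == reg and ins.group(2) == reg:
                hits.append((i, j))
                break
    return hits
```

Each reported pair is a candidate for replacement with a single 256-bit load such as vmovupd ymmN, ymmword ptr [...].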
TITLE: Function Inline Opportunity
DESCRIPTION: PBA looks for opportunities to inline functions, since call and ret are fairly expensive instructions for short functions. It does this by simply following the hot path through the function with streams and counting the number of instructions that are executed between the call and the return instructions. It ignores zero-length or zero-displacement calls, which do not have a corresponding return instruction.
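The counting step described above can be sketched roughly as follows. This is not PBA's code: the Insn record, the trace format, and the threshold value are assumptions made for illustration, since the source does not state what length counts as "short".

```python
from collections import namedtuple

# Hypothetical trace record: mnemonic plus the call displacement (0 if none).
Insn = namedtuple("Insn", ["mnemonic", "displacement"])

def short_call_lengths(trace, threshold=8):
    """Return instruction counts of call/ret pairs no longer than threshold."""
    stack = []        # one running instruction count per open call
    candidates = []
    for insn in trace:
        if insn.mnemonic == "ret" and stack:
            count = stack.pop()
            if count <= threshold:
                candidates.append(count)
        # Every instruction also counts toward any still-open outer call.
        stack = [c + 1 for c in stack]
        # Zero-displacement calls have no matching ret, so they are not tracked.
        if insn.mnemonic == "call" and insn.displacement != 0:
            stack.append(0)
    return candidates
```

A call whose body executes only a few instructions on the hot path would be reported as an inline candidate. Longer bodies are skipped because the call and ret overhead is small relative to the work done.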
TITLE: Store Forward Block
TITLE: AVX-SSE TRANSITION PENALTY
TITLE: LCP STALL
ISSUE_NAME: ILD_LCP_STALL, SINGLE_FIRE
TITLE: AGEN STALL
TITLE: Back End 3 Or More Uops Executed
Cycles where the core executed a total of 3 or more uops
This metric represents how often the core executed a total of 3 or more uops in a cycle. When 3 or more uops are executed in a cycle, this likely indicates good execution since either the maximum or close to the maximum bandwidth of uops in execution was achieved.
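One plausible way to express this metric is as the fraction of all core cycles in which 3 or more uops were executed. The sketch below assumes both counts are already available from the relevant counters. The function and parameter names are hypothetical and do not refer to specific event names.

```python
def fraction_cycles_ge_3_uops(cycles_ge_3_uops: int, total_cycles: int) -> float:
    """Fraction of cycles in which the core executed 3 or more uops."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return cycles_ge_3_uops / total_cycles
```

A value close to 1.0 would suggest execution bandwidth is well utilized for most of the measured interval.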