I am trying to analyze some benchmarks and see how many of their stall cycles are related to memory access. I looked at two documents: "Intel 64 and IA-32 Architectures Optimization Reference Manual" and "Performance Analysis Guide for Intel Core i7 Processor and Intel Xeon 5500 processors".
From these I gathered that Total Cycles = UOPS_EXECUTED.CORE_STALLS_CYCLES + UOPS_EXECUTED.CORE_ACTIVE_CYCLES, where Total Cycles is CPU_CLK_UNHALTED.THREAD. I also understand that memory-related operations go through ports 2, 3, and 4, whereas ALU-related operations go through ports 0, 1, and 5.
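To make the relation above concrete, here is a minimal sketch of how I would turn raw counter readings into stall and active fractions. The function name `stall_breakdown` and the numbers are hypothetical and for illustration only. The residual term is there because I am assuming the measured counts may not add up exactly.

```python
def stall_breakdown(total_cycles, core_stall_cycles, core_active_cycles):
    """Split total cycles into stalled and active fractions.

    total_cycles       -> CPU_CLK_UNHALTED.THREAD
    core_stall_cycles  -> UOPS_EXECUTED.CORE_STALLS_CYCLES
    core_active_cycles -> UOPS_EXECUTED.CORE_ACTIVE_CYCLES
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    # Deviation of the measured counts from the assumed identity
    # total = stall + active.
    residual = total_cycles - (core_stall_cycles + core_active_cycles)
    return {
        "stall_fraction": core_stall_cycles / total_cycles,
        "active_fraction": core_active_cycles / total_cycles,
        "residual_cycles": residual,
    }


# Hypothetical counter readings, not measured data.
result = stall_breakdown(1_000_000, 400_000, 600_000)
print(result)
```

A large `residual_cycles` value would suggest that the counters were not collected over the same interval, or that the identity does not hold as I have stated it.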
I found the UOPS_EXECUTED.PORT015_STALL_CYCLES counter for ALU-related stalls, but no equivalent counter for memory-related stalls. UOPS_EXECUTED.PORT234_CORE seems to count the total memory uops rather than stall cycles.
Could anyone suggest how to identify memory-related stalls?
Also, for the programs I ran, UOPS_EXECUTED.PORT015_STALL_CYCLES was greater than UOPS_EXECUTED.CORE_STALLS_CYCLES. Does that make sense?
I hope this is the right forum for this question. Please correct me otherwise.