For a large, memory- and compute-intensive application, we are concerned about possible FSB contention on Tigerton in the quad-core configuration. Although most test cases ran well with all four cores per socket, one case with high cache demand saw its run time nearly double compared to Tigerton in a dual-core configuration (we used Linux taskset to restrict the run to either 4 CPUs or 2 CPUs per physical socket).
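For anyone wanting to reproduce the 4-CPU versus 2-CPU per socket setup, here is a rough sketch of how the CPU list for taskset could be built. It assumes a Linux system that exposes topology/physical_package_id under /sys/devices/system/cpu; the helper names and the ./app binary are made up for illustration and are not what we actually ran.

```python
import os
from collections import defaultdict


def cpus_per_socket(package_of, n_per_socket):
    """Pick the first n_per_socket logical CPUs from each physical package.

    package_of maps logical CPU id -> physical package (socket) id.
    """
    by_package = defaultdict(list)
    for cpu in sorted(package_of):
        by_package[package_of[cpu]].append(cpu)
    chosen = []
    for package in sorted(by_package):
        chosen.extend(by_package[package][:n_per_socket])
    return chosen


def read_package_map(root="/sys/devices/system/cpu"):
    """Read the CPU -> socket mapping from sysfs (empty dict if unavailable)."""
    package_of = {}
    if not os.path.isdir(root):
        return package_of
    for entry in os.listdir(root):
        if entry.startswith("cpu") and entry[3:].isdigit():
            path = os.path.join(root, entry, "topology", "physical_package_id")
            try:
                with open(path) as f:
                    package_of[int(entry[3:])] = int(f.read().strip())
            except (OSError, ValueError):
                continue  # offline CPU or unexpected file content
    return package_of


if __name__ == "__main__":
    cpus = cpus_per_socket(read_package_map(), 2)
    # ./app is a placeholder for the real application binary.
    print("taskset -c " + ",".join(str(c) for c in cpus) + " ./app")
```

Note that this simply takes the lowest-numbered CPUs in each socket. If the L2 is shared by pairs of cores, which two cores are picked per socket could change the result, so the choice may need to be made explicitly.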
For the slow test case, both CPI and MEM_LOAD_RETIRED.L2_LINE_MISS % doubled for the application as a whole. Individual functions showed CPI as high as 52 and L2_LINE_MISS as high as 25%.
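To make the comparison concrete, this is roughly how the ratios can be derived from raw event counts. It is only a sketch: it assumes CPI is taken as CPU_CLK_UNHALTED.CORE divided by INST_RETIRED.ANY, and it normalizes L2 line misses per 1000 retired instructions, which is not necessarily the same quantity as the percentage the tool reports. The counts in the example are invented placeholders, not our measurements.

```python
def cpi(unhalted_cycles, instructions_retired):
    """Cycles per instruction: CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY."""
    return unhalted_cycles / instructions_retired


def misses_per_kilo_instruction(l2_line_misses, instructions_retired):
    """L2 line misses per 1000 retired instructions (assumed normalization)."""
    return 1000.0 * l2_line_misses / instructions_retired


def slowdown(quad_value, dual_value):
    """Ratio of a metric in the quad-core run to the dual-core run."""
    return quad_value / dual_value


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    dual = {"cycles": 2.0e9, "instructions": 1.0e9, "l2_misses": 5.0e6}
    quad = {"cycles": 4.0e9, "instructions": 1.0e9, "l2_misses": 1.0e7}

    dual_cpi = cpi(dual["cycles"], dual["instructions"])
    quad_cpi = cpi(quad["cycles"], quad["instructions"])
    print("CPI ratio (quad/dual):", slowdown(quad_cpi, dual_cpi))

    dual_mpki = misses_per_kilo_instruction(dual["l2_misses"], dual["instructions"])
    quad_mpki = misses_per_kilo_instruction(quad["l2_misses"], quad["instructions"])
    print("L2 miss ratio (quad/dual):", slowdown(quad_mpki, dual_mpki))
```

Comparing the two ratios per function, rather than only for the whole application, may show whether the extra cycles track the extra L2 misses.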
The question is, "What additional Tigerton Core events can be used to debug this situation?"
To gauge the impact of FSB contention, we monitored L2_REJECT_BUSQ.BOTH_CORES.ANY.MESI %, but this event does not correlate with L2_LINE_MISS.
Also, /proc/cpuinfo reports a cache size of 4096 KB. Is this cache shared between all 4 cores? Is the cache available per core doubled in the 2-CPU configuration?
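One way to check the sharing on the machine itself, rather than guess from cpuinfo, is to read the cache topology from sysfs. The sketch below assumes a Linux kernel that exposes level, type, size and shared_cpu_list under /sys/devices/system/cpu/cpuN/cache/indexM (older kernels may only provide shared_cpu_map); the function names are made up for illustration.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-1,4' into [0, 1, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_text(path):
    """Return the stripped contents of a small sysfs file."""
    with open(path) as f:
        return f.read().strip()


def cache_sharing(cpu=0, root="/sys/devices/system/cpu"):
    """List (level, type, size, sharing CPUs) for each cache of one CPU."""
    result = []
    pattern = os.path.join(root, "cpu%d" % cpu, "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            level = read_text(os.path.join(index_dir, "level"))
            kind = read_text(os.path.join(index_dir, "type"))
            size = read_text(os.path.join(index_dir, "size"))
            shared = read_text(os.path.join(index_dir, "shared_cpu_list"))
        except OSError:
            continue  # attribute missing on this kernel
        result.append((level, kind, size, parse_cpu_list(shared)))
    return result


if __name__ == "__main__":
    for level, kind, size, cpus in cache_sharing(0):
        print("L%s %s %s shared by CPUs %s" % (level, kind, size, cpus))
```

If the L2 entry lists only two CPUs, the 4096 KB would be shared by a pair of cores rather than by all four, and the cache available per core would then depend on which cores taskset selects.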